Overview

This tutorial goes over three different types of analytical techniques. First, it shows how to use aoristic analysis of crime events with an unknown time of origin. Second, it shows how to make a cumulative probability chart, to show the typical time in-between crimes. Third, it shows how to conduct near-repeat analysis in space and time. You can download the data to follow along here or from ELearning.

Aoristic Analysis

For the majority of incident based crime databases, events are recorded with various information including location, time, offender characteristics, or crime types. However, crime event data are frequently recorded as occurring within some indefinite time frame. For example, you may park your car and go to work at 08:00 AM, and when you come back out at 04:30 PM on the same day to find your car window broken and your GPS stolen. Unless there happens to be other witnesses to the crime or surveillance footage, you don’t know when the criminal event occurred besides between those two times. Where this is problematic for crime analysis is, you want to be able to look at the distribution of when events occurred, so as to give suggestions to understand why the event is occurring and how to potentially address it. It makes a big difference to say “most car break ins happen during work hours” vs “most car break ins happen at nighttime when people are sleeping”. Aoristic analysis is simply a means to take into account that uncertainty of when the event occurred when we examine the overall incidence of crimes occurring across a set of times.

For a very brief illustrative example, lets say we want to know the number of crimes occurring for within the hours of 08:00, 09:00 and 10:00. If we had a criminal event that potentially occurred between 08:00 and 10:00, which is a total time span of 120 minutes (2 hours), instead of counting that event as occurring at 08:00 (the begin time), 10:00 (the end time) or 09:00 (the middle time), we spread the event out over the time frame, and only partially count it within any particular interval. So here it would count as a total of 0.50 weight in both the 08:00 and 09:00 category (60/120=0.5) and assign 0 weight in the 10:00 category (note the weights sum back to the value of 1). This just ends up being a way to estimate the incidence of some event within a given time bin knowing that the event did not necessarily occur in that time bin, so only partially counts towards the total in that bin (where partially is defined by how long the interval is and how much of that interval overlaps with the bin).

We will analyze residential burglary cases for the tutorial. (It only makes sense to conduct aoristic analysis on cases in which the times are uncertain.) Open up the “DallasIncidents_2017.csv” file. To sort the data and select the burglary cases, make a data table.

We select residential burglary with sort function in Excel. Navigate to “OffenseType (AL1)” cell and sort incidents by PART1.

Then, click OK. Now you have incidents that belong to the UCR PART1 category. If you navigate to “UCROffense(AI)”, we can sort residential burglary cases for your analysis.

Now you have data for residence burglary incidents in Dallas. To copy and use the data, only visible cells need to be copied. Click Home > Find & Select, and pick “Go To Special” and then click “Visible cells only”. Now you can copy visible cells and paste them into a new spreadsheet. You should now have total 5,656 residence burglary incidents and are ready to do aoristic analysis.

To do an aoristic analysis,first off, you need your data in Excel to be BeginDateTime and EndDateTime. If you do have them in separate fields, if they are formatting correctly you can simply add your date field to your hour field. If you have the times in three separate date, hour, and minute fields, you can do a formula like =DATE + HOUR/24 + MINUTE/(60*24) to create the combined datetime field in Excel (excel stores a single date as one integer). Fortunately, we have correct begin and end date/time information (find cell BD and BE).

So once your begin and end times are correctly set up, you can copy paste your dates into my Aoristic_HourWeekday.xlsx excel spreadsheet to do the aoristic calculations. So go ahead and open up the Aoristic_HourWeekday.xlsx file. Select the cells in range of BD2 to BE1524 and copy them. Now migrate over to the Aoristic_HourWeekday.xlsx spreadsheet, and paste the data into the cell B2.

Now go to the DataConstructed sheet. Basically we need to update the formulas to recognize the new rows of data we just copied in. So go ahead and select the A11 to MI11 row, and Grab the little green square in the lower right hand part of the selected cells, and then drag down the formulas to the MI5657 (Note there are a bunch of columns hidden from view).

With your own data, you simply want to do this for as many cases as you have. If you go past your total N it is ok, it just treats the extra rows like missing data. This example with 5,656 cases then takes about a minute to crunch all of the calculations.

If you navigate to the TimeIntervals sheet, this is where the intervals are actually referenced, but I also place several summary statistics you might want to check out. The Total N shows that I have 5,656 good rows of data (which is what I expected).

Now the good stuff, if you navigate to the NiceTables_Graphs sheet it does all the summaries that you might want. Considering it takes a while to do all the calculations (even for a tinier dataset of 5,656 cases), if you want to edit things I would suggest copying and pasting the data values from this sheet into another one, to avoid redoing needless calculations.

Save the data file with a new name “DallasIncidents_2017_Resi_Burg.xls”. Interpreting the graphs you can see that residence burglary in this dataset have a higher proportion of events during the daytime but for weekdays.

Cumulative Probability Chart

Do most retaliatory shootings happen within hours, days, or weeks of a prior shooting? For one example use when I was a crime analyst, I used cumulative probability chart to show the time in-between shootings. Many shootings are retaliatory so I was interested in saying if a shooting happened on Sunday, how long should be PD be on guard for after an initial shooting. This is a hard question to answer with histograms, but is easier to answer with cumulative probability plots. Here is that example chart for time-in-between shootings:

Although this chart is not regularly used, it is really easy to explain how to interpret. For example, at time equals 7 days (on the X axis), the probability that a shooting would have occurred is under 60%. In my opinion, it is easier to explain this chart than a histogram to a lay audience.

To produce the chart it is often not a canned option in software, but it takes very simple set of steps to produce the right ingredients, and then you can use a typical line chart. So those steps generically are:

For the analysis, we will use the Dallas shooting incidents again. Open up the “DallasIncidents_2017.csv” file. To sort the data and select the shooting cases, make a data table (we leanred how to make a table in excel).

We select shooting cases with sort function in excel. Navigate to “OffenseType (AL1)” cell and sort incidents by PART1.

Then, click OK. Now you have incidents that belong to the UCR PART1 category. If you navigate to “UCROffense(AI1)”, and then “MURDER” and “AGG ASSAULT - NFV” are chosen.

The last category you will use for incident selection is “WeaponUsed” (AD1). Navigate to WeaponUsed and select “FIREARM (TYPE NOT STATED)” “Handgun” “Other Gun” “Other/Unknown Gun” “Rifle” “Shotgun” and “Unknown Type Gun”.

Now you have data for shooting incidents in Dallas. (Technically these could also include incidents in which someone uses a gun to assault someone, e.g. pistol whipping, but should for the most part be shootings.) To copy and use the data, only visible cells need to be copied. Click Home > Find & Select, and pick “Go To Special” and then click “Visible cells only”. Now you can copy visible cells and paste them into a new spreadsheet. You have total of 1,523 shooting incidents and are ready to calculate cumulative probability for shooting incidents.

Now you save the excel file with name “DallasIncidents_2017_Shooting.xlsx”. For the cumulative probability, we need only begin and end data time columns. Here I am going to delete all the fields except for “StartingDateTime” and “EndingDateTime”

To calculate the probability of shooting incident happen, we need to find the mid-time of shooting incidents. We need two more columns and then name two next columns “Mid-Time” and “Probability”. You can get the mid-time of the event with the formula (startingDateTime+EndingDateTime)/2. Write =(A2+B2)/2 in cell C2. Applying the formula to all the cells to by double clicking the edge of the box.

Now you have mid-time for all shooting incidents. However, the midtime is presented in general text. In order to transform into date format, select columnC and go to the number section under the Home menu. You will find the “more number format” and click it.

Click the “Date” and choose date with time type (e.g., 3/4/12/ 13:30).

In order to calculate the cumulative probability, you need to rank the cases based on time order. Click column C and copy and paste into column C with number only option.

We use the filter function to rank the cases. Click Row 1 and navigate to “Data” and click “Filter”.

Now you have filter functions at the every cell next to the title. Get Mid-Time and click “sort oldest to newest”.

You can see the four shooting cases come from 2016 year. We analyze the data for 2017 year and then delete four cases. Now, you have 1,519 shooting cases.

To calculate cumulative probability, we need to know the difference between time of shooting incident. You will get the difference by dividing the sum of two time points by two. Write the formula “=(c3-c2)/2” in D3 and double click the edge of the green box.

We are ready to calculate the cumulative probability. Create another new spreadsheet and write down “TimeBetShootings”(A1), “Rank (B1)”, “CumProp” (C1), and “TotalSample” (D1). Navigation to the previous spreadsheet and copy values in D3-D1520 (ComPro) and paste into A2 in the new spreadsheet.

However, you have “#REF!” rather than values since we copy the formula rather than number. Do right click on the mouse and choose “value option” under the paste option.

Again, we rank the cases based on the time between shootings. Click row 1 and navigate to Data and click the filter.

Now rank cases by clicking “sort smallest to largest”. You have a lot of zero values in your table. We also need ranks for each case according to its time between shootings. Type the formula “=ROW()-1” in B2 and double click the edge of the green box.

As explained above, the information about the total sample is also necessary to calculate the cumulative probability. Type “=MAX(B:B)” in D2 and you get the total number of cases. For Dallas in 2017, we have 1518 shooting cases.

The cumulative probability is calculated by “rank/(total sample size)”. Type the formula =B2/$D$2 in C2 and double click the edge of the green box to apply the formula to other cells.

Now, we get the cumulative probability for shooting cases in Dallas. However, we do not see clear evidence or information with numbers. Let’s make a plot figure with “TimeBetShootings” and “CumPro”. Select row A and row C together and go to the inset menu. You will see various figure options, and click the “Scatter with smooth lines”.

Make the figure larger and make it look better by using what you learned from the class.

Voila, you have your cumulative probability chart. Here it shows very short time periods in between shootings (0.2 is on the day scale, so on the hour scale it would be , as we are doing the analysis for all of Dallas. If you subsetted the data though, say to a particular neighborhood it would show a longer time period in-between shootings (or if you have shootings that were actually connected to each other through intelligence that would be even better).

Near Repeat Space-Time Tool

The last example of the crime pattern analysis is the Near Repeat Space-Time Analysis. For example, research has shown that once a burglary occurs on a street, the homes on that street and on nearby streets are at a much higher risk of being burglarized over the next one to two weeks. This phenomenon, referred to as “near repeat” burglary patterns, can be quantified and usually involves an increased level of risk at nearby locations for a relatively short distance and a limited time period. We are going to use for analysis a separate tool to conduct this analysis, the near-repeat calculator (created by Jerry Ratcliffe).

For the analysis, we need information about location and time of the crime events. We will use Dallas shooting cases for the analysis. Open up “DallasIncidents_2017_Shooting” you created that contains location and time information of shooting cases. Here I am going to delete all the fields except for “XCoordinate” “YCoordinate” and “Midtime”(you have ideas about how to create a Mid-Time column). Name the columns with “X” (A1), Y (B1), and “Date” (C1). The X and Y columns are the spatial coordinates in feet for Dallas (a local coordinate system, instead of latitude and longitude). Here is what the spreadsheet looks like now.

You can now save the file with name “DallasIncidents_2017_Shooting_Near_Repeat”. You can now close out of your spreadsheet, the data is ready for the near-repeat calculator.

Now before you use the Near Repeat calculator you need to install it on your system. Unzip the “Near-Repeat-Calculator.zip” file, then in that folder navigate to Near-Repeat-Calculator -> Near Repeat Calculator, and then select “SetUpNearRepeats.exe” file. This will install the NearRepeat utility on your system. Once that is done, open up the Near-Repeat calculator. You will get a screen that looks like below:

Click the Open data file button, and then import your “DallasIncidents_2017_Shooting_Near_Repeat” file. For the pop up about data’s distance, then select Feet.

Now we need to fill in the parameters for the spatial and temporal bins you want to check. This depends on theoretically how long you expect to see near-repeat patterns (in both time and space) and the temporal divisions you care about. Ratcliffe and Rengert (2008) use spatial bands of 400 feet up to 4000 feet, and temporal bands of 14 days (up to 56 days). So we will do a similar distance of 600 meters, but I want to check for smaller temporal periods. Here I will use 3 days, up to 21 days. To make it quicker, set the significance level to p=0.05, and also set the settings box to Euclidean. (Manhattan would only make sense if the streets were in a perfect grid.) Your near-repeat calculator should then look like below.

Now hit start analysis, and it only takes a few seconds to complete the simulations. You will then have a html file automatically open up, and it has a nice table of the results. (It saves this html file in the same folder that the “DallasIncidents_2017_Shooting_Near_Repeat.csv” file is located in.)

Here the same location within 0-3 days has a very high level of near-repeats. There are a few other patterns, but they are not temporally or spatially consistent.

Homework

For your homework, filter out commercial burglaries (use the UCROffense field). Then conduct aoristic analysis on the time of day that those events occur. Then calculate the middle time of those events, and conduct a near-repeat analysis. In the homework file, choose 1 aoristic graph, and the near-repeat table, insert into a PPT, and turn in those results (so should only be two pages).

For 5 points extra credit, insert those two products into a word file, and do a brief write up interpreting the findings (1-2 pages of writing, plus the graphs/tables).