This week I will illustrate three different types of mapping techniques you can do in Excel: repeat address, choropleth, and kernel density hot spot maps. Excel has only very basic mapping capabilities, and most crime analysts will use a full fledged Geographic Information System (GIS) to do more sophisticated data manipulation. (Check out my graduate level GIS course for example.)
But Excel has recently made some mapping capabilities available, so I will show how to do some simple mapping techniques. Either download the data from ELearning, or download it from here.
For those not familiar with zip files, I often use them to send larger files as one (nicer for emailing or opening up a set of files). It compresses the data, and makes it one file, so easier to share. So here is what that zip file looks like on my machine in the Windows folder view.
But to work with them you need to unzip them. Right click on the zipped folder and select Extract All.
Then it brings up a dialogue for the location of where to save the files. By default it saves it in the same folder that the zip file is located in. If you are working in the lab, you might want to change it to where you store all of your files for class. Then hit Extract to de-compress the files.
You can see I have a new folder named Lab04_Mapping.
To do the reverse, what is typical is to place all of the files in a single folder. Then you can right click on the folder, and then select Send To -> Compressed (zipped) folder
And that will subsequently create a zip file in the same folder (if you still have Lab04_Mapping.zip in the same folder, it will make a new one named Lab04_Mapping (2).zip
).
That is the zip file service that comes for free with Windows. Another popular compression tool is 7-zip. That tool can do either traditiona zip compression, or even better compression .7z. (But note not everyone will have access to 7z, to some users may not be able to use it.)
In this section, I will illustrate some simple text manipulation you can do in Excel. This is common to clean addresses, in many systems you will have extra text in the address column that prevents proper geocoding.
Go ahead and open up the DallasIncidents.csv file in Excel, and turn the data into a table. It is pretty big (just short of 100,000 total incidents), so takes a minute.
Select column G (City), and right click Insert.
This will create a new blank column in-between IncidentAddress and City. Name that column CleanedAddress. Then select the IncidentAddress column (F), double click the right edge of the F column at the top to expand it, and hit Ctrl+F to bring up the Find & Replace dialogue.
Type in APT in the Find What box, and then hit the Find Next button, this should take you to row 5,740, and show that an address with Harry Hines Blvd accidently has an apartment number included.
Having extra text like that will often make the geocoding engine fail. This dataset is pretty clean already (DPD must use some type of data validation already when inputting addresses), but it has a few of these APT notes misplaced in the main address field. To fix this what we are going to do is to search for the text APT, and then only return the text string. This will require nesting several functions together:
LEFT
function to return the stringFIND
function to search the address string for if it contains APT
IFERROR
function, to just return the original string if it does not contain any extra APT notes.So first, in our new (and so far empty) column and row 5740, type in the formula =LEFT([@IncidentAddress],10)
. You can see in the update this just grabs the first 10 characters in the string.
To strip out the APT text, and keep the address stuff we want, we need to replace the hard coded 10 with where the APT string starts. The function FIND
lets us do that, so lets replace our formula now with =LEFT([@IncidentAddress],FIND(" APT",[@IncidentAddress]))
. I include the space before APT to prevent catching street names that might include APT within them, e.g. something like RAPTOR ST would not be captured.
We can see now that this works for the Harry Hines address, but gives us errors for the others that do not have APT contained in them. This is because if " APT" is not contained in the original IncidentAddress string, the FIND function will return an error. So to fix this we will use the IFERROR function. Now update the formula to read =LEFT([@IncidentAddress],IFERROR(FIND(" APT",[@IncidentAddress]),100))
. In long form to make the nesting easier to read, here is how the formula goes:
=LEFT([@IncidentAddress],
IFERROR(
FIND(" APT",[@IncidentAddress]),
100
)
)
Here we use 100 as just a long number that will contain the full address for any cells that do not contain " APT“. Another option sticking just to the data would be to replace 100 with LEN([@IncidentAddress])
. And now we can see that our new address field works out just dandy.
Now right click on column K (ApartmentNumber) and Insert a new column. To use the Excel geocoder, we need to submit one string with the full address (include City, State, and Zipcode). Name the column FullAddress, and then type in the formula =CONCATENATE([@CleanedAddress],[@City])
Whoops, we can see that there is no space in-between our address field, and the city name Dallas. Now redo the formula, inserting ", "
inbetween the two columns, =CONCATENATE([@CleanedAddress],", ",[@City])
That looks better now. Go ahead and add in the State and Zipcode, incorporating extra spaces and comma’s in where appropriate. So in the end mine looks like =CONCATENATE([@CleanedAddress],", ",[@City],", ",[@State]," ",[@ZipCode])
.
Next up, doing repeat address mapping.
Go ahead and make a pivot table of the incidents, using FullAddress for rows, and ucr_redString for the columns, and Count of ucr_redString for the values.
Select the dropdown arrow in the Rowlabels area, and navigate to Value Filters -> Greater Than:
Then input 30 as the selection criteria and hit OK. This will limit the table to places with a total of 30 or more incidents (across all crime types).
Now right click on the header row of the pivot table, and select Pivot Table Options.
With the Layout and Format tab selected, in the empty cells option type 0. Then hit OK.
You will see the table now has 0’s instead of blanks for empty crime counts.
Before we move onto making a map (which is only available in newer versions of Excel, not sure if they are available for Mac editions), we need to save our worksheet as an xlsx file. So go to File -> Save As, navigate to where you want to save the file, and in the dropdown select Excel Workbook.
Also maps does not work with PivotTable (only regular tables). So we need to transfer our Pivot Table data into a new location. So, make a new worksheet, and label it RepeatAddress. Then from the Pivot Table copy-paste cells A4:K222. Then turn that data into a Table. I renamed A1 then to say address.
With that data, now in the top toolbar go to Insert, and then select the 3D Map option.
This should bring up a map! And Excel automatically figures out to geocode the locations. You can zoom into the locations around Dallas to check it out. I also click the Map Labels option (brings up street names), and the Flat Map option here.
Now we will style our map, showing addresses with more crimes as larger circles. (We will check the geocoding in a bit.) In the right hand side, click the bubble looking icon just below the Data layer, and then select the Add Field option below Size, and select Grand Total as the field.
This will make the bubbles too big, so expand the Layer Options (below Size), and set the Size to 20%, Thickness to 0, and Opacity to 80%. Here I also change the color to a more pink/salmon. Now you can see variation in crime areas a bit better.
You can zoom into the map and select the circles, which brings up a tool tip. Here I zoom into the largest bubble. Uh-oh, it is 1400 S Lamar St. This is just DPD headquarters! I don’t want to map that.
So navigate back to the RepeatAddress worksheet. Then sort the data by descending values on the Grand Total Column. Then select the Lamar St row, and right click and select Delete.
Now if we go back to our Map, the bubble is still there. But lets hit the Refresh Data button, and then see the map update with no more bubble on S Lamar St.
Now we are going to check our geocoding results. In the Layer table of contents on the right, above the Location box there is a little % button. Here it says 0%, and click that:
You can see a bunch of warning yellow triangles, but it just happens to be because of the proper case versus all uppercase addresses. So even though it says 0% mapped with confidence, it doesn’t appear to be that bad.
Click the Result column header twice, and this will sort the 3 cases that were not matched at all to the top of the table. (If you only click once, it will sort them to the bottom, all the way on page 5.)
We now need to change the data so it will map. I typically use a mix of googling (Fun Way appears to be a small road around Fair Park). And it happens that LBJ in the Excel (Bing) maps is listed under its full name, Lyndon B Johnson. Lets navigate back to the RepeatAddress sheet, and edit these few locations.
To make it easier I selected out just those three addresses in the Address column. Select the dropdown arrow on the Address column, the select the Text Filters -> Custom Filter option.
Then in the options to the right, make sure the contains option is selected, and in the two rows type FUN WAY
and L B J
(for the latter make sure there are spaces). Then click OK.
You should now see our three bad addresses selected.
For LBJ, we just need to edit the address to be Lyndon B Johnson and will it subsequently geocode. For Fun Way it is alittle more difficult, as that address is not in Bing maps at all. So edit LBJ to say Lyndon B Johnson
instead (keeping the same address). But for the 1600 FUN WAY I am going to just use Cotton Bowl Circle (as it is pretty close). Leave the 1550 Fun Way as is.
Now go back to the map page, click the Refresh Data button, and then scroll the map to Fair Park. You should now see our big bubble that (that used to be 1600 Fun Way).
If you go back to the Mapping Confidence dialogue and sort again by Results, you should now only see the 1550 Fun Way address.
I am not going to worry about fixing this last address for now. One way would be back in the original data, whenever you encounter FUN WAY, change that address to simply Cotton Bowl Circle. That would collapse the two FUN WAY addresses together in the final map.
Next up, we are going to create a choropleth map of crime densities per DPD reporting areas. Small geographic areas made up by DPD to patrolling essentially.
First we are going to add in a file of all of the reporting areas. Navigate back to the main Excel document, and create a new worksheet and title it, ReportingAreas. Then select cell A1, and in the top toolbar go to the Data tab, and select From Text.
Then in the dialogue, navigate to the RA_Data.csv file location, and select that file. and click Import.
In Step 1 of 3 in Text Import Wizard, make sure Delimited is checked, as well as My Data has headers. Then click Next.
Then in the next step, make sure Comma is selected. It will then make the fields have lines inbetween them (whereas if you leave it as Tab it will just import all of the data as one big string in one column).
Then in the step 3 of the Wizard you can keep the defaults for how it treats columns, and then just click Finish.
Then finally you get alittle Import Data dialogue that asks where you want to import the data. Make sure =$A$1 is chosen, and then click OK.
You may be asking what is the benefit of doing it this way, as opposed to opening up the CSV file and just dragging the sheet into the current workbook. The answer is when you do it this way, the data is linked in the workbook. If you edit the CSV file, your values in the Excel workbook will change.
Here it does not make much of a difference, later in the semester though I will show how when linking to an Access database this will be really helpful behavior.
Go to the original DallasIncidents_2017 sheet, and select the Design tab in the top toolbar. Then rename the table to Incident_Data.
Then navigate back to the ReportingAreas sheet and select cell R1. Then insert a Pivot Table – fill out the Select a table option to be Incident_Data, and the location to be R1 in the reporting areas worksheet. Then click OK.
In the Pivot Table, place ReportingArea for the rows, and count of ReportingArea for the Values.
This looks like it lines right up with our other data we imported, but if you scroll down further you can see it does not line up directly forever. In cell K2, type an equal sign, and then select cell S2. You will see a formula pop up that looks like =GETPIVOTDATA("ReportingArea",$R$1,"ReportingArea",1001)
Change the 1001 in the end to be B2. Then hit enter, and double click the formula to extend down the entire dataset. Here I also named the column CountCrimes. If you scroll down to row 44, you will see an error.
To fix this, we are going to nest our function within an IFERROR
function. So change the function reference to =IFERROR(GETPIVOTDATA("ReportingArea",$R$1,"ReportingArea",B44),0)
. Then make sure to update that function both above and below in the column. (Note, the mapping functions in Excel will not work if you have errors in the data files.)
Now to end we are going to make a field that is the density of crimes per square mile. In K1, write CrimeDens, and in K2, the formula is just K2/J2
.
For reference, if you are not working with Pivot Tables, another function to that of GETPIVOTDATA
that can accomplish a similar task is called VLOOKUP
. But here the Pivot Table specific function is a better choice.
Select cells A1:K1157, and then in the 3D map option select Add Selected Data to 3d Maps.
The map should now be the focus, and you should have a Layer 2 option in the right table of contents, but is empty (if not click the Add Layer button). Turn the point map off (by clicking the Eyeball in Layer 1). And then in the Location field in the dropdown select RA.
Then in the dropdown for the RA select Custom Region (.kml, .shp).
You then get a pop-up to import custom regions, Go ahead and click Yes, and then navigate to the DPD_ReportingAreas.kml file. (This is a type of geographic file, specifically one associated with Google Earth.)
In the Regions selection, change the default Name to RA. Then click Import.
You should see them populate as a bunch of points in the map. In the table of contents on the right, change the Viz. type to region. Then in the value section change it to CountCrimes.
Then in the layer options I changed a few options to get alittle more difference across the areas.
Note to create an image of the map, you can click on the Capture-Screen button on the top left, and then paste the resulting image into whatever document you are using.
This ends up not showing much variation and looks washed out. One way to make more variation in the map is to bin the data (such as via quantiles). (ToDo for another time!)
For your homework, create a table of the top 10 repeat addresses for Robberies in Dallas. Then for the mapping portion, create a Choropleth map of the robbery rates per population for reporting areas (see the POP2010 field). Insert both of the those into a powerpoint presentation, then save as a PDF, and turn the PDF in.
See below for an additional extra credit opportunity via making a kernel density map.
On my personal website I have created an example of creating an animated hot spot map over time. Check out the video for illustration.
For 5 points extra credit, create an animated hot spot map for one particular crime type in Dallas and turn it in.