Overview

For this week's class, the expected readings are:

It is a bit of a hodgepodge, but these are models regularly used in criminology that I wanted you to have at least some exposure to.

Geographically Weighted Regression

Geographically weighted regression (GWR for short from here on out) is a way of letting regression coefficients vary based on spatial location. For a simplified example, imagine you have two neighborhoods in a city, A and B. You then fit a regression model predicting the number of crimes on a street as a function of the number of bars, with a separate equation for each neighborhood, and get the following results:

\[Y_A = 2 \cdot (\text{Bars}_A)\] \[Y_B = 1 \cdot (\text{Bars}_B)\]

So here, adding a bar to a street in neighborhood A results in an increase of 2 crimes, but adding a bar in neighborhood B only results in an increase of 1 crime. While no single theory predicts this, you can make a reasonable argument for why it might occur by combining different crime theories. For example, in crime pattern theory, bars are crime generators and crime attractors. In social disorganization theory, places that have more poverty and in which people move more often have higher crime. You may think that bars in places with more social disorganization are likely to have larger effects on crime than bars in nicer neighborhoods (Smith, Frazee, and Davison 2000).
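As a minimal sketch of the two-neighborhood example, the Python below fits a separate regression through the origin (no intercept, to match the simplified equations above) for each neighborhood. The street-level counts are made up so that the slopes come out to 2 and 1; they are not real data, and the function name is only illustrative.

```python
def slope_through_origin(bars, crimes):
    """Least-squares slope for crimes = b * bars, with no intercept."""
    numerator = sum(x * y for x, y in zip(bars, crimes))
    denominator = sum(x * x for x in bars)
    return numerator / denominator

# Hypothetical streets: number of bars and number of crimes on each street.
bars_a, crimes_a = [0, 1, 2, 3], [0, 2, 4, 6]
bars_b, crimes_b = [0, 1, 2, 3], [0, 1, 2, 3]

slope_a = slope_through_origin(bars_a, crimes_a)
slope_b = slope_through_origin(bars_b, crimes_b)
print(f"Neighborhood A: {slope_a:.1f} crimes per bar")
print(f"Neighborhood B: {slope_b:.1f} crimes per bar")
```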

GWR just makes it so you do not have to check coefficients in discrete neighborhood areas: you can estimate continuously varying coefficients over the whole study area. The idea is that you estimate a weighted regression model at many points in the city. The weights are determined by how close the units are to the focal point. So if I had a simple grid:

X X X
X X X
X X X

I would estimate 9 different regression models, one centered on each grid cell. The coefficient for the top left corner would be close to the coefficient for the top middle, because the two models weight the observations similarly. This then makes a smooth map of changing coefficients over the study area, instead of the discrete jumps I got by splitting up the study area in my original example. Here is an example from Graif and Sampson (2009) showing this:

They map the t-statistics instead of the effects themselves. You can see that more foreign born have positive effects on crime in the central part of the city and toward the periphery, but have large negative effects through strands in the central part of Chicago.
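To make the weighting idea concrete, here is a rough sketch of the 3x3 grid case in plain Python. It is an illustration under stated assumptions, not how any particular GWR package does it: I assume a Gaussian distance kernel with a bandwidth of one grid cell, a single predictor with no intercept, and made-up data in which the true bar effect drops from 2 in the left column to 1 in the right column.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Kernel weight that shrinks smoothly as distance from the focal point grows."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_slope(focal, coords, bars, crimes, bandwidth=1.0):
    """Weighted least-squares slope (no intercept) centered on one focal point."""
    numerator = denominator = 0.0
    for (col, row), x, y in zip(coords, bars, crimes):
        w = gaussian_weight(math.hypot(col - focal[0], row - focal[1]), bandwidth)
        numerator += w * x * y
        denominator += w * x * x
    return numerator / denominator

# Centroids of the 3x3 grid, listed row by row as (column, row).
coords = [(col, row) for row in range(3) for col in range(3)]

# Made-up data: the true bar effect falls from 2 (left column) to 1 (right column).
true_effect = {0: 2.0, 1: 1.5, 2: 1.0}
bars = [row + 1 for (col, row) in coords]
crimes = [true_effect[col] * b for (col, row), b in zip(coords, bars)]

# One weighted regression per grid cell, so nine local slopes in total.
local_slopes = [local_slope(focal, coords, bars, crimes) for focal in coords]
for row in range(3):
    print("  ".join(f"{s:.2f}" for s in local_slopes[3 * row: 3 * row + 3]))
```

Because every local model borrows observations from nearby cells, the estimated slopes in the left and right columns are pulled toward their neighbors rather than reproducing the made-up values of 2 and 1 exactly. A smaller bandwidth would borrow less and track the local values more closely, at the cost of relying on fewer effective observations.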

GWR is useful for exploratory data analysis, but it has several problems. If you estimate 100 different regression coefficients, some are bound to be statistically significant just by chance. The statistics are also not independent: the coefficient for the model in the upper left is not independent of the coefficient for the model in the upper middle. This non-independence shows up in the fact that the maps are smooth; GWR cannot estimate large jumps in the coefficients. Finally, the weighted models have more noise than a single model for the whole city. If you only have 100 observations, then each of the weighted models is effectively influenced by even fewer data points.1
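A back-of-the-envelope calculation shows how quickly chance findings pile up. The sketch below assumes a 0.05 significance level and treats the 100 tests as independent. As noted above, GWR estimates are not independent, so take this as a rough benchmark rather than an exact figure for any real GWR map.

```python
alpha = 0.05    # assumed significance level for each local test
n_tests = 100   # number of local coefficients being tested

# If no true effect exists anywhere, this many tests are expected to be "significant".
expected_false_positives = alpha * n_tests

# Chance that at least one test is significant, under the independence assumption.
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"Expected false positives: {expected_false_positives:.0f}")
print(f"P(at least one false positive): {p_at_least_one:.3f}")
```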

I have you read Graif and Sampson (2009) not because I believe the results, but because I think they are mostly chasing noise in that example. I do not believe there is much theoretical reason to expect that immigration rates have different effects on crime in different parts of the city, yet that is what they found when using GWR. But basically every example of GWR I have seen finds spatially varying coefficients, and I think most of that is chasing noise.

For those interested in learning more about GWR, I would suggest reading the book by its originators, Fotheringham, Brunsdon, and Charlton (2002). Also for those looking for advice about making GWR maps, see Mennis (2006).

Discrete Choice Models

It took me a long time to figure out what the hell was going on with discrete choice models. Here is how I think about them. Imagine you had a city with four areas, a, b, c, and d, and you had two offenders, X1 and X2. The database setup for discrete choice models would be:

Committed Crime? Area Offender
1 a X1
0 b X1
0 c X1
0 d X1
0 a X2
1 b X2
0 c X2
0 d X2

Here offender X1 committed his crime in area a, and offender X2 committed his crime in area b. To estimate the discrete choice model, you need to have all of the areas for each offender in the database as well, and you try to predict which of the four areas each offender chose. Discrete choice models are therefore just a special type of logistic regression model (this one is also called the conditional logit model).
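Here is a small sketch of what the conditional logit does with that table. For each offender, the probability of choosing an area is exp(utility of that area) divided by the sum of exp(utility) over all of that offender's areas. The distances and the coefficient value are hypothetical numbers I made up for illustration, and the function names are my own; a real analysis would estimate the coefficient by maximizing the log-likelihood with a statistics package.

```python
import math

# Long-format rows mirroring the table above: (offender, area, committed crime?).
rows = [
    ("X1", "a", 1), ("X1", "b", 0), ("X1", "c", 0), ("X1", "d", 0),
    ("X2", "a", 0), ("X2", "b", 1), ("X2", "c", 0), ("X2", "d", 0),
]

# Hypothetical distance (in km) from each offender's home to each area.
distance = {
    ("X1", "a"): 0.5, ("X1", "b"): 1.0, ("X1", "c"): 2.0, ("X1", "d"): 3.0,
    ("X2", "a"): 1.5, ("X2", "b"): 0.5, ("X2", "c"): 1.0, ("X2", "d"): 2.5,
}

def choice_probabilities(offender, beta):
    """Probability of each area for one offender, given a distance coefficient."""
    areas = [area for (o, area, _) in rows if o == offender]
    exp_utility = {area: math.exp(beta * distance[(offender, area)]) for area in areas}
    total = sum(exp_utility.values())
    return {area: value / total for area, value in exp_utility.items()}

def log_likelihood(beta):
    """Sum of the log probabilities of the areas the offenders actually chose."""
    return sum(
        math.log(choice_probabilities(offender, beta)[area])
        for (offender, area, chose) in rows
        if chose == 1
    )

beta = -1.0  # hypothetical value: negative means farther areas are less attractive
for offender in ("X1", "X2"):
    print(offender, {a: round(p, 3) for a, p in choice_probabilities(offender, beta).items()})
print("log-likelihood at beta = -1:", round(log_likelihood(-1.0), 3))
print("log-likelihood at beta =  0:", round(log_likelihood(0.0), 3))
```

With a coefficient of zero every area is equally likely (one in four), so comparing the two log-likelihoods shows whether distance helps explain the observed choices in this made-up example.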

These models are useful for theory testing. For instance, you may have a variable that lists the distance between the home address of the offender and the spatial area (Bernasco and Block 2009), or you may include a variable indicating whether an offender used to live in neighborhood a (Bernasco 2010a). (As you can see, Wim Bernasco has a series of articles on the technique and theoretical applications.) The technique could also be used to predict the next crime location, but I haven’t seen any application that tries to do that with new data.

One particular limitation of this technique is that you need to predict discrete areas; it does not produce a general surface like geographic offender profiling. Because of this, most applications predict crime location choice among larger neighborhood areas and for only a few offenders (see Bernasco (2010b) for one exception that predicts micro places, although still with only around 1,000 offenders). If you have 100 areas and 1,000 offenders, the final database you need to estimate the model on will have 100,000 observations. So it is hard to use this technique if you want to predict the very specific place where an individual will commit a crime.
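The row count follows directly from how the database is built: every offender is paired with every area. A minimal sketch of that expansion is below. The function name and the offender and area labels are hypothetical, and each offender's chosen area in the larger example is assigned arbitrarily just to show the size.

```python
from itertools import product

def build_choice_rows(chosen_area_by_offender, areas):
    """Return one (committed crime?, area, offender) row per offender-area pair."""
    return [
        (int(area == chosen), area, offender)
        for (offender, chosen), area in product(chosen_area_by_offender.items(), areas)
    ]

# The two-offender, four-area example from the table above.
small = build_choice_rows({"X1": "a", "X2": "b"}, ["a", "b", "c", "d"])
for row in small:
    print(row)

# Scaling up: 1,000 offenders and 100 areas (labels and choices are made up).
areas = [f"area_{i}" for i in range(100)]
offenders = {f"offender_{j}": areas[j % 100] for j in range(1000)}
large = build_choice_rows(offenders, areas)
print(len(large), "rows")
```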

Homework and for next week

You do not have a tutorial for this week, so you should spend your time working on your final project. If you are reading this earlier in the semester and are interested in pursuing GWR or discrete choice modelling for your final project, just let me know and I can help out if you need more guidance on how to actually estimate these models.

References and Endnotes

Bernasco, Wim. 2010a. “A Sentimental Journey to Crime: Effects of Residential History on Crime Location Choice.” Criminology 48 (2): 389–416.

———. 2010b. “Modeling Micro-Level Crime Location Choice: Application of the Discrete Choice Framework to Crime at Places.” Journal of Quantitative Criminology 26 (1): 113–38.

Bernasco, Wim, and Richard L. Block. 2009. “Where Offenders Choose to Attack: A Discrete Choice Model of Robberies in Chicago.” Criminology 47 (1): 93–130.

Fotheringham, A. Stewart, Chris Brunsdon, and Martin Charlton. 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. West Sussex, England: John Wiley & Sons, LTD.

Graif, Corina, and Robert J. Sampson. 2009. “Spatial Heterogeneity in the Effects of Immigration and Diversity on Neighborhood Homicide Rates.” Homicide Studies 13 (3): 242–60.

Mennis, Jeremy. 2006. “Mapping the Results of Geographically Weighted Regression.” The Cartographic Journal 43 (2): 171–79.

Smith, William R., Sharon G. Frazee, and Elizabeth L. Davison. 2000. “Furthering the Integration of Routine Activity and Social Disorganization Theories: Small Units of Analysis and the Study of Street Robbery as a Diffusion Process.” Criminology 38 (2): 489–524.


  1. Ned Levine, the creator of CrimeStat, also speaks very negatively of GWR models. (I presume this is why he does not include them in CrimeStat.) In their place, I know he suggests using CAR models (which we briefly touched on in the spatial regression class), and accordingly CrimeStat has the ability to estimate CAR models.