Study
Ariel, Sherman, and Newton (2020) conducted a randomized controlled trial of hot-spots police patrols on the previously unpatrolled, track-level platforms of the London Underground, London's rapid transit system. Each London Underground station has multiple platforms. To identify platforms eligible for the study, all London Underground platforms were rank-ordered by the level of crime they experienced over a 12-month period. The study excluded platforms with fewer than two crimes per year, stations that were targeted and routinely patrolled by special “Hub Teams,” and platforms located too far (45 minutes or more) from other stations. Hot spots were defined and located using victim-generated “hard crimes,” such as violence, antisocial behavior against others, and criminal damage (see Weisburd and Green 1995). Targeting analysis produced a final list of 115 eligible hot-spot platforms with a mean of more than 4.72 crimes per year, and random assignment was used to allocate these 115 platforms to the treatment or control condition.
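The eligibility screen and allocation described above amount to a filter, a ranking, and a shuffle-and-split. The sketch below is illustrative only: the `Platform` record, its field names, the demo records, and the seed are hypothetical, and simple random assignment is assumed because the source does not say whether allocation was blocked or stratified. The targeting analysis behind the 4.72 threshold is not reproduced.

```python
import random
from dataclasses import dataclass


@dataclass
class Platform:
    # Hypothetical record; field names are illustrative, not from the study.
    name: str
    crimes_per_year: float
    hub_team_station: bool    # station routinely patrolled by a Hub Team
    minutes_to_nearest: int   # travel time to the nearest other station


def eligible(platforms):
    """Apply the three exclusion rules, then rank by crime, highest first."""
    kept = [
        p for p in platforms
        if p.crimes_per_year >= 2        # drop platforms with fewer than two crimes per year
        and not p.hub_team_station       # drop Hub Team stations
        and p.minutes_to_nearest < 45    # drop platforms 45+ minutes away
    ]
    return sorted(kept, key=lambda p: p.crimes_per_year, reverse=True)


def allocate(platforms, n_treatment, seed=2011):
    """Simple random assignment (assumed): shuffle a copy, then split it."""
    rng = random.Random(seed)
    shuffled = list(platforms)
    rng.shuffle(shuffled)
    return shuffled[:n_treatment], shuffled[n_treatment:]


# Made-up records that exercise each exclusion rule
demo = [
    Platform("A", 12.0, False, 10),
    Platform("B", 1.0, False, 10),   # too few crimes
    Platform("C", 9.0, True, 10),    # Hub Team station
    Platform("D", 7.0, False, 50),   # too far away
    Platform("E", 5.0, False, 20),
]
print([p.name for p in eligible(demo)])  # ['A', 'E']

# Splitting 115 hypothetical eligible platforms 57/58, as in the study
pool = [Platform(f"P{i}", 5.0, False, 10) for i in range(115)]
treatment, control = allocate(pool, 57)
print(len(treatment), len(control))  # 57 58
```

Seeding a private `random.Random` instance keeps the allocation reproducible without touching global random state.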
The experiment ran for 6 months, from mid-September 2011 through mid-March 2012, and outcomes were compared with the same months of the previous year (2010–11). Fifty-seven of the 115 highest-crime platforms were randomly assigned to receive foot patrols by officers. The treatment platforms received foot patrols in 15-minute doses, four times a day during 8-hour shifts, 4 days a week, for 6 months. By the end of the experiment, police had made 23,272 arrivals at the treatment hot spots over 26 weeks. The remaining 58 platforms were randomly assigned to the control condition; they received no hot-spots patrols and had received no prior treatment at baseline.
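A quick check of the dosage figures compares the recorded arrivals with the number of visits the schedule implies. This back-of-envelope sketch assumes every treatment platform was scheduled for all four visits on every patrol day across all 26 weeks; the study's own dosage accounting may define visits differently, and the delivery rate computed here is not a figure the study reports.

```python
# Figures from the text
TREATMENT_PLATFORMS = 57
VISITS_PER_DAY = 4
PATROL_DAYS_PER_WEEK = 4
WEEKS = 26
MINUTES_PER_VISIT = 15
RECORDED_ARRIVALS = 23_272

# Assumption: every treatment platform was scheduled for every visit on every patrol day.
visits_per_platform = VISITS_PER_DAY * PATROL_DAYS_PER_WEEK * WEEKS
planned = TREATMENT_PLATFORMS * visits_per_platform
delivery_rate = RECORDED_ARRIVALS / planned
minutes_per_platform = visits_per_platform * MINUTES_PER_VISIT

print(visits_per_platform)      # 416
print(planned)                  # 23712
print(round(delivery_rate, 3))  # 0.981
print(minutes_per_platform)     # 6240 (104 hours)
```

Under that assumption, the recorded arrivals correspond to roughly 98% of scheduled visits, or about 104 hours of patrol presence per treatment platform.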
During the baseline period, the sample stations had 6,052 calls for service and 4,471 crimes. A total of 2,834 calls for service were identified for the treatment hot spots, and 3,218 calls for service were identified for the control hot spots during the baseline period. A total of 2,079 crimes were identified for the treatment hot spots, and 2,392 crimes were identified for the control hot spots during the baseline period. No statistically significant pretreatment differences were found between treatment and control conditions for the eight comparisons examined in this study.
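The arm-level baseline counts can be checked against the reported totals and expressed per platform. The per-platform means below are descriptive only; they assume each count is attributed to the platforms in its arm, and they are not the eight significance tests the study reports.

```python
# Baseline-period counts reported in the text
BASELINE = {
    "treatment": {"platforms": 57, "calls": 2834, "crimes": 2079},
    "control": {"platforms": 58, "calls": 3218, "crimes": 2392},
}

total_calls = sum(arm["calls"] for arm in BASELINE.values())
total_crimes = sum(arm["crimes"] for arm in BASELINE.values())
print(total_calls, total_crimes)  # 6052 4471

# Descriptive per-platform means (assumes counts are attributed to the assigned platforms)
per_platform = {
    name: {k: arm[k] / arm["platforms"] for k in ("calls", "crimes")}
    for name, arm in BASELINE.items()
}
for name, means in per_platform.items():
    print(name, round(means["calls"], 1), round(means["crimes"], 1))
# treatment 49.7 36.5
# control 55.5 41.2
```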
The outcome measures were all calls for service to the police (“999 calls”) and all reported crimes at the participating hot spots during the 6 months of the experiment (2011–12) and during the same 6 months of the baseline year before the experiment (2010–11). The data were then divided into eight outcome variations, reflecting the “types” of deterrence the experiment was designed to test. Each variation crosses one of two measures (crimes or calls for service) with patrol or nonpatrol days and patrol or nonpatrol hours: 1) crime on platforms only, on patrol days, during patrol hours; 2) crime on platforms only, on patrol days, during nonpatrol hours; 3) crime on platforms only, on nonpatrol days, during patrol hours; 4) crime on platforms only, on nonpatrol days, during nonpatrol hours; 5) calls for service on platforms only, on patrol days, during patrol hours; 6) calls for service on platforms only, on patrol days, during nonpatrol hours; 7) calls for service on platforms only, on nonpatrol days, during patrol hours; 8) calls for service on platforms only, on nonpatrol days, during nonpatrol hours. The London Underground detachment of the British Transport Police provided the crime and calls-for-service data.
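The eight outcome variations form a 2 × 2 × 2 factorial grid, so they can be generated mechanically and each platform event assigned to one cell. In this sketch the patrol schedule (`PATROL_WEEKDAYS`, `PATROL_HOURS`) is entirely hypothetical, because the source does not say which days or hours were patrolled; only the structure of the grid comes from the text.

```python
from datetime import datetime
from itertools import product

MEASURES = ("crime", "calls for service")
DAYS = ("patrol days", "nonpatrol days")
HOURS = ("patrol hours", "nonpatrol hours")

# 2 x 2 x 2 = 8 cells, generated in the same order as the numbered list
OUTCOMES = list(product(MEASURES, DAYS, HOURS))

# HYPOTHETICAL schedule for illustration; the source does not state which days or hours were patrolled.
PATROL_WEEKDAYS = {3, 4, 5, 6}  # datetime.weekday(): Monday = 0, so Thursday to Sunday
PATROL_HOURS = range(14, 22)    # an 8-hour shift, 14:00 to 21:59


def outcome_cell(measure, when):
    """Return the outcome number (1 to 8) for a platform event of `measure` at `when`."""
    day = DAYS[0] if when.weekday() in PATROL_WEEKDAYS else DAYS[1]
    hour = HOURS[0] if when.hour in PATROL_HOURS else HOURS[1]
    return OUTCOMES.index((measure, day, hour)) + 1


for number, cell in enumerate(OUTCOMES, start=1):
    print(number, *cell, sep=" | ")

print(outcome_cell("crime", datetime(2011, 9, 15, 15, 30)))            # Thursday afternoon -> 1
print(outcome_cell("calls for service", datetime(2011, 9, 19, 9, 0)))  # Monday morning -> 8
```

Generating the cells with `itertools.product` keeps the numbering aligned with the list above, so a change to the schedule affects only the classification, not the outcome labels.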
The researchers used a pretest–posttest control-group design to assess the impact of the treatment on the outcomes of interest. Specifically, generalized linear models were used to estimate differences between experimental and control hot spots in citizen-generated calls for service and citizen-reported crime incident counts. Because the dependent variables were counts, the multivariate models were estimated with Poisson regression, and the authors compared models using Bayesian information criterion (BIC) values. From the fitted models, the authors computed estimated marginal means, which give the mean response for each factor level (for example, treatment versus control) adjusted for baseline scores. These means were converted into standardized mean differences (Cohen’s d values), and the results were presented in forest plots. The authors did not conduct subgroup analyses.
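The analysis pipeline (Poisson GLM, BIC comparison, baseline-adjusted marginal means, Cohen's d) can be sketched with the standard library alone. Several choices here are assumptions, not the authors' specification: the baseline count enters as log(count + 1), marginal means are evaluated at the mean of that covariate, d divides the difference in adjusted means by the pooled SD of the observed outcomes, and the six-platform data are made up. In practice a statistics package such as statsmodels would fit the GLM; the IRLS loop here just exposes the steps.

```python
import math
from statistics import mean, variance


def solve(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson(X, y, iters=100, tol=1e-10):
    """Poisson GLM with log link, fitted by IRLS. Column 0 of X must be the intercept."""
    p = len(X[0])
    beta = [math.log(mean(y))] + [0.0] * (p - 1)
    for _ in range(iters):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        # Weighted least squares step: solve (X'WX) beta = X'Wz with W = diag(mu)
        xtwx = [[sum(m * row[i] * row[j] for row, m in zip(X, mu)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(m * row[i] * zi for row, m, zi in zip(X, mu, z)) for i in range(p)]
        new_beta = solve(xtwx, xtwz)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


def poisson_bic(X, y, beta):
    """BIC = -2 log L + k log n for a fitted Poisson model."""
    loglik = 0.0
    for row, yi in zip(X, y):
        mu = math.exp(sum(b * x for b, x in zip(beta, row)))
        loglik += yi * math.log(mu) - mu - math.lgamma(yi + 1)
    return -2 * loglik + len(beta) * math.log(len(y))


def marginal_means(beta, covariate_mean):
    """Predicted counts for control (0) and treatment (1) at the mean baseline covariate."""
    return [math.exp(beta[0] + beta[1] * arm + beta[2] * covariate_mean) for arm in (0, 1)]


def pooled_sd(group1, group2):
    """Pooled sample standard deviation of two groups."""
    n1, n2 = len(group1), len(group2)
    return math.sqrt(((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2))


# Hypothetical data for six platforms (NOT the study's data)
treat = [1, 1, 1, 0, 0, 0]
baseline = [30, 42, 25, 38, 44, 29]
outcome = [22, 35, 18, 36, 47, 27]
log_base = [math.log(b + 1) for b in baseline]  # assumed form of the baseline adjustment

X_full = [[1.0, t, lb] for t, lb in zip(treat, log_base)]
X_base = [[1.0, lb] for lb in log_base]  # same model without the treatment term

beta_full = fit_poisson(X_full, outcome)
beta_base = fit_poisson(X_base, outcome)
print("BIC with treatment:", round(poisson_bic(X_full, outcome, beta_full), 2))
print("BIC without treatment:", round(poisson_bic(X_base, outcome, beta_base), 2))

control_emm, treatment_emm = marginal_means(beta_full, mean(log_base))
treated = [y for y, t in zip(outcome, treat) if t == 1]
controls = [y for y, t in zip(outcome, treat) if t == 0]
d = (treatment_emm - control_emm) / pooled_sd(treated, controls)  # negative d: fewer events under patrol
print("Adjusted means (control, treatment):", round(control_emm, 2), round(treatment_emm, 2))
print("Cohen's d:", round(d, 2))
```

The lower BIC marks the preferred model; with a log link, the ratio of the two adjusted means equals exp of the treatment coefficient, whatever baseline value they are evaluated at.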