Crime is public record. But it has taken our team over two years to acquire this data, convert it to useful and universal formats, identify clerical errors and duplicates, and classify the offenses into broad and specific categories. We are now opening this data up so that everyone has the opportunity to detect and understand the patterns of crime.
Our long-term goal is to steer social policy in an evidence-based manner. Legal policy is often driven by intuition and politics more than by data analysis. Large-scale data analysis has the potential to reveal patterns that can help assess the efficacy of legislation. Using millions of criminal records from multiple states, we mine patterns of crime and recidivism to help steer toward a more effective criminal justice policy. Which policies over the past few decades have effectively reduced crime? Which types of crime respond to which types of policies? Are there “gateway crimes” that lead offenders to commit other crimes in the future? What patterns correlate with re-offense? Which crime types cluster, and which are rarely committed by the same individual? When does sentencing effectively prevent offenders from reoffending?
Funding for this tool was provided by the National Science Foundation SBE Office of Multidisciplinary Activities under Grant No. 1439453.
Miami-Dade County, FL, is the 7th most populous county in the United States; its county seat is Miami, FL. The dataset consists of 5.7 million records spanning 1971 to 2012. It contains 30 variables and was obtained from the Miami-Dade County Clerk's Criminal Justice Information System on December 3, 2013.
New York City, NY, is the most populous city in the United States. The dataset consists of 9.8 million records spanning 1977 to 2013. It contains 20 variables and was obtained from the New York State Division of Criminal Justice Services in 2013. It currently contains only the most serious charge for a given arrest and does not yet contain identifiers.
Harris County, TX, is the 3rd most populous county in the United States; its county seat is Houston, TX. The dataset consists of 3.1 million records spanning 1977 to April 2012. It contains 61 variables and was obtained from the Harris County District Clerk's Office in September 2013.