Data doesn’t lie. But sometimes, you have to ask it better questions to get anywhere meaningful.
In making criminal justice decisions, courts and prosecutors have tried to use data and algorithms to determine things like the risks to society a particular accused may pose. The concept is simple: look at the data and determine who is likely to commit another crime or flee before disposition. From this analysis, the theory goes, you can then determine if a particular accused is similar to the individuals the data says are likely to pose those risks.

The problem, of course, is that this usually means black Americans and people of color are deemed to be of higher risk. According to the U.S. Sentencing Commission, black Americans typically receive sentences some 19.1 percent longer than whites, for example. So the use of data in the criminal justice system is fundamentally flawed. Right?
Well, not necessarily: the data isn’t flawed, but how the criminal justice system uses it may be. The ACLU, Carnegie Mellon, and Penn recently undertook a Study to see what would happen if we asked a different question.
Instead of asking what risk a person poses to society and the justice system, ask what risk the justice system poses to the individual accused. Can we determine if a given accused falls into a group that typically receives a disproportionate sentence, or is denied a pre-trial release, or even is wrongfully convicted?
The answer, says the Study, is yes. (According to the ACLU Commentary on the Study, it has been peer-reviewed.) The data can readily predict the risks the criminal justice system poses to the accused. And we can identify the people who are disproportionately exposed to those risks. We can determine who is at risk of a long sentence based on factors that should be irrelevant, like race.
Ok, but so what? What can we do with such assessments? The Study identifies three things:
First, such assessments can be used as tools by public defenders and the accused to strategically assess proposed sentences and other questions. The data could also provide them with evidence to back up arguments that a particular decision is wrong or biased.
Next, the analytics could be used to bolster claims under the First Step Act. Under this Act, incarcerated persons can seek sentence reductions where there are “extraordinary and compelling” circumstances. The data would enable claimants under the Act to show how far their sentence deviates from the norm.
Finally, the analytics could be used to convince the executive branch to make better clemency decisions by demonstrating that a particular person has received an excessively long sentence.
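To make the idea of showing how far a sentence deviates from the norm a little more concrete, here is a minimal sketch in Python. It is not the Study's method: the function name, the choice of the median as "the norm," and every number in it are assumptions invented purely for illustration.

```python
from statistics import median


def sentence_deviation(sentence_months, comparable_months):
    """Compare one sentence with sentences handed down in comparable cases.

    Hypothetical helper: "comparable" is whatever group of similar cases
    an analyst has already selected. Returns the group's median, the ratio
    of this sentence to that median, and the share of comparable
    sentences that were shorter.
    """
    if not comparable_months:
        raise ValueError("need at least one comparable case")
    norm = median(comparable_months)
    if norm <= 0:
        raise ValueError("median sentence must be positive")
    shorter = sum(1 for m in comparable_months if m < sentence_months)
    return {
        "median_months": norm,
        "ratio_to_median": sentence_months / norm,
        "share_shorter": shorter / len(comparable_months),
    }


# Invented figures, not real case data.
comparable = [24, 30, 36, 36, 40, 48, 60]
result = sentence_deviation(120, comparable)
print(result)
```

In this made-up example, a 120-month sentence is more than three times the median of the comparable cases and longer than every one of them. That is the kind of gap a claimant or clemency petitioner might point to.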
All of this is well and good. But changing the minds of prosecutors, judges, and others in the short run or in a given case with data they often don’t want to hear may not go very far.
But the most significant impact of data like this is the ability to use the analysis as a tool to change policies and attitudes in the long run. To shine a light on the terrible and unfair impact the criminal justice system has on people of color, among others. To balance the traditional use of the data to ferret out those who pose risks, a use that produces biased and flawed results.
Data itself is neutral. How we use it and the questions we ask are crucial. The Carnegie Mellon-Penn Study shows just that.
Someone once told me the problem is the problem. Perhaps that’s a good guiding principle for data analytics.