
Can AI help judges make the bail system fairer and safer?

An analysis by the Stanford Computational Policy Lab will give judges new tools to set bail in ways that better balance the rights of defendants with the need for public safety.

At any given time, an estimated 500,000 Americans are in jail awaiting trial because a judge deemed them a flight risk or a danger to the public. But many of those pretrial detentions are unnecessary and unfair, says Sharad Goel, assistant professor of management science and engineering and executive director of the Stanford Computational Policy Lab (SCPL). He and his colleagues studied 100,000 judicial decisions and found that some judges released more than 90 percent of defendants on bail, while others released only 50 percent. Goel says such disparities flow from the often haphazard way in which these consequential decisions are made.

The SCPL has received a $2.3 million grant from the Laura and John Arnold Foundation to make pretrial risk assessments fairer, more dependable and more widely used. Under the five-year award, SCPL will collaborate with the North Carolina-based Research Triangle Institute to use machine learning techniques — self-teaching computer systems — to develop a new generation of risk assessment tools that reduce pretrial detainment while preserving public safety. We talked with Goel to learn more.

What sort of pretrial risk assessment tools do judges use today?

One of the most common pretrial risk assessments is the Arnold Foundation’s Public Safety Assessment (PSA). It scores a defendant’s risk by looking at nine factors. If a defendant has a prior violent conviction, for example, they get a point. Once the points for all nine factors have been tallied, that total becomes the defendant’s PSA score. It can be used to estimate a defendant’s likelihood of failing to appear for their court date or their risk of committing a violent crime.
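The point-tallying idea described above can be sketched in a few lines of Python. This is a minimal illustration only: the factor names below are hypothetical placeholders, not the PSA's actual nine factors, and the one-point-per-factor weighting is assumed from the single example given in the answer.

```python
# Hypothetical factor names for illustration only.
# These are NOT the actual PSA factors or weights.
HYPOTHETICAL_FACTORS = (
    "prior_violent_conviction",
    "prior_failure_to_appear",
    "pending_charge_at_arrest",
)


def checklist_score(defendant: dict) -> int:
    """Add one point for each listed factor that is present in the record."""
    return sum(1 for factor in HYPOTHETICAL_FACTORS if defendant.get(factor, False))


# A made-up record with exactly one factor present.
example = {"prior_violent_conviction": True, "prior_failure_to_appear": False}
score = checklist_score(example)
print(score)
```

The appeal of a rubric like this is that anyone can recompute the score by hand from the defendant's record.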

What’s the problem with today’s tools?

Popular risk assessments — including the PSA — take a one-size-fits-all approach and are typically not tailored to the needs of specific jurisdictions. That strategy has the benefit of simplicity and uniformity but cannot account for local idiosyncrasies. Different cities often have access to different data, measure risk factors in different ways, and have different preferences about which types of crime to prioritize. A national risk assessment tool cannot address these local differences.

How do you plan to improve existing tools?

We plan to design risk assessments that are tailored to individual jurisdictions. Using modern methods of data science, we will automatically examine dozens of variables in each jurisdiction, involving thousands of pretrial detention decisions, to figure out which factors are most predictive of whether a defendant is a flight risk or a danger to the public. Then we’ll boil all this down into simple rules similar to today’s PSA. By combining the sophistication of machine learning — where computers discover and synthesize patterns in data — with the transparency of simple rubrics, we plan to create a next-generation risk assessment tool that is both more effective at protecting public safety and more equitable, encouraging wider adoption.
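One way to picture the step from learned patterns to a simple rubric is to estimate how strongly each factor is associated with the outcome, then round those weights to small integer points. The sketch below is not SCPL's method. It substitutes a per-factor log-odds ratio for a full machine learning model, and the factor names, the toy records and the three-point cap are all assumptions made for illustration.

```python
import math


def log_odds_ratio(records, factor, outcome):
    """Smoothed log-odds ratio between a binary factor and a binary outcome."""
    a = sum(1 for r in records if r[factor] and r[outcome]) + 0.5
    b = sum(1 for r in records if r[factor] and not r[outcome]) + 0.5
    c = sum(1 for r in records if not r[factor] and r[outcome]) + 0.5
    d = sum(1 for r in records if not r[factor] and not r[outcome]) + 0.5
    return math.log((a * d) / (b * c))


def integer_points(records, factors, outcome, max_points=3):
    """Scale each factor's weight so the strongest gets max_points, then round."""
    weights = {f: log_odds_ratio(records, f, outcome) for f in factors}
    largest = max(abs(w) for w in weights.values()) or 1.0
    return {f: round(max_points * w / largest) for f, w in weights.items()}


# Toy data (fabricated for the example): (prior FTA, unrelated flag, missed court, count)
toy_counts = [
    (True, True, True, 15), (True, False, True, 15),
    (True, True, False, 5), (True, False, False, 5),
    (False, True, True, 5), (False, False, True, 5),
    (False, True, False, 25), (False, False, False, 25),
]
records = [
    {"prior_failure_to_appear": fta, "unrelated_flag": flag, "missed_court": missed}
    for fta, flag, missed, count in toy_counts
    for _ in range(count)
]

points = integer_points(
    records, ["prior_failure_to_appear", "unrelated_flag"], "missed_court"
)
print(points)
```

In this toy data the unrelated flag is split evenly across outcomes, so it rounds to zero points and would drop out of the rubric. A real tool would more likely fit all factors jointly and validate the rounded rules on held-out cases.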

How can you be certain the computational system isn’t biased?

Good question. This is an issue that we’ve thought about a lot. Machine learning systems are typically black boxes, which means we can’t always understand how they work. That could be problematic in a criminal justice setting. For instance, a machine learning algorithm that considers where a person lives might use that information as a proxy for race in ways that reinforce bias. There are ways to help ensure that risk assessments don’t fall into this trap. One way is to make sure that a given risk score means the same thing regardless of race. For example, when our system labels black defendants as high risk, we’ll compare them with white defendants also labeled as high risk. If the tool is working as intended, the historical data should show that both groups commit new crimes at similar rates.
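The comparison described above, checking that defendants given the same label go on to commit new crimes at similar rates in each group, can be sketched as follows. The field names (`group`, `risk_label`, `reoffended`) and the toy records are assumptions made for the example, not part of any real dataset or of SCPL's tooling.

```python
from collections import defaultdict


def reoffense_rate_by_group(records, label="high"):
    """Observed reoffense rate per group, among records with the given risk label."""
    totals = defaultdict(int)
    reoffended = defaultdict(int)
    for r in records:
        if r["risk_label"] == label:
            totals[r["group"]] += 1
            reoffended[r["group"]] += int(r["reoffended"])
    return {g: reoffended[g] / totals[g] for g in totals}


def max_gap(rates):
    """Largest difference in observed rates between any two groups."""
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())


def make(group, label, n_reoffended, n_total):
    """Build n_total toy records, the first n_reoffended of which reoffended."""
    return [
        {"group": group, "risk_label": label, "reoffended": i < n_reoffended}
        for i in range(n_total)
    ]


# Fabricated historical records for illustration.
history = (
    make("A", "high", 4, 10) + make("B", "high", 8, 20)
    + make("A", "low", 1, 30) + make("B", "low", 1, 20)
)

rates = reoffense_rate_by_group(history, label="high")
print(rates, max_gap(rates))
```

A large gap between groups at the same label would suggest the score does not carry the same meaning for everyone and needs another look.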

Transparency is another guard against unintentional bias. Our risk assessments will clearly explain how predictions are made. Any stakeholder interested in understanding how we arrive at risk scores can examine our work to see if it agrees with other research about what factors predict flight or repeat criminal activity.

So you’re using artificial intelligence to give judges better guidelines?

Exactly. We want judges to remain independent. It’s important that the final decision remains with them. But a statistically robust rubric can help judges identify and release people who really are low risk. We think that by following recommendations from our risk rubrics, judges could, in some cases, detain half as many accused individuals without endangering the public or increasing the number of defendants who fail to appear at trial.

How else will these new guidelines try to help judges?

We plan for our rubrics to incorporate alternatives to incarceration that may help reduce a defendant’s risk. For example, say a certain type of defendant would normally be rated as high risk but our research indicates that behavioral health assistance, like a drug treatment program, could lower their risk factors. A judge might choose to provide this assistance instead of incarcerating the defendant. We hope that including such information will help judges make more informed decisions and encourage them to choose alternatives to incarceration when it makes sense.