AI could help radiologists interpret mammograms more accurately

Breast cancer experts try to detect the earliest signs of a tumor growing, while minimizing false alarms. A new computer model could help them walk that line.

Researchers are using machine learning systems to help ensure that women get early, accurate breast cancer diagnoses. | iStock/Dániel Balogh

It is a question many women ask: How often and how early should they get mammograms?

While breast cancer screening can save lives by spotting and treating malignant tumors before they get out of control, roughly half of women who get annual mammograms over a 10-year period will get at least one false-positive result, according to the American Cancer Society. Those false alarms create unnecessary stress and anxiety, which can last long after the initial scare is over, and add costs for follow-up exams and biopsies.

Now a team led by Ross Shachter, associate professor of management science and engineering, and Daniel L. Rubin, professor of biomedical data science, of radiology, and of medicine, has used artificial intelligence to develop a system that could one day help radiologists interpret mammograms more accurately. They describe their approach in a paper co-authored with Jiaming Zeng, currently a doctoral student in management science and engineering at Stanford; Francisco Gimenez, a PhD alumnus in biomedical informatics; and mammography expert Elizabeth S. Burnside at the University of Wisconsin School of Medicine.

The team used machine learning, an artificial intelligence technique, to analyze 112,000 mammography screenings interpreted by 13 radiologists at two university medical centers. The data included detailed observations of each mammogram, a description of each patient’s risk factors, and follow-up information on whether each patient actually had cancer. Through repeated analyses, the computational system gradually built a model that makes a breast cancer diagnosis based on the probability that the features observed on a mammogram indicate malignancy.

The challenge in breast cancer screening is that radiologists are far more concerned about overlooking cancer than about flagging something harmless.

In medical terms, the goal is to avoid false negatives without generating too many false positives. Shachter says the most commonly accepted threshold for a positive finding is a 2% probability, in the radiologist's judgment, that a tumor on a mammogram is malignant. Most of the expert radiologists at the two medical centers were even more cautious than the official guideline, he says, with some making positive findings when the probability of malignancy was below 1%. But a few of the experts were less cautious and would not make a positive finding unless the probability reached 3%.
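The threshold rule described here can be written down in a few lines. This is a minimal sketch, not the researchers' model: the function name `is_positive_finding`, the example estimate of 2.5%, and the choice to treat a probability exactly at the threshold as positive are all assumptions made for illustration.

```python
def is_positive_finding(prob_malignant: float, threshold: float = 0.02) -> bool:
    """Return True when the estimated malignancy probability meets the threshold.

    The 2% default is the commonly accepted threshold cited in the article.
    Treating a value exactly equal to the threshold as positive is an assumption.
    """
    if not 0.0 <= prob_malignant <= 1.0:
        raise ValueError("prob_malignant must be between 0 and 1")
    return prob_malignant >= threshold


# Hypothetical estimate for one mammogram, chosen only for illustration.
estimate = 0.025

# The same estimate leads to different calls under the thresholds the article mentions.
for label, threshold in [("more cautious", 0.01), ("guideline", 0.02), ("less cautious", 0.03)]:
    call = "positive" if is_positive_finding(estimate, threshold) else "negative"
    print(f"{label} reader (threshold {threshold:.0%}): {call}")
```

The point of the sketch is that a single probability estimate can produce different findings depending on where each reader sets the bar, which is the variation the study observed among experts.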

Shachter said the overall objective of their study was to develop a computer model that would substantially reduce the number of false alarms without appreciably increasing the number of missed cancers. The researchers started with the known outcome of each of the 112,000 screenings in their study. In 1,214 instances, or 1.1% of the screenings, a malignant tumor turned out to be present.

The expert radiologists who analyzed those screenings issued 174 false negatives, meaning they missed cancers that were present at the time. However, they also issued more than 12,000 false positives, or more than 10 times the number of cancers that were actually present.

The machine learning model, which based its process on the features observed by the expert radiologists, compared favorably with those experts’ actual diagnoses. The computational system missed 175 malignancies, one more false negative than the experts, while delivering 3,612 fewer false positives.
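The counts above are enough to work out how often cancers were caught. The following is a rough sketch using only the figures quoted in the article; the constant and function names are hypothetical, and specificity is left out because the article gives the experts' false positives only as "more than 12,000" rather than an exact count.

```python
# Figures quoted in the article.
TOTAL_SCREENINGS = 112_000
CANCERS = 1_214
EXPERT_FALSE_NEGATIVES = 174
MODEL_FALSE_NEGATIVES = 175


def prevalence(cancers: int, total: int) -> float:
    """Fraction of screenings in which cancer was actually present."""
    return cancers / total


def sensitivity(cancers: int, false_negatives: int) -> float:
    """Fraction of actual cancers that were flagged (true positives / all cancers)."""
    return (cancers - false_negatives) / cancers


print(f"Prevalence:         {prevalence(CANCERS, TOTAL_SCREENINGS):.1%}")
print(f"Expert sensitivity: {sensitivity(CANCERS, EXPERT_FALSE_NEGATIVES):.1%}")
print(f"Model sensitivity:  {sensitivity(CANCERS, MODEL_FALSE_NEGATIVES):.1%}")
```

On these numbers the experts and the model catch nearly the same share of cancers, roughly 86% each, so the difference between them lies almost entirely in the false positives.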

The researchers wanted to ask one additional question: What if their model had consistently applied the commonly accepted 2% threshold to all 112,000 cases? They found that the computational system would have issued 47 additional false negatives — or 218 in total — while reducing the number of false positives by 2,300.

Shachter does not believe that artificial intelligence is poised to replace human radiologists; the job requires far too much judgment. But the analysis showed that it should be possible to build a computational system that could help mammographers reduce the number of false alarms without appreciably increasing the risk of missing cancer when it’s really there. Future work will be needed to validate this system on patients to see whether it actually improves mammographic practice, but Shachter is optimistic.

“Our approach demonstrates the potential to help all radiologists, even experts, perform better,” he says.