The Spectre of Automation Bias in Radiology

In his 2023 book Decisions about Decisions, Harvard Law School professor Cass Sunstein offers this advice: Rather than concentrating on the probability of being right or wrong in a decision—which is often impossible to determine due to the intrinsic uncertainty and the unpredictability of the future—focus instead on comparing the cost of being wrong with the benefit of being right. These factors, according to Sunstein, are easier to estimate without the need for forecasting outcomes.

Applying this very logical argument to the use of high-quality AI tools in diagnostic medicine, we fleshy, fallible humans arrive at a straightforward problem: the rational course of action is to agree with any plausible AI answer and contradict the machine only in cases of indisputable error.
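
To make that asymmetry concrete, here’s a minimal back-of-the-envelope sketch. Every number in it is invented purely for illustration; the point is the shape of the comparison, not the values.

```python
# Purely illustrative numbers: invented to show the shape of the
# asymmetry, not measured from any study.

p_ai_correct = 0.95           # assumed accuracy of the AI on contested calls
cost_wrongly_agree = 1.0      # diffuse, shared cost of signing off on an AI error
cost_wrongly_disagree = 10.0  # concentrated cost (blame, liability) of overriding a correct AI

# Expected cost of simply agreeing with a plausible AI answer:
ev_agree = (1 - p_ai_correct) * cost_wrongly_agree        # 0.05

# Expected cost of disagreeing with it:
ev_disagree = p_ai_correct * cost_wrongly_disagree        # 9.50

print(f"agree: {ev_agree:.2f}, disagree: {ev_disagree:.2f}")
# Agreement dominates unless the reader is nearly certain the AI is wrong.
```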

If the goal is to maximize accuracy or quality, one can imagine a world where a human radiologist interprets a scan independently, and an algorithm interprets a scan independently. If both agree, then we’re done. If those two evaluations are in disagreement, then a third party—either another human or a different algorithm with different parameters—steps in to adjudicate the disagreement. (We could, of course, have that initial AI product itself be the result of a debate between multiple algorithms, but you get the idea.)
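
For concreteness, a minimal sketch of that workflow follows. The names and types are hypothetical (real reads are structured reports, not single labels), but the routing logic is the whole idea:

```python
from typing import Callable

def adjudicated_read(
    human_read: str,
    ai_read: str,
    adjudicate: Callable[[str, str], str],
) -> str:
    """Return the final interpretation for one scan."""
    if human_read == ai_read:
        # Concordant reads are accepted as-is; we're done.
        return human_read
    # Discordant reads go to a third party: another human, or a
    # different algorithm with different parameters.
    return adjudicate(human_read, ai_read)

# Hypothetical usage: a second model (or second reader) breaks the tie.
final = adjudicated_read("normal", "suspicious mass", lambda h, a: a)
```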

There is no guarantee that such a combination would be an improvement, but it’s a plausible outcome that will, of course, be studied. How much would such a system genuinely enhance diagnostic accuracy? Surely it would be a moving target, but would human-AI collaboration meaningfully improve accuracy, or would it be an awkward source of complexity that hamstrings the needed efficiency gains? It certainly wouldn’t look very good if the third party nearly always sided with one source.

Potential Outcomes of AI-Human Collaboration

There are several possible outcomes:

  1. The human is usually right, and the addition of the AI does not create a significant change.
  2. The AI is usually right, and the addition of a human does not create a significant change.
  3. The human is usually right, but the AI helps catch what would otherwise be unequivocal bone-headed mistakes.
  4. Both the human and the AI are usually right, but in cases where they disagree, a third-party adjudicator adds value by catching edge cases more often than either individual alone. If nothing else, the third party creates the system needed to handle discordance.
  5. The combination results in reduced accuracy. For example, the AI is almost always right, but the noise introduced by human disagreement drags overall accuracy down.
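
One way to get a feel for which of these outcomes wins, long before the real studies arrive, is a toy Monte Carlo simulation. All three accuracies below are assumptions pulled out of thin air; the takeaway is only that the adjudicator’s performance on discordant cases drives the result.

```python
import random

# Invented accuracies, for illustration only.
P_HUMAN, P_AI, P_ADJUDICATOR = 0.90, 0.95, 0.92
N = 100_000

def simulate() -> float:
    correct = 0
    for _ in range(N):
        truth = random.random() < 0.5                    # ground-truth label
        human = truth if random.random() < P_HUMAN else not truth
        ai = truth if random.random() < P_AI else not truth
        if human == ai:
            final = human                                # concordant: accept
        else:
            final = truth if random.random() < P_ADJUDICATOR else not truth
        correct += (final == truth)
    return correct / N

print(f"combined accuracy: {simulate():.3f} vs. AI alone: {P_AI}")
# With these numbers the combination wins (~0.984); with a weaker
# adjudicator it can trail the AI alone, i.e., outcome 5.
```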

All of this will be studied. Yet reality could be complex: we may find that AI’s strengths and weaknesses differ across imaging modalities, patient populations, or specific pathologies. AI may be great at breast screening but terrible at most MRI. Or the opposite. The optimal balance between AI independence and human oversight may depend on more, or different, variables than we’d suspect. Or not. Why pretend to know?

The Likely Commercial Model for AI in Radiology

The commercial reality is that this sort of AI utilization is unlikely to be the primary solution for handling the radiologist shortage or maximizing profitability for stakeholders unless it’s required by rule. The more likely scenario is that AIs will churn out preliminary reports of increasingly high quality, which a human radiologist will review, make changes to, and ultimately be liable for.

This shifts the radiologist’s role from a thoughtful creator and analyst to more of a quality inspector: checking for plausibility rather than deeply analyzing every case. When the AI is reasonable, the human will likely agree. When the AI makes an obvious mistake, correcting it won’t require much effort from the human. Obvious contrast mixing in a pulmonary artery is not a thrombus. Calcified lymph nodes are often chronic findings, etc. A clearly benign breast lesion misclassified as a potentially malignant tumor may be easy for an experienced mammographer to catch, especially if that mammographer has access to priors and context that the AI does not.
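
Structurally, that role looks less like the dual-read system above and more like a sign-off loop. A hypothetical sketch, with invented names, of what the quality-inspector workflow reduces to:

```python
from dataclasses import dataclass

@dataclass
class Report:
    text: str
    signed_by: str  # the human of record, who carries the liability

def sign_off(ai_draft: str, radiologist: str, correction: str | None = None) -> Report:
    # Plausible drafts get signed unchanged; only obvious errors
    # (contrast mixing called a thrombus, etc.) prompt a correction.
    return Report(text=correction if correction is not None else ai_draft,
                  signed_by=radiologist)

# The common case: no correction, quick agreement.
report = sign_off("No acute pulmonary embolism.", "Dr. Example")
```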

It’s easy for many observers with a vested interest to believe that their magical subset of skills will be particularly thorny to emulate, and some may even be right.

Even the quality-inspector assumption presumes a relatively stable and predictable level of AI performance. How confident should a human be in their assessment when there is disagreement, if the AI is improving while the human is mostly stagnant? What if AI-generated reports vary significantly in quality across different use cases? Scrutiny may be hard to apply judiciously in a piecemeal fashion.

Regulatory agencies could impose strict requirements for human oversight that make the process more labor-intensive than expected, and those requirements could be either reasonable or stupid over the short, intermediate, and long term. AI adoption will depend not only on technical feasibility but also on evolving legal, ethical, and financial pressures.

The Risk of Automation Bias

But what will radiologists do when the AI calls a focal asymmetry that the radiologist would not have called? If the AI is usually right, the human being will almost certainly just agree with whatever it says as long as it’s plausible—because the risks of agreeing are negligible, but the risks of incorrectly disagreeing are high.

How foolish will you feel calling a mammogram normal when the AI suspects a mass—with its black-box, pixel-based approach that detects patterns beyond and different from your human understanding? No one wants to get in the way, so no one will disagree and take on the liability of calling a case negative when the AI has flagged it as positive in an otherwise usually accurate system.

That’s the reality we’re going to live in. That’s what we’re going to see unless we specifically craft a system to prevent it.

That’s going to be a big problem—because all the commercial and workforce pressure will push us toward utilizing AI tools in ways that practically ensure automation bias becomes the single biggest challenge facing radiologists in the near future.
