Why human review is key to the success of AI in health care

DTNNewsWire


Artificial intelligence (AI) tools are becoming more common in health care. They can read medical images, help predict risks and monitor patient conditions from afar. But AI systems can also make mistakes — especially when the data they learn from is not balanced or does not adequately represent different groups of people.


A new study led by UC Davis Professor Courtney Lyles stresses the importance of keeping a human in the loop to review how AI makes decisions, to help reduce bias and improve safety. The study was published in Social Science and Medicine.


Lyles is the director of the UC Davis Center for Healthcare Policy and Research. She is also a co-founder and co-director of UC S.O.L.V.E. Health Tech, an initiative that brings together researchers from UC Davis, UC Berkeley and UC San Francisco with private digital health companies.

In this Q&A, Lyles answers questions on AI use in health care and ways to detect and prevent bias. She also shares two examples of how UC Davis Health is building fairer and more reliable AI systems to serve patients and physicians.

What is this study about?

The study is a collaboration with Google and researchers at the University of California and Northeastern University. We used a human-centered approach to critically assess an explainable AI model and identify areas of bias. We formed a panel of experts in different fields to find potential factors driving bias in the AI's interpretations.

Why can bias be a problem in AI health care systems?

Interpretation of AI models requires an understanding of the social and structural forces that shape health data.

Without this lens, AI systems may produce outputs that sound convincing but are incomplete, biased or unsafe. As AI becomes woven into everyday clinical care, we can't rely on algorithms alone. Human expertise, combined with explainable AI tools, becomes essential.

What is explainable AI and why is it important in evaluating AI models?

Explainable AI (XAI) is about understanding why a model made the decisions it did. It peels back what the AI is doing so we can see how the model arrived at its determinations and predictions.

How would human reviewers assess bias in XAI models?

Our study has shown that a panel of experts from several disciplines can look closely at XAI model output and provide additional contextual interpretation of whether the results make sense in the real world. In the study, this panel included experts from medicine, epidemiology, behavioral science, engineering and data science.

The study also recommends including community members and patient advocates. Their lived experience offers insight that traditional experts may miss and can help ensure AI tools reflect the needs of the communities they serve.

This interdisciplinary framework shows how bringing diverse voices into the process makes AI not only more accurate but also more equitable, trustworthy and reliable.

How does an interdisciplinary panel assess XAI results?

When an XAI tool highlights why a model makes certain predictions, it often reveals patterns.

When reviewing XAI findings, interdisciplinary experts could then ask:

Could this pattern be caused by differences in the dataset?
Is this result linked to how patients interact with medical devices?
Does this reflect a social or structural issue rather than a medical one?

This process helps uncover where AI may be relying on “shortcut features” — patterns that look meaningful but actually reflect bias in the data.
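The study does not say which XAI technique produced the results its panel reviewed. As a purely illustrative sketch, the example below uses permutation importance, one common XAI method, to show how a shortcut feature can surface. Everything in it is a hypothetical assumption: the toy dataset, the feature names `lesion_size` and `scanner_site`, and the one-rule "model". In this made-up data, every positive case happened to be imaged at the same site, so the site predicts the label perfectly without saying anything about disease.

```python
import random

# Hypothetical toy data: (lesion_size, scanner_site, label).
# Every positive case was imaged at site 1, so scanner_site is a "shortcut":
# it predicts the label perfectly here but carries no clinical meaning.
POSITIVES = [(s, 1, 1) for s in [6, 7, 8, 9, 5, 4, 7, 8, 3, 9]]
NEGATIVES = [(s, 0, 0) for s in [2, 3, 4, 5, 6, 1, 2, 3, 7, 4]]
ROWS = POSITIVES + NEGATIVES
FEATURES = ["lesion_size", "scanner_site"]


def fit_stump(rows):
    """Pick the single 'feature >= threshold' rule with the best training accuracy."""
    best = None  # (accuracy, feature_index, threshold)
    for j in range(len(FEATURES)):
        for t in sorted({r[j] for r in rows}):
            acc = sum((r[j] >= t) == bool(r[-1]) for r in rows) / len(rows)
            if best is None or acc > best[0]:
                best = (acc, j, t)
    _, j, t = best

    def predict(row):
        return int(row[j] >= t)

    return predict


def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(r) == r[-1] for r in rows) / len(rows)


def permutation_importance(model, rows, n_repeats=20, seed=0):
    """Mean accuracy drop when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows)
    scores = {}
    for j, name in enumerate(FEATURES):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced; labels stay put.
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted))
        scores[name] = sum(drops) / n_repeats
    return scores


model = fit_stump(ROWS)
importances = permutation_importance(model, ROWS)
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

Here the model scores perfectly on its own data, yet the explanation shows it leans entirely on `scanner_site`. That is the kind of pattern an interdisciplinary reviewer would flag with the question "Could this pattern be caused by differences in the dataset?" For real models, scikit-learn provides this technique as `sklearn.inspection.permutation_importance`.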

How can you turn this XAI study into real-world practice?

Our work included a case study of how this interdisciplinary panel of experts reviewed real-world XAI results from medical imaging and suggested clear next steps for research and practice.

Because it combines technical tools with human judgment, this approach can also be applied to other use cases, improving accuracy and grounding results in context. In practice, you can assemble teams ahead of time to bring the right types of expertise to the AI decision-making table. This improves implementation and builds trust among data scientists, clinicians, patients and communities.