As the healthcare industry increasingly integrates artificial intelligence into tasks such as summarizing doctors' notes and analyzing health records, a new study led by Stanford School of Medicine researchers has raised a critical concern: AI-driven chatbots may be perpetuating racist and inaccurate medical ideas, potentially exacerbating the health disparities already faced by Black patients. The chatbots, trained on vast amounts of internet text, answered medical questions with misleading and racially biased responses that echo long-debunked strains of medical racism, errors that can carry real-world consequences.
The study, published in the journal npj Digital Medicine, examined popular AI chatbots, including ChatGPT, Google's Bard, and others. The chatbots faltered when asked medical questions about kidney function, lung capacity, and skin thickness, often reinforcing false beliefs about biological differences between Black and white people. These misconceptions, long since debunked, have led medical providers to misdiagnose health concerns and under-treat Black patients' pain.
The concerns raised by the study are not merely theoretical. As physicians increasingly experiment with chatbots and AI language models in their work, patients are also turning to these tools to help make sense of their symptoms. Racially biased output from chatbots could directly shape how those symptoms are interpreted, compounding existing disparities in care.
The researchers found that the chatbots parroted the false claim that Black people's skin is thicker than white people's and returned race-adjusted formulas for estimating lung capacity in Black patients. Neither idea has any basis in medical reality, and both reinforce harmful stereotypes.
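To make the nature of these debunked adjustments concrete, the sketch below shows the general shape of a race-"corrected" lung-capacity estimate of the kind the study describes. The 0.85 multiplier is the scaling factor historically cited in race-corrected spirometry; the function name, signature, and example values are illustrative assumptions, not the study's prompts or any chatbot's verbatim output.

```python
def race_corrected_capacity(predicted_liters: float, is_black: bool) -> float:
    """Illustrative sketch only: the kind of race-based 'correction' that
    older spirometry conventions applied and that the study reports chatbots
    still echoing. The 0.85 multiplier is the historically cited factor;
    current guidance rejects applying any race-based adjustment."""
    return predicted_liters * 0.85 if is_black else predicted_liters

# The core problem: two patients with identical physiology are assigned
# different "expected" values, so genuine impairment in a Black patient
# can be misread as normal.
print(race_corrected_capacity(4.0, is_black=False))  # 4.0 (liters)
print(race_corrected_capacity(4.0, is_black=True))   # 3.4 (liters)
```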
Both Google and OpenAI responded to the study, highlighting their efforts to mitigate bias in their models and emphasizing that chatbots are not a replacement for medical professionals. Google recommended that users refrain from relying on Bard for medical advice.
While chatbots have shown potential in assisting human doctors with diagnoses, they are far from perfect, as their "black box" nature makes it challenging to understand potential biases and diagnostic limitations. The use of large language models in clinical settings remains an area of active exploration, with researchers and healthcare institutions looking to create more accurate and equitable AI tools.
The study's findings are a critical reminder of the need to address biases and inaccuracies in AI chatbots and language models, particularly in medical contexts. As the healthcare industry invests heavily in AI, ensuring these tools are equitable, reliable, and free of bias is essential to improving patient care and narrowing healthcare disparities.