AI chatbots can generate responses to cancer patients' questions that are rated higher in quality, empathy, and readability than responses written by physicians, according to an equivalence trial published May 16 in JAMA Oncology.
The finding comes from a group at the Princess Margaret Cancer Centre in Toronto, Ontario, which evaluated how well three AI chatbots (GPT-3.5, GPT-4, and Claude AI) answered patient questions about cancer compared with physician responses sourced from an online forum.
“If carefully deployed, chatbots can serve as a useful point-of-care tool for providing digital health care and information to vulnerably situated populations without accessible clinical care,” noted study lead Srinivas Raman, MD, a radiation oncologist.
AI chatbots offer an opportunity to draft template responses to patient questions, the authors suggested. However, their ability to generate responses that draw on domain-specific knowledge of cancer remains undertested, they noted.
To that end, the group first collected a random sample of 200 unique public patient questions related to cancer and their corresponding verified physician responses posted on Reddit r/AskDocs from January 2018 to May 2023. They then generated responses from the three AI chatbots to the same questions, with an additional prompt limiting responses to 125 words (the mean word count of the physician responses).
The primary outcomes were pilot ratings of quality, empathy, and readability on a Likert scale from 1 (very poor) to 5 (very good), based on evaluations by two teams of attending oncology specialists.
Responses from all three chatbots were rated consistently higher than physician responses on the component measures of quality, including medical accuracy, completeness, and focus, as well as on overall quality, according to the results.
Claude AI was the best-performing chatbot, with its 200 responses rated consistently higher than physician responses on overall quality (3.56 versus 3.00), empathy (3.62 versus 2.43), and readability (3.79 versus 3.07).
“The superior empathy of chatbot responses may encourage future investigation of physician-chatbot collaborations in clinical practice with the hope that chatbots can provide empathetic response templates for physicians to edit for medical accuracy,” the authors suggested.
The authors noted limitations of the study, namely that it used isolated textual exchanges on an online forum to model physician-patient interactions and that it lacked patient raters to measure response empathy from a patient perspective.
Nonetheless, the study benchmarks the competence of AI chatbots in responding to oncology-related patient questions, they noted.
“Further research is required to investigate the implementation of AI chatbots into clinical workflows with consideration of chatbot scope, data security, and content accuracy in the age of digital health care,” the authors concluded.