AI chatbots such as Gemini and ChatGPT-4 have considerable potential to answer common patient questions about theranostic treatments, yet they struggle with the complexity of the topic, according to a December 4 presentation at RSNA 2024.
Specifically, researchers asked the chatbots 12 questions about lutetium-177 (Lu-177) prostate-specific membrane antigen (PSMA)-617 treatment. While the chatbots provided easy-to-understand answers, this came at the expense of accuracy, noted presenter Gokce Belge Bilgin, MD, of the Mayo Clinic in Rochester, MN.
“They generally struggled with pre- and post-therapy instructions and also side effects. For instance, both claim that the most common side effect is allergic reaction, which is not that common in clinical practice,” Bilgin said.
Since ChatGPT launched in 2022 and Gemini in 2023, the chatbots have quickly become part of everyday life for many people, and they are changing the way people access medical information, Bilgin noted. They may be good at offering instant, conversational answers to simple questions, but how do they perform on complex topics such as Lu-177 PSMA-617 (Pluvicto, Novartis) therapy? That was the question the researchers set out to answer.
In the study, the group asked the chatbots 12 common patient questions, as follows:
- How does Pluvicto therapy work?
- How are patients selected to be treated with Pluvicto?
- Who is most likely to benefit from Pluvicto?
- How can I prepare for Pluvicto?
- How is Pluvicto administered?
- What are the most common side effects of Pluvicto therapy?
- What instructions should I receive from physicians before and after Pluvicto therapy?
- How many doses of Pluvicto should I receive?
- How do physicians monitor the effectiveness of Pluvicto therapy?
- How soon will I know if the treatment is effective?
- Where can I get Pluvicto therapy?
- How much does Pluvicto therapy cost?
According to the results, ChatGPT-4 provided more accurate answers than Gemini (2.95 vs. 2.73 on a 4-point scale), while Gemini's responses were more readable than ChatGPT-4's (2.79 vs. 2.94 on a 3-point scale, on which lower scores indicate better readability). Both chatbots achieved comparable conciseness scores (3.14 vs. 3.11 on a 4-point scale).
Additionally, expert reviewers categorized 17% of ChatGPT-4's responses and 29% of Gemini's responses as incorrect or only partially correct. Gemini's answers also contained significantly more misleading information than those of ChatGPT-4 (p = 0.039), the study found.
“On the one hand, AI chatbots showed great potential for answering some questions related to treatments and could serve as a starting point for patients trying to understand their treatment options. On the other hand, their inaccuracies and potential for delivering misleading information pose significant risks,” Bilgin said.
Ultimately, the chatbots could lead patients to misunderstand their options, which in turn could lead to poor decisions and cause unnecessary anxiety, she said. In addition, new ethical concerns are emerging regarding their use, such as patient data privacy and medicolegal issues, she noted.
“AI chatbots like ChatGPT and Gemini are a promising step forward in making medical information more accessible. However, they are not yet reliable enough for standing alone for complex topics and there's still work that needs to be done to ensure accuracy, safety, and trust,” Bilgin concluded.