ChatGPT-4 in Healthcare: An Assessment of Quality and Finetuning

Henrique Araujo Lima of the Universidade Federal de Minas Gerais in Brazil will develop a tool to systematically assess the accuracy and clarity of the answers that Large Language Models (LLMs) give to common questions about maternal health, with the aim of increasing their value in settings with limited access to healthcare. Improving LLMs requires ensuring that the information they provide is both reliable and understandable. In health applications in particular, LLMs will succeed only if both healthcare providers and users are confident of their benefits. The team will collect the most common types of questions about maternal health in English, Portuguese, and Urdu and submit them to the LLM. Medical experts from the U.S., Brazil, and Pakistan will then evaluate the quality of the answers, and both human readers and a software model will evaluate their readability.

More information about Catalyzing Equitable Artificial Intelligence (AI) Use