From: A systematic review of large language model (LLM) evaluations in clinical medicine
Q1 | Based on the article provided, which medical field does this article pertain to? |
Q2 | Is the article written in a language other than English? (Yes = 1, No = 0) |
Q3 | Is an LLM or GPT mentioned in the article used for educational purposes in the medical/clinical field? (Yes = 1, No = 0) |
Q4 | Is an LLM or GPT mentioned in the article used for examination and evaluation purposes in the medical/clinical field? (Yes = 1, No = 0) |
Q5 | Is the evaluation of the LLM or GPT conducted by humans or compared with humans? (Yes = 1, No = 0) |
Q6 | What are the name and version of the LLM(s) or GPT(s) evaluated in the article? |
Q7 | What is the targeted group of interest for the LLM or GPT mentioned in the article (e.g., doctors, nurses, students, patients)? |
Q8 | How are the responses of the LLM evaluated? |
Q9 | What is the gold standard against which the LLM’s responses are compared? |
Q10 | What tools, scales, or sets of questions are used in the evaluation, and how many questions are there? |
Q11 | What parameters are assessed to measure the LLM’s responses? |
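
The coding scheme in the table (1/0 for Q2 to Q5, free text for the remaining questions) can be sketched as a small record encoder. This is only an illustrative sketch: the function names `encode_binary` and `encode_record` and the dictionary layout are assumptions made here, not part of the review's method; only the question IDs and the yes = 1 / no = 0 coding come from the table.

```python
# Question IDs as listed in the table (Q1 to Q11).
ALL_QUESTIONS = tuple(f"Q{i}" for i in range(1, 12))

# Questions the table codes as yes = 1, no = 0.
BINARY_QUESTIONS = frozenset({"Q2", "Q3", "Q4", "Q5"})


def encode_binary(answer: str) -> int:
    """Map a yes/no answer to the table's coding (yes = 1, no = 0)."""
    normalized = answer.strip().lower()
    if normalized in ("yes", "1"):
        return 1
    if normalized in ("no", "0"):
        return 0
    raise ValueError(f"Not a yes/no answer: {answer!r}")


def encode_record(raw: dict) -> dict:
    """Encode one article's answers, keyed by question ID (hypothetical layout).

    Binary questions become 0 or 1; all other answers are kept as
    whitespace-trimmed free text. A missing question raises KeyError.
    """
    record = {}
    for qid in ALL_QUESTIONS:
        value = raw[qid]
        if qid in BINARY_QUESTIONS:
            record[qid] = encode_binary(value)
        else:
            record[qid] = value.strip()
    return record
```

Rejecting anything other than a clear yes or no, rather than defaulting to 0, keeps ambiguous answers visible for manual review.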