MedFuzz: Exploring the robustness of LLMs on medical challenge problems
Large language models (LLMs) have achieved unprecedented accuracy on medical question-answering benchmarks, showcasing their potential to revolutionize healthcare by supporting clinicians and patients. However, these benchmarks often fail to capture the full complexity of real-world medical scenarios. To truly harness…