Leaderboard

Explore how medical foundation models perform on each DoctorBench dimension, with sorting and filtering across multiple dimensions.

DoctorBench-LLM Medical LLM Leaderboard

This leaderboard evaluates medical large language models across the full clinical practice workflow. It covers core tasks including in-depth symptom analysis, personalized diagnosis and treatment planning, inference over multimodal reports, and medical safety guardrail monitoring. Together, these tasks systematically assess how fine-grained each model's reasoning is and how accurate its decisions are when handling real patient chief complaints and complex medical data.