The first standardized multimodal benchmark for lung cancer clinical decision support, built from 1,000 real-world clinician-labeled cases across 10+ hospitals.
LungCURE formalizes three oncological precision treatment (OPT) tasks that span the full clinical workflow for lung cancer diagnosis and treatment. All tasks share the same multimodal patient input.
Given multimodal clinical data (imaging reports, pathology reports, clinical records, supplementary materials), predict the complete TNM pathological stage. A prediction is correct only if the T, N, and M stages are all correct.
Given the multimodal inputs and the ground-truth TNM stage as a conditioning signal, generate a guideline-compliant treatment plan. This tests the model's treatment-planning capability independently of its staging accuracy.
Given only the multimodal clinical materials (no staging input), generate a complete clinical decision. This mirrors real-world deployment, where a patient uploads records and receives full decision support without manual staging intervention.
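The strict staging criterion described above (a prediction counts only when T, N, and M are all correct) can be illustrated with a minimal scoring sketch. The function names, the tuple representation of a stage, and the case-insensitive, whitespace-trimmed comparison are assumptions made for illustration; they are not the benchmark's official evaluation code.

```python
from typing import Iterable, Tuple

# Hypothetical representation: a stage is a (T, N, M) tuple of strings.
TNM = Tuple[str, str, str]


def tnm_exact_match(pred: TNM, gold: TNM) -> bool:
    """Return True only if the T, N, and M components all match.

    Assumption: labels are compared after trimming whitespace and
    upper-casing, so "t2 " and "T2" count as the same label.
    """
    return all(p.strip().upper() == g.strip().upper() for p, g in zip(pred, gold))


def staging_accuracy(preds: Iterable[TNM], golds: Iterable[TNM]) -> float:
    """Fraction of cases whose full TNM triple is predicted correctly."""
    preds, golds = list(preds), list(golds)
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not preds:
        return 0.0
    hits = sum(tnm_exact_match(p, g) for p, g in zip(preds, golds))
    return hits / len(preds)


if __name__ == "__main__":
    preds = [("T2", "N1", "M0"), ("T1", "N0", "M0")]
    golds = [("T2", "N1", "M0"), ("T1", "N1", "M0")]
    # The second case has the wrong N category, so it scores zero
    # even though T and M are correct.
    print(staging_accuracy(preds, golds))  # 0.5
```

Under this all-or-nothing rule, a model that gets two of three components right on every case would still score 0, which is what makes the staging task demanding.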
1,000 real-world multimodal clinical cases collected from 10+ hospitals in China (2019–2025), fully de-identified and approved by the Ethics Committee of Peking Union Medical College Hospital.
Sourced from 10+ hospitals across China for geographic and clinical diversity
Imaging reports, pathology reports, clinical records, and genomic testing results
Two-stage annotation protocol by senior clinicians with evidence-based TNM staging
Supports both Chinese (ZH) and English (EN) evaluation for cross-lingual assessment
Fully de-identified; all patients provided informed consent before enrollment
Gold standards based on AJCC, NCCN, and CSCO clinical oncology guidelines
Data Collection Pipeline
Imaging, pathology, clinical records, and genetic testing data from hospitals
De-identification and unified management of case files
Interdisciplinary team annotation with expert TNM staging and quality control
Performance of state-of-the-art MLLMs on LungCURE-Core. ZH = Chinese, EN = English. ⚡ = with LCAgent plugin.
A simple yet effective multi-agent plugin that boosts MLLM performance on LungCURE by decomposing the clinical pathway into structured, guideline-grounded stages.
Three concurrently executed agents independently extract T, N, and M evidence; a deterministic rule-based node aggregates the final stage, eliminating stochastic errors.
Decision variables are mapped to a structured feature vector, which dynamically activates a scenario-specific expert agent. That agent operates under locally injected guideline subsets that act as hard constraints.
Strict decision boundaries between stages prevent reasoning errors from propagating across the clinical pathway, a core failure mode of direct MLLM prompting.
Model-agnostic: as a plug-in, it consistently boosts Qwen3.5-397B, Kimi-K2.5, GPT-5.2, and other models, with gains across all three tasks.
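The staging design described above (three concurrent agents feeding a deterministic rule-based node) can be sketched as follows. This is a minimal illustration under stated assumptions, not the LCAgent implementation: `extract_t`, `extract_n`, `extract_m`, `aggregate`, and `stage_case` are hypothetical names, the extractors are stubs that return fixed values instead of calling an MLLM, and the aggregation rule only validates and concatenates the three categories rather than reproducing any AJCC stage-grouping table.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical stand-ins for the three evidence-extraction agents.
# A real agent would query an MLLM; these stubs return fixed values
# so that the sketch is runnable.
def extract_t(case: str) -> str:
    return "T2"


def extract_n(case: str) -> str:
    return "N1"


def extract_m(case: str) -> str:
    return "M0"


def aggregate(t: str, n: str, m: str) -> str:
    """Deterministic rule-based node: no model call is involved, so the
    same (T, N, M) inputs always produce the same output."""
    for label, prefix in ((t, "T"), (n, "N"), (m, "M")):
        if not label.startswith(prefix):
            raise ValueError(f"expected a {prefix} category, got {label!r}")
    return f"{t}{n}{m}"


def stage_case(case: str) -> str:
    """Run the three agents concurrently, then aggregate their outputs."""
    agents: Dict[str, Callable[[str], str]] = {
        "T": extract_t,
        "N": extract_n,
        "M": extract_m,
    }
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {key: pool.submit(fn, case) for key, fn in agents.items()}
        results = {key: fut.result() for key, fut in futures.items()}
    return aggregate(results["T"], results["N"], results["M"])


if __name__ == "__main__":
    print(stage_case("example case text"))  # T2N1M0
```

Keeping the aggregation step free of model calls means any variance in the final stage can be traced back to one of the three extraction agents, which is the kind of stage boundary the plugin description emphasizes.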
@misc{hao2026lungcurebenchmarkingmultimodalrealworld,
title={LungCURE: Benchmarking Multimodal Real-World Clinical Reasoning for Precision Lung Cancer Diagnosis and Treatment},
author={Fangyu Hao and Jiayu Yang and Yifan Zhu and Zijun Yu and Qicen Wu and Wang Yunlong and Jiawei Li and Yulin Liu and Xu Zeng and Guanting Chen and Shihao Li and Zhonghong Ou and Meina Song and Mengyang Sun and Haoran Luo and Yu Shi and Yingyi Wang},
year={2026},
eprint={2604.06925},
archivePrefix={arXiv},
primaryClass={cs.MM},
url={https://arxiv.org/abs/2604.06925},
}