Abstract
The use of large language models (LLMs) is increasingly common. However, LLMs may exhibit sycophancy, echoing users' beliefs while avoiding contradiction. In the present study, we describe sycophancy in 2 general-purpose LLMs applied to orthopaedic contexts. We evaluated performance on 3 tasks: (1) accuracy on benchmark answering: LLMs were tested on validated benchmark orthopaedic questions with correct and incorrect cues, and the change in accuracy and the sycophancy error rate were determined; (2) user belief agreement: LLMs were provided with ambiguous statements and a user belief, and LLM agreement, contradiction, and uncertainty were described; and (3) false information detection: false information was placed within a task prompt to measure noncontradiction and propagation rates. Baseline factual accuracy on benchmark questions was 78% and did not change significantly with correct hints (71%; p = 0.49). With incorrect hints, LLM accuracy declined significantly to 48% (p < 0.001), with a sycophancy error rate of 52%. When presented with user beliefs about ambiguous, controversial statements, models echoed the user's belief in 56% of statements, expressed uncertainty in 12%, and contradicted the user in 32%. In the false information detection tasks, models perpetuated incorrect attributions 99% of the time yet corrected statistical distortions 97% of the time. Although popular general-purpose LLMs have useful orthopaedic applications, they exhibit sycophancy, tending toward agreement without recognizing ambiguity. This is a key weakness to be addressed. Findings should be interpreted cautiously given the variability in model design, prompting, and the models evaluated. The tendency of general-purpose LLMs to agree without recognizing clinical ambiguity may limit their reliability in orthopaedic applications.
Perry AJ, Kalva S, Fucich D, Muppidi S, Aggarwal M, Virk MS, et al. Current Artificial Intelligence Large Language Models Exhibit Sycophantic Behavior in Orthopaedic Contexts. J Bone Joint Surg Am. 2026 May. doi:10.2106/JBJS.25.01576. PMID: 42166556.