We call for more studies exploring the implementation of AI technologies and what happens when humans and AI interact in the clinic.
Artificial intelligence (AI) is already having an impact in healthcare, from the doctor’s office to hospitals. Machine learning tools are transforming certain areas of medicine. Foundation models trained on large datasets through self-supervised learning dominate image-based disciplines, such as radiology and pathology. In this issue, Pearse Keane describes how foundation models for retinal imaging are revolutionizing the practice of ophthalmology. As Keane explains, the models bring world-leading expertise out of specialized centers and into the community.
Many, however, fear that AI will dehumanize medicine. A 2025 Harris poll of over 1,000 US physicians found that 58% worry about over-reliance on AI for diagnosis and 61% worry about a loss of the human touch. These concerns are less about AI replacing health workers (although that worry exists) and more about how AI might undermine the patient–doctor relationship, which is built on trust. This is perhaps the biggest challenge for AI in medicine. Trust requires transparency, accountability and reproducibility, and AI often lacks all three.
There are many open questions about the safety of these tools and about their impact on patient outcomes and clinical workloads. UK radiologists report that AI-assisted imaging can produce more false-positive results than unassisted screening does, which means more unnecessary tests and more anxious patients. Even so, prospective studies and thoughtful implementation are encouraging1.
The challenge is sharpest with generative or agentic AI. Since the launch of ChatGPT and other large language models (LLMs), the use of chatbots has grown unchecked. Unlike clearly defined foundation models or products regulated as Software as a Medical Device, generative AI is being used broadly and without regulation.
LLMs are being adopted at an accelerating pace in real-world healthcare settings, but with limited oversight. In a Comment in this issue, Shen et al. report that the release of DeepSeek's affordable, open-source LLM in January 2025 was followed by its rapid rollout across 750 Chinese hospitals. In China, DeepSeek has been deployed for administrative tasks and clinical decision support, and it operates in what is acknowledged to be a 'regulatory grey area'.
There has been a similarly fast uptake of tools that assist US physicians, from commercial AI scribes that record doctor visits to the Open Evidence search platform. The company says that its platform, which is backed by content from major medical journals, is now used by 40% of US physicians to answer queries about treatment options or which laboratory tests to order. Aside from compliance with data privacy laws, these generative AI models are largely unregulated.
Existing regulatory frameworks that treat AI as Software as a Medical Device are insufficient for rapidly evolving LLMs. Most worryingly, medical advice given by chatbots generally lacks disclaimers. A preprint study shows that the proportion of LLM outputs for medical images and queries that included a disclaimer dropped from 26% in 2022 to 1% in 2025 (ref. 2). This is not the way to build trust, especially when patients turn to a chatbot rather than their doctor for medical advice. There are growing public concerns that LLMs are being used as unlicensed therapy chatbots3. In some cases, conversing with chatbots has led to AI-mediated delusions, suicide or even bromide poisoning.
One way to ensure transparency, accountability and reproducibility is to keep a human in the loop. This is crucial for high-risk areas such as clinical decision support, but, as with most dilemmas facing AI, how that human oversight is implemented matters. The science of medical AI implementation, which asks not just what AI can do but how and when it should be used and how it affects user behavior, is in its early days.
For instance, recent data indicate that endoscopists who used AI assistance for just 3 months saw their detection rate for precancerous polyps decline once they stopped using the tool, which suggests that continuous exposure to AI may result in physician deskilling4. Early studies are beginning to show how AI’s impact will depend heavily on how it is introduced into clinical workflows.
Deskilling, for example, can be mitigated. In aviation, another field in which life-or-death decisions rest with both machines and humans, pilots must periodically retrain on manual controls to reduce their dependence on autopilot. There are, therefore, potential solutions to over-reliance on algorithmic outputs. Studies in real-world settings will help to elucidate where the risks lie in medicine. Beyond research, it is critical to develop a framework of accountability: healthcare professionals want clear guidelines on who is responsible when AI makes mistakes.
We share our community's concerns about the impact of AI on the uniquely human aspects of medicine. But we also recognize that AI, when thoughtfully deployed, can improve patient care. Truly supporting the human in the loop will require rigorous trials, real-world testing, and a science of implementation that treats clinicians not as passive users but as active partners.
Such studies will be critical for building human–AI systems that work to counter the biases and frailties of both the clinician and the machine. The future of medical AI is not just technical — it is relational. And it is time to study it that way.
