In January 2024, Google released the full results of its Articulate Medical Intelligence Explorer (AMIE) study, which tested whether LLMs could support diagnostic interviews under asynchronous physician oversight. The study offers one of the clearest validations to date of conversational diagnostic AI in clinical workflows.
The findings were impressive. AMIE outperformed both early-career physicians and experienced nurse practitioners across a range of simulated primary care scenarios. It produced more accurate differential diagnoses, more appropriate clinical management plans, and patient messages rated higher than those written by humans. Most importantly, oversight physicians consistently preferred reviewing AMIE-generated cases, and composite quality scores were highest for the AI-assisted workflow across all groups.
What these results show is that with the right context engineering, LLMs can already serve as powerful physician augmentation tools that support structured physician–patient dialogue. At Counsel, that finding feels deeply validating because it mirrors what we’re already seeing.
Counsel Health is now live, but for the past two years, we’ve been building what Google demonstrated in its AMIE simulations.
Like AMIE, our system runs under physician oversight. We also call ours the Clinician Cockpit, and it is similarly centered on the SOAP note. Much like AMIE proposed, we focus on capturing complex patient histories during intake and then generate tailored patient messages for physician review.
Our system is already operational with real patients: we ingest complex clinical records from electronic health records and use them to generate clinical guidance. Every day, Counsel physicians supervise structured asynchronous encounters and make clinical decisions that impact our patients, including which tests to order, which specialists to involve, and when to escalate to higher levels of care. Google validated its Cockpit through a participatory study with 10 outpatient physicians, and we have further validated the approach across thousands of live encounters.
We also share Google’s commitment to safety and the responsible use of AI. AMIE’s standout architectural feature is its strict separation between intake and medical advice, deferring final decisions to a licensed physician. Counsel’s system follows the same principles. We found that this decoupling led to a 79% improvement in physician efficiency and a 41% reduction in time to clinical resolution. The AMIE study found a similar effect: oversight physicians spent about 40% less time than in traditional consultations.
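To make the separation concrete, here is a minimal Python sketch of an approval gate between intake and patient-facing advice. It is an illustration only, not AMIE's or Counsel's actual implementation: every name in it (`Encounter`, `run_intake`, `physician_review`, `release_to_patient`) is hypothetical, and a real system would carry far more state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"


@dataclass
class Encounter:
    # Hypothetical record; field names are assumptions made for this sketch.
    history: List[str]                      # patient statements gathered at intake
    draft_message: str                      # AI-drafted text, not yet released
    status: Status = Status.PENDING_REVIEW
    approved_by: Optional[str] = None


def run_intake(patient_statements: List[str]) -> Encounter:
    """Intake stage: collect history and prepare a draft, but release nothing."""
    draft = "Draft for physician review: " + "; ".join(patient_statements)
    return Encounter(history=list(patient_statements), draft_message=draft)


def physician_review(encounter: Encounter, physician_id: str,
                     edited_message: Optional[str] = None) -> Encounter:
    """Oversight stage: a physician optionally edits the draft, then approves it."""
    if edited_message is not None:
        encounter.draft_message = edited_message
    encounter.status = Status.APPROVED
    encounter.approved_by = physician_id
    return encounter


def release_to_patient(encounter: Encounter) -> str:
    """Advice stage: refuse to release anything a physician has not approved."""
    if encounter.status is not Status.APPROVED:
        raise PermissionError("message requires physician approval before release")
    return encounter.draft_message
```

The design choice the sketch highlights is that the release step checks approval itself, so no path from intake to the patient bypasses the physician.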
What the AMIE research team has done is essential. They created a rigorous, externally validated framework for LLMs in clinical settings with clear rubrics and benchmarks for measuring quality and safety. This work sets an important standard for how conversational diagnostic AI should be evaluated before and during real-world deployment. It’s a meaningful step forward for the field and for responsible AI adoption in healthcare.
The study also underscores why our work at Counsel matters. While AMIE defines what’s possible in simulation, we built the infrastructure to support it in production. We’ve been heads-down building what we believe is the future of care: AI-powered, physician-led, asynchronous medicine that scales without compromising quality. AMIE’s results validate that vision, and we’re excited to continue expanding the system that defines the next era of clinical care.
Google Research. AMIE: A research AI system for diagnostic medical reasoning and conversations. https://research.google/blog/amie-a-research-ai-system-for-diagnostic-medical-reasoning-and-conversations/
Google Research. AMIE gains vision: A research AI agent for multimodal diagnostic dialogue. https://research.google/blog/amie-gains-vision-a-research-ai-agent-for-multi-modal-diagnostic-dialogue/
NIH. Towards conversational diagnostic artificial intelligence. https://pubmed.ncbi.nlm.nih.gov/40205050/
NIH. Improving diagnosis in health care. https://www.ncbi.nlm.nih.gov/books/NBK338586/

Tony Sun is a health AI researcher and clinical informaticist with a PhD from Columbia University’s Department of Biomedical Informatics. His work focuses on fairness in transformer-based models and real-world machine learning in healthcare. Previously, he interned at a health-tech unicorn and served as a postdoctoral researcher at NewYork-Presbyterian, translating advanced machine learning research into scalable, clinically meaningful systems that support patient care.

Dr. Muthu Alagappan is a physician and former AI researcher, and the Founder and CEO of Counsel Health. He is pioneering an AI-enabled virtual care model that delivers immediate, personalized medical guidance by integrating advanced medical AI with in-house physicians. Previously, he served as Chief Medical Officer at Notable Health and as an Attending Physician at Massachusetts General Hospital and UCSF. He earned his BS and MD from Stanford University, where he researched healthcare and artificial intelligence.
Our content is created for informational purposes and should not replace professional medical care. For personalized guidance, talk to a licensed physician.