Earlier this month, Microsoft published “The Path to Medical Superintelligence,” a compelling look at what it might take to build an AI system capable of matching or exceeding expert physician judgment.
Microsoft’s multi-agent framework relies on its constituent agents to individually reason and vote on the next action. It’s a smart structure, but decision-by-committee is only as fast as the slowest agent’s response. Even simple, routine queries, like evaluating a dry cough, can take several minutes to resolve under multi-agent frameworks. That kind of delay is a non-starter in actual asynchronous care, where timely responses drive both patient satisfaction and clinical throughput.
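To make the latency argument concrete, here is a minimal sketch (not Microsoft’s actual implementation; agent names and timings are hypothetical) of why a voting round is bounded by its slowest member: the round cannot conclude until every agent has responded, so wall-clock time is the maximum, not the average, of per-agent latencies.

```python
import asyncio
import time

async def agent_vote(name: str, latency_s: float) -> str:
    # Simulated agent: "reasons" for latency_s seconds, then casts a vote.
    await asyncio.sleep(latency_s)
    return f"{name}: proceed"

async def committee_round(latencies: list[float]) -> float:
    # A voting round needs every agent's response before tallying,
    # so elapsed time is governed by the slowest agent.
    start = time.monotonic()
    await asyncio.gather(
        *(agent_vote(f"agent-{i}", t) for i, t in enumerate(latencies))
    )
    return time.monotonic() - start

if __name__ == "__main__":
    # Two fast agents and one slow one: the round still takes ~0.3s.
    elapsed = asyncio.run(committee_round([0.01, 0.02, 0.30]))
    print(f"round took ~{elapsed:.2f}s (bounded by the slowest agent)")
```

Adding agents only ever raises this bound, which is why committee-style architectures tend to get slower, not faster, as they scale.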
At Counsel, we’ve built our AI systems to minimize response latency without sacrificing medical quality. Responsiveness is foundational to usability, especially in high-volume settings, so for us, latency is not only a performance metric, but also a clinical requirement.
Clinical AI can’t live outside the workflows clinicians already use. Systems that require switching to a separate platform might work in academia, but they won’t translate to busier patient care settings, and tools that sit outside an existing clinical workflow won’t be adopted at scale.
That’s why Counsel’s Clinician Cockpit is built as an embedded layer within our homegrown EHR. Our providers don’t need to toggle between platforms to manage patient care. Instead, all relevant patient data, such as labs, medications, imaging, symptoms, and chronic conditions, are available in a single view. After-visit notes flow directly back into the Health Information Exchange, keeping the broader care team aligned.
Microsoft’s evaluation relies heavily on NEJM challenge cases, which center on complex, rare diseases and are intended to test the upper bounds of diagnostic reasoning. While impressive, these cases are not representative of the questions that drive daily volume in virtual care. In the real world, most patients aren’t presenting with a rare constellation of symptoms that would require lymphoma testing and bloodwork to get right. Instead, they’re reaching out about common, routine complaints: a cough that won’t go away, URI symptoms, abdominal pain, or medication side effects.
These are the types of cases we see most at Counsel, so our models are trained and evaluated on these real-world distributions. A system optimized for the NEJM challenge set may struggle to efficiently triage for these everyday workflows.
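The gap between benchmark performance and deployed performance can be framed as a case-mix weighting problem. The sketch below uses entirely hypothetical accuracies and distributions (these are not Counsel or Microsoft metrics) to show how the same model can score very differently depending on which case distribution it is evaluated against.

```python
def expected_accuracy(accuracy: dict[str, float], mix: dict[str, float]) -> float:
    # Weight per-category accuracy by how often each category
    # actually appears in the target care setting.
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "mix must sum to 1"
    return sum(accuracy[c] * w for c, w in mix.items())

# Hypothetical per-category accuracy of some model.
accuracy = {"routine": 0.90, "rare": 0.60}

# Hypothetical case distributions: challenge-heavy vs. everyday virtual care.
nejm_like_mix = {"routine": 0.10, "rare": 0.90}
virtual_care_mix = {"routine": 0.95, "rare": 0.05}

print(expected_accuracy(accuracy, nejm_like_mix))     # ≈ 0.63
print(expected_accuracy(accuracy, virtual_care_mix))  # ≈ 0.885
```

The point is not the specific numbers but the structure: a system tuned to maximize the first quantity is not automatically the system that maximizes the second, which is why we evaluate on the distribution we actually serve.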
This study is innovative and a milestone in clinical LLM development, but when it comes to deploying medical AI in the real world, accuracy is only half the equation. Practicality, latency, integration, and benchmark alignment define whether a system will be used, trusted, and ultimately drive better patient outcomes.