The AI Efficiency Gap: Why Clinical Safety Frameworks Are Failing

In AI deployment, a mix of hype and sentiment converges on one narrative: LLMs and AI tools are useful. But “useful” doesn’t tell you much. It doesn’t tell you in what capacity, what the downstream risks are, or what assessments and mitigations are needed to determine outcomes and endpoints.

In my field and other heavily regulated industries, we have causality assessments and frameworks specific to safety and efficacy. When we make a claim, we have to back it up with data and statistical significance. Can you imagine if a drug were put on the market and allowed to claim usefulness without a full-picture analysis of what outcomes are possible?

We do see this, actually. Off-label drug usage happens all the time. A statistician or a clinical team notices a cluster of data points related to an unexpected outcome. Word of mouth, case studies, and conferences are how these off-label usages spread among clinical teams and into deployment.

Off-label usage exists in part because the regulatory process is slow and fragmented, taking years and millions of dollars. It is a way around the system that operates in a grey area.

But there is a critical difference: the prescription and usage of an off-label product are under the guidance of a care team and a physician. There is still a framework in place for monitoring safety and efficacy. Sometimes, as is the case with GLP-1s, this can be taken too far in favor of hype and profit before the proper endpoints and outcomes can be studied. But even then, there is some data. There is some oversight.

Transfer the Pattern

Now transfer this to AI tools: powerful products that can synthesize patterns, speed up processes, and transform workflows. The driving metric is efficiency, and it is the same pressure that drives off-label adoption in medicine: the formal system is too slow, so the market moves without it.

The concern for AI and LLMs is the same, but with critical caveats. LLMs have no regulation. They have no care team or physician overseeing their usage. It’s the equivalent of a patient who decides to take a GLP-1 and simply trusts that it will work for their circumstances because it broadly seems safe.

But that isn’t safe to assume. Absence of data is not the same as absence of risk. We cannot treat deployment as safe simply because the data showing statistical risk doesn’t exist yet.

There is no care team, no oversight, no supervision. There are minimal studies in either direction: on how information from clinicians and broad population groups enters and shapes training data, or on how the tools themselves affect clinical judgment and patient outcomes. We don’t know what happens on either side.

We are in a worse position than off-label drug usage, where at least there is some data on safety and efficacy for the original indication. AI deployment in clinical contexts is proceeding without even that baseline.

The Efficiency Gap

Here’s where it gets structural. Even in regulated industries, the metric driving AI adoption, efficiency, is not what regulators evaluate. The FDA does not care about efficiency. If a tool or device is cleared, that clearance is not based on efficiency. It is based on safety and effectiveness, shown either through substantial equivalence to an existing cleared device or through demonstrated clinical outcomes.

Because the claims made for AI tools center on efficiency rather than on the safety and effectiveness that regulators evaluate, deployment is driven by a metric that falls entirely outside regulatory scope.

Payers care about efficiency. Health systems care about efficiency. And that economic pressure drives deployment of tools that have not been vetted for the outcomes that regulators exist to protect: safety and effectiveness.

What’s Missing

The methodology for evaluating where an intervention can fail, how, and in whom already exists. It has existed for decades in clinical trial design, pharmacovigilance, and adverse event causality assessment. It is not being applied here.

Until it is, we are deploying tools across clinical settings without the safety and risk mitigation protocols that would be required for any other clinical intervention — and without the ability to determine whether they are replicating the harms they are meant to address.

*Kimberly Hosein, MBA

Loopwork System, LLC*