Innovation That Lasts: Building Accountability Into Federal AI Review

May 18

What a Smart AI Executive Order Looks Like

The White House is now weighing one of the most consequential AI policy moves in recent memory: an executive order that would formally vet advanced AI with a process modeled on the FDA's drug approval framework.

As the federal government embeds AI into high-risk decisions, such as whether Americans get healthcare, housing, or other benefits, there has to be accountability.

When something goes wrong, one question is going to cut through every technical explanation and policy document: Who answers for the mistake? Not which algorithm produced it. Not which federal standard was applied. That's not a regulatory question.

Done right, this executive order could establish a national safety floor that protects the public and creates more predictable operations for everyone deploying AI. Done wrong, it could paper over the real risks of AI while continuing to block states from setting safety floors.

The Obvious Tension

Any honest conversation about this proposed order starts with a contradiction. The December 2025 executive order directed federal agencies to challenge state-level AI regulations and to push a "minimally burdensome national standard," essentially telling states to stand down. California, North Carolina, and others that had moved to protect their residents were put on notice that Washington would fight their efforts.

Now, just months later, the administration is signaling that powerful AI systems do need structured review before they are sold to governments and customer-facing companies. Any new executive order will need to reconcile national mandates and existing state regulations, or it will face serious legal and political headwinds.

The FDA Model Has Promise, With Serious Caveats

The instinct behind the FDA model is sound. Federal drug approval exists because the drug industry did not protect the public from dangerous drugs, including thalidomide, a sedative that caused severe birth defects, including missing or malformed limbs, in children whose mothers took it during pregnancy. The same logic applies to AI systems deployed in benefits administration, child welfare, criminal justice, and healthcare. If we're going to rely on a model to inform decisions about Medicaid eligibility or flag a family for child protective services review, there needs to be evidence it works, evidence it is safe, and a process for pulling it back when it harms the public.

But the analogy quickly breaks down if the vetting framework is designed primarily around national security and industry competitiveness, which appears to be the current framing. A model that clears a federal security review is not automatically safe for a state agency caseworker relying on it to manage a 200-person SNAP or Medicaid caseload. The criteria have to match the use case, and right now, there is no clear indication that state and local government welfare program administration will be part of the vetting calculus.

The Automation Temptation

There's an understandable pull toward full automation in government. Caseloads are crushing. Budgets are tight. AI promises faster processing, fewer errors, and more efficient delivery of services. But there is a difference between AI that determines eligibility and AI that makes eligibility policy, and any executive order worth its salt needs to demarcate that line clearly.

When AI systems are used to determine eligibility subject to federal and state policy, assess contractor performance, or route casework, errors have real consequences. They fall on actual people, often the most vulnerable residents a state serves. A federal vetting process that checks a model for national security risks but says nothing about how it performs in determining whether a rural county's Medicaid population is subject to work requirements is not a safety floor. It's a rubber stamp with federal branding.

What "Human in the Loop" Actually Means

For any vetting framework to be meaningful in the state government context, it must set enforceable standards for human oversight, not just recommend them. That means a qualified individual with decision-making authority must be in a position to review AI-generated outputs. It means that person has the time, training, and resources to push back on what the system recommends. While it may not be feasible to adequately review every individual AI decision, there must be a clear, accessible, and transparent process for human intervention and engagement. And it means there is a fully documented record of every review.

It does not mean a checkbox at the end of an automated process, or a supervisor signing off on 200 AI-generated decisions in a single afternoon. If the new executive order is going to function like a drug approval regime, it needs to behave like one. That means demonstrating safety and efficacy in the actual conditions of use, not just in a controlled lab environment before the product ships.

The Knowledge You Cannot Upload

Here is something that rarely surfaces in the policy debate: the single greatest asset a state agency has is institutional knowledge, and that knowledge lives in people.

It lives in the contractor who has managed a state Medicaid billing system across three administrations and knows exactly why a particular county's data looks the way it does. It lives in the full-time employee who has built relationships with providers, advocates, and legislative staff over a decade and can recognize a data anomaly. It lives in a workforce that has seen policy cycles come and go and understands the policy decisions behind the numbers any AI system will analyze.

An executive order that accelerates AI adoption without protecting and leveraging that institutional knowledge would effectively hollow out government rather than modernize it. The contractors and public employees who train, monitor, audit, and correct AI systems are not a line item to cut once automation arrives. They are the quality control layer that makes automation safe. A federal vetting process should account for that, not assume it away.

What CAMI Wants to See

For this executive order to serve the public interest, it should address several core elements. First, vetting criteria must include real-world state and local deployment and testing, not just national security and commercial use cases. The order should also protect the authority of states to layer on additional protections where federal standards fall short. Clear standards for human review must be part of the framework. The institutional workforce of both state employees and contractors with deep mission expertise must be explicitly recognized as essential infrastructure for responsible AI deployment. Lastly, a meaningful audit and accountability structure must exist so that when AI-assisted processes produce bad outcomes, there is a clear pathway for review, correction, and remedy.

None of this is anti-innovation. These are the conditions that make innovation durable. An AI model that clears federal review and then fails the families relying on state services is not a win for anyone. The administration has a real opportunity here to set a standard that is serious, practical, and lasting. The question is whether the framework it hands down is worth building on.

Team CAMI
