Five Principles for AI in Public Service
AI enters government work through a governance architecture or it does not enter at all. Five principles — advisory, auditable, deliberative, sovereign, secure — written to be defensible at a hearing.
Every government AI pilot we have seen start in 2025 has asked the same question of itself at month three: what exactly is this thing allowed to do, and who answers for it when it is wrong?
The question usually arrives after the system has already been designed, and the designers realise that the answer they need is an architecture, not a policy. By then the rework is expensive.
These five principles are the answer we have been giving to government clients, written in a form that is defensible at a public hearing, in a caucus briefing, or in front of an auditor-general. They are the governance architecture underneath Agentic Government, and they are the reason we can put a credentialed AI system in front of a public servant without anyone pretending the thing is more than it is.
01 · Advisory Only, Never Autonomous
AI analyses evidence and recommends. Humans decide.
This is a design rule, not a policy preference. An advisory system architecturally cannot take binding action. It cannot close a file, approve a grant, deny a permit, or change a record. It can read, compare, summarise, flag, and recommend. Every consequential step is taken by a named public servant.
The reason to draw the line here — and to draw it in the architecture, not in a terms-of-service page — is that policy lines about AI autonomy decay. Under pressure to scale, the tempting path is always "just let the system do the easy ones." The easy ones are where the unexplained denials live. Architecture that cannot take action cannot take wrong action.
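One way to make the rule architectural rather than procedural is to leave mutating tools out of the agent's registry entirely. A minimal Python sketch, with hypothetical tool and type names (nothing here is a prescribed interface):

```python
from enum import Enum
from dataclasses import dataclass

# Hypothetical sketch: the agent's tool registry only contains advisory
# capabilities. Mutating actions (approve, deny, close, write) are not
# expressible, because no tool for them exists in the registry.

class AdvisoryCapability(Enum):
    READ = "read"
    COMPARE = "compare"
    SUMMARISE = "summarise"
    FLAG = "flag"
    RECOMMEND = "recommend"

@dataclass(frozen=True)
class Recommendation:
    case_id: str
    finding: str
    rationale: str
    # A named public servant signs off before anything changes state.
    requires_human_decision: bool = True

def register_tools() -> dict[str, AdvisoryCapability]:
    # Only advisory tools are wired into the agent runtime. There is no
    # "approve_grant" or "close_file" entry to reach for under scale pressure.
    return {
        "read_evidence": AdvisoryCapability.READ,
        "compare_submissions": AdvisoryCapability.COMPARE,
        "summarise_file": AdvisoryCapability.SUMMARISE,
        "flag_for_review": AdvisoryCapability.FLAG,
        "draft_recommendation": AdvisoryCapability.RECOMMEND,
    }
```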
02 · Full Audit Trail
Every AI recommendation records the model used, the prompt applied, when it ran, and the evidence it saw. Reproducible, end-to-end.
Government work is answerable. A decision made six months ago should be reconstructible today. If an AI contributed to that decision, its contribution needs to be reconstructible too — not as a log line, but as a reproducible artefact. The same input, the same model version, the same prompt structure, the same output. Not approximately. Exactly.
This is harder than it sounds. It means versioning prompts, pinning models, capturing evidence snapshots, and storing the lot so that when the Office of the Auditor General requests it three years from now, what comes back is actually meaningful.
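A minimal sketch of what one such record might hold; the field and function names are illustrative, not a prescribed schema. The essential property is that model, prompt, evidence, and output are pinned well enough to replay the run exactly:

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical sketch of one audit record per AI recommendation.
@dataclass(frozen=True)
class AuditRecord:
    recommendation_id: str
    model_id: str                   # exact pinned model version, never "latest"
    prompt_version: str             # versioned prompt template identifier
    prompt_sha256: str              # hash of the fully rendered prompt
    evidence_snapshot_sha256: str   # hash of the evidence set as it was seen
    ran_at: str                     # UTC timestamp of the inference call
    output_sha256: str              # hash of the returned recommendation text

def make_record(rec_id: str, model_id: str, prompt_version: str,
                rendered_prompt: str, evidence: list[str],
                output_text: str) -> AuditRecord:
    return AuditRecord(
        recommendation_id=rec_id,
        model_id=model_id,
        prompt_version=prompt_version,
        prompt_sha256=_sha256(rendered_prompt),
        evidence_snapshot_sha256=_sha256(json.dumps(sorted(evidence))),
        ran_at=datetime.now(timezone.utc).isoformat(),
        output_sha256=_sha256(output_text),
    )
```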
03 · Multi-Agent Deliberation
Multiple independent AI perspectives evaluate each area. Consensus, dissent, and tension surface transparently for human assessors.
A single LLM making a single recommendation is an overconfident voice with a credential it has not earned. A panel of structurally different evaluators — different prompts, different roles, different evidence priorities — produces a more useful object: a set of views, with the places they agree and the places they diverge visible to the human.
The human assessor reads the dissent. If three advisors reach the same conclusion through three routes, that is a stronger signal than if one advisor said it first. If they diverge, the divergence is the finding. Consensus that hides dissent is the failure mode this principle prevents.
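A sketch of the aggregation step, assuming hypothetical advisor roles and conclusion labels. The property that matters is that dissent is carried forward rather than averaged away:

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical sketch: each advisor is a separately prompted evaluator with
# its own role and evidence priorities. Aggregation surfaces agreement and
# divergence side by side for the human assessor.

@dataclass(frozen=True)
class AdvisorView:
    advisor_role: str      # e.g. "eligibility", "financial", "risk"
    conclusion: str        # e.g. "meets_criteria" / "does_not_meet"
    rationale: str

@dataclass(frozen=True)
class Deliberation:
    views: tuple[AdvisorView, ...]
    consensus: str | None              # set only when every advisor agrees
    dissent: tuple[AdvisorView, ...]   # minority views, never discarded

def deliberate(views: list[AdvisorView]) -> Deliberation:
    # Assumes at least one advisor view; a panel of one defeats the purpose.
    counts = Counter(v.conclusion for v in views)
    majority, _ = counts.most_common(1)[0]
    unanimous = len(counts) == 1
    return Deliberation(
        views=tuple(views),
        consensus=majority if unanimous else None,
        dissent=tuple(v for v in views if v.conclusion != majority),
    )
```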
04 · Sovereign by Architecture
All data stays within controlled boundaries. Privacy is enforced by system design, not by policy alone.
For Canadian public-sector data this means inference and storage within Canadian jurisdiction. For other sovereigns, within theirs. The point is not nationalism — the point is that residency has to be a property of the architecture, not a checkbox in a procurement contract. Policies about data location can be overruled by a change in vendor terms. Architecture cannot.
In practice this means sovereign inference stacks, sovereign vector stores, sovereign logging, and no silent back-channel connections to foreign APIs inside an agent's tool use.
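One way to enforce that inside the agent runtime is an explicit egress allowlist that fails closed. A sketch with placeholder hostnames (the domains below are stand-ins, not real endpoints):

```python
from urllib.parse import urlparse

# Hypothetical sketch: every outbound call an agent tool makes is checked
# against an explicit allowlist of in-jurisdiction endpoints. A silent call
# to a foreign API fails closed instead of quietly succeeding.

SOVEREIGN_HOSTS = {
    "inference.internal.example.ca",   # in-jurisdiction inference stack
    "vectors.internal.example.ca",     # in-jurisdiction vector store
    "logs.internal.example.ca",        # in-jurisdiction audit logging
}

class EgressViolation(Exception):
    pass

def assert_sovereign_egress(url: str) -> None:
    host = urlparse(url).hostname or ""
    if host not in SOVEREIGN_HOSTS:
        # Fail closed: the request never leaves the controlled boundary.
        raise EgressViolation(f"Blocked non-sovereign endpoint: {host!r}")
```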
05 · Secure AI Pipelines
Role-based access control, prompt injection prevention, and security-audited agent interfaces — built in from day one.
AI systems that handle live government evidence are high-value targets the moment they exist. Treating them as web applications with an LLM glued to the side is the mistake that will make the news. The security posture is platform-grade from day one: RBAC, rate limits, prompt injection guards on every tool surface, monitored agent boundaries, and an incident-response runbook that assumes the system will be probed.
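A sketch of two of those checks sitting in front of a tool call, with illustrative roles and deliberately coarse injection patterns. A production system would use stronger detection; the placement, before anything reaches the model, is the point:

```python
import re

# Hypothetical sketch: role-based access control plus a coarse prompt-injection
# screen on evidence text, applied before any agent tool call proceeds.

ROLE_PERMISSIONS = {
    "assessor": {"read_evidence", "draft_recommendation"},
    "auditor": {"read_evidence", "read_audit_log"},
}

INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you are now",
    r"system prompt",
]

def authorise_tool_call(role: str, tool_name: str) -> bool:
    # RBAC: a tool not granted to the caller's role is simply unavailable.
    return tool_name in ROLE_PERMISSIONS.get(role, set())

def screen_evidence(text: str) -> str:
    # Flag, rather than silently pass through, evidence that looks like an
    # attempt to steer the model.
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            raise ValueError(f"Possible prompt injection detected: {pattern!r}")
    return text
```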
Why five
These five are the smallest set we could write that still closes the governance question. Drop any one and a different question opens in its place. Add more and they stop being quotable.
The test for whether a government AI deployment meets the public-service bar is not does it work — of course it works, the models are good now — but can you defend it in five minutes, to someone who is not technical, against a skeptical interlocutor who has read the news. These five are what that defence sounds like.
These principles are the governance architecture underneath the Agentic Government platform. The Scan → Incubate → Scale → Advise method is how we operationalise them inside live assessment programmes.