Anthropic publishes its constitution and, commendably, accompanying research about where the constitution works and where it does not. The current version (January 2026) is a long-form ethical treatise written with Claude as its primary audience. When priorities conflict, the order is safety, then ethics, then Anthropic’s guidelines, then helpfulness. There are also hardcoded absolutes and softcoded defaults that operators can adjust. Anthropic favours cultivating good values and judgment over strict rules, comparing their approach to trusting an experienced professional rather than enforcing a checklist.

I develop the Perseverance Composition Engine (PCE), an open-source multi-agent AI system that takes a different approach, called Artificial Organisations, to the same problem. PCE assumes that agents cannot be relied on to be honest, harmless, or helpful, and structures the system so that the inevitable bad behaviour never reaches the output.

The PCE pipeline works by sending a document through four independent agents: a Composer drafts from source materials, a Corroborator fact-checks the draft against those sources, a Critic evaluates the result without seeing the sources, and a Curator files the output. Each agent has a single objective, minimal permissions, and access to only the information it needs. The Critic can’t see the sources, so it can’t rationalise away a weak claim by pointing to them. The Composer can’t see the evaluation rubrics, so it can’t game them.
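The pipeline above can be sketched as a simple composition of callables. This is a minimal illustration with hypothetical signatures, not PCE’s actual API; the point it demonstrates is that the pipeline, not any agent, controls what each role receives.

```python
from typing import Callable, Optional

# Minimal sketch of the four-stage pipeline (hypothetical, not PCE's API).
# Each stage is a callable that receives only the information its role needs.

def run_pipeline(
    sources: list[str],
    compose: Callable[[list[str]], str],                 # Composer: sources only
    corroborate: Callable[[str, list[str]], list[str]],  # Corroborator: draft + sources
    critique: Callable[[str], list[str]],                # Critic: draft only, no sources
    file_output: Callable[[str], None],                  # Curator: approved output only
) -> Optional[str]:
    draft = compose(sources)
    if corroborate(draft, sources):   # any discrepancy blocks the draft
        return None
    if critique(draft):               # any rubric failure blocks the draft
        return None
    file_output(draft)
    return draft
```

Because the signatures encode the information partition, a stage physically cannot consult material outside its role: the Critic’s callable is never handed the sources at all.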

The contrast with constitutional AI comes down to where you locate the safety mechanism.

Constitutional AI locates it in the agent. Train the agent well enough, give it clear principles, and it should behave. The problem is that agents under pressure — optimising for token velocity, operating in unfamiliar domains, balancing conflicting objectives — still confabulate, still produce plausible nonsense, still find locally convenient solutions that technically satisfy the rules while violating their spirit.

PCE locates safety in the structure around the agents. The Corroborator has sources in front of it and just one job: find discrepancies. If the Composer invented a claim, the Corroborator will see the absence in the sources. The Critic evaluates the output against rubrics without knowing what the sources said, so it can’t excuse a vague passage by noting the sources were thin. Three independent agents would all have to make the same mistake in the same direction for a fabrication to ship. That’s much less likely than one agent hiding its own error from itself.
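A deliberately naive illustration of the Corroborator’s job (not PCE’s actual checking method): with the sources in front of it, an invented claim shows up as an absence.

```python
# Naive substring-based support check, for illustration only.
# A real corroborator would use semantic matching, not substring search.

def unsupported_claims(claims: list[str], sources: list[str]) -> list[str]:
    corpus = " ".join(sources).lower()
    return [claim for claim in claims if claim.lower() not in corpus]

flagged = unsupported_claims(
    ["revenue grew 12% in 2024", "the CEO resigned"],
    ["Annual report: revenue grew 12% in 2024."],
)
# flagged == ["the CEO resigned"]: the invented claim has no match in the sources
```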

The constitutional approach asks agents to balance honesty, harmlessness, and helpfulness simultaneously — a three-objective optimisation problem with no clear priority order. The objectives frequently conflict so the agent must find a trade-off in real time. In practice, this produces outputs that satisfy all three criteria superficially: plausible, inoffensive, and vaguely on-topic. PCE resolves the conflict structurally. The Composer worries about coherence, the Corroborator worries about truth, the Critic worries about quality. Each agent is single-minded, and conflict resolution is done by the pipeline.
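The structural resolution can be made concrete as a role table: one objective per role and an explicit information partition, enforced by the pipeline rather than by agent self-restraint. The field names below are illustrative, not PCE’s configuration format.

```python
# Illustrative role table: one objective per role, explicit information partition.
# Keys and values are hypothetical, not PCE's actual configuration.

ROLES = {
    "composer":     {"objective": "coherence", "sees": {"sources"}},
    "corroborator": {"objective": "truth",     "sees": {"draft", "sources"}},
    "critic":       {"objective": "quality",   "sees": {"draft", "rubrics"}},
    "curator":      {"objective": "filing",    "sees": {"approved_output"}},
}

def visible_context(role: str, context: dict) -> dict:
    # The pipeline decides what each role sees; the agent never filters itself.
    return {k: v for k, v in context.items() if k in ROLES[role]["sees"]}
```

A Critic that asks for the sources simply gets nothing: `visible_context` drops the `sources` key before the Critic’s context is assembled.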

As a consequence, PCE inherits every improvement to the underlying models, and better alignment is always welcome. But PCE doesn’t require well-aligned agents. I regularly put a weaker or less aligned model into a PCE role, and the structure still prevents fabrication from reaching the output. The Composer doesn’t need to be trustworthy; it needs to produce coherent text from sources. The structure does the safety work.
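Model-agnosticism falls out of treating a role as a thin wrapper around any prompt-to-text callable. A hypothetical sketch, not PCE’s interface:

```python
from typing import Callable

Model = Callable[[str], str]  # any prompt-to-text function, strong or weak

def make_composer(model: Model) -> Callable[[list[str]], str]:
    # The role owns the prompt; the model behind it is interchangeable.
    def compose(sources: list[str]) -> str:
        return model("Draft a coherent document from these sources:\n"
                     + "\n".join(sources))
    return compose

# Swapping in a weaker or less aligned model changes nothing structurally:
# composer = make_composer(weak_model)
```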

A constitutional agent deployed inside a structural pipeline gets the benefit of both. Good training reduces the load on the verification stages — fewer errors to catch means faster throughput and lower cost. And structural constraints catch the cases where training fails, which it sometimes does regardless of how good the training is.

I find it refreshingly helpful to view the safety problem as a problem of institutions. We have had millennia to learn that reliable collective behaviour comes from structure, not from hoping that individuals will be virtuous: separation of powers, independent audit, role specialisation. The technical name for the underlying mechanism is an information partition, and it is well understood. Weber wrote about role specialisation and separation of duties in bureaucracies, Parnas about information hiding in software systems, and March and Simon gave us bounded rationality, in which each role has only the information relevant to its function.

PCE applies these ideas to LLM agents, and it works rather well.