GenAI Product Doctrine: Products of Consequence
This guide is for teams adding agentive capabilities to existing products. Often, when people think about adding 'AI' to a product, they mean a chatbot. They make some context available to it and expect the customer to interact with it. It's a low-effort way to have 'AI' in your system.
What you are building is a tool that lets agents accomplish the goal of your product under human supervision. Not humans doing the work with agent support; agents accomplishing the goal your product defines. Whatever your product does, whatever value it provides, agents accomplish it while humans supervise, manage, and remain accountable for the result.
Core Invariants
Six principles govern this work:
- Agents must do the work.
- Humans must supervise the work.
- Humans are accountable for outcomes.
- Supervision requires live visibility.
- Trust is earned, scoped, and revocable.
- Human actions always take precedence.
If your product violates any of these, it will fail in predictable ways. The rest of this doctrine explains why these invariants hold and how to build products that enforce them.
What an Agentive Product Is
An agent acts. It creates, modifies, commits, submits, and triggers. It does real work with real consequences.
If your "agent" only answers questions or generates text for a human to copy and paste somewhere else, you haven't built an agentive product. You've built a tool. Tools are fine. This guide isn't about tools.
There are two kinds of products in an agentic ecosystem. Products with consequences, where actions happen: contracts get signed, money moves, code ships, patients get treated, and cases get filed. Someone is accountable for outcomes. These are agentive products.
Then there are products that inform consequences. They provide inputs to the products where actions happen. A data provider, a research tool, an intelligence platform. They surface information but don't act on it. A sales automation system or underwriting platform might consume that data, then act.
Both matter. But the arbiter model applies to the first kind. If you're building a product with consequences, you need everything in this guide. If you're building a product that informs consequences, your job is different: accuracy, provenance, clarity. You're an input to someone else's arbiter system.
In products with consequences, execution is the interface.
This guide is for products with consequences.
Start From First Principles
Most teams approach this wrong. They look at their existing product and ask, "Where can we add AI?" They bolt agents onto existing UI, existing flows. They get shallow integrations that don't change anything fundamental.
That's backwards.
Stop looking at features. Look at the invoice. What is the user actually paying for? That outcome is the agent's job. The process is the agent's implementation detail.
Your existing UI is chrome around single-threaded human action: a person doing one thing at a time, clicking through screens, filling out forms, making decisions sequentially. That's not what you're building anymore. You're rethinking the product around a human whose role has shifted from operator to supervisor.
If you don't change the user's role, you haven't changed the product.
The Two Traps
Teams building agentive products fall into one of two failure modes. Both come from not rethinking the product around the actual outcome.
The first is automation fantasy. The agent does everything autonomously. Trust it. This fails because no one is watching and no one can intervene. It also fails on liability. Users are responsible for outcomes they couldn't see or control. When the agent sends the wrong email, signs the wrong contract, or ships the wrong code, someone answers for it. That someone had no visibility and no ability to stop it. That doesn't work.
The second is copilot theater. The agent suggests, the human executes. This is low leverage. The human bottleneck remains. You've made the existing thing slightly faster, not changed what's possible. A human still reads every suggestion, decides whether to accept it, and performs the action. The agent is a fancy autocomplete. This path commoditizes fast because everyone can build fancy autocomplete. There's no moat in suggestions.
The way out is neither full automation nor assisted manual work. It's a different model entirely.
The Arbiter Model
Agents act. Users supervise.
This isn't philosophy. It's how responsibility actually works. When something goes wrong, a human is accountable. If they're accountable, they need visibility and control. Otherwise, you've created responsibility without authority. That's an organizational and legal failure mode.
The user shifts from operator to arbiter. They're not doing the work. They're ensuring the work is done correctly. That's a different job with different requirements.
The hard part is that reviewing is harder than doing. It requires holding intent in your head while evaluating someone else's execution. Most people aren't trained for this. They're trained to do the work themselves. Supervision is a skill, and your users may not yet have it.
Your product's job is to make the supervisory role tractable. Design for reviewability, not just capability. The question isn't "what can the agent do?" The question is "can the user verify what the agent did, quickly enough to matter?"
One principle makes reviewability concrete: never show state without showing delta. If an agent reconciles a thousand transactions, the arbiter shouldn't see a thousand green checkmarks. They should see the three fuzzy matches the agent was 80% sure about. The state is "reconciliation complete." The delta is "here's where I need your judgment." Surfacing the delta is how you make supervision tractable. Surfacing only the state is how you create rubber-stamp approval theater.
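A minimal sketch of what a delta-first payload might look like, in TypeScript. The names are illustrative, not a prescribed API; the point is that the arbiter sees the handful of items needing judgment, with the overall state as one line of context.

```typescript
// Hypothetical shape for a delta-first review payload. Names are illustrative.
interface ReviewItem {
  id: string;
  description: string;     // e.g. "Fuzzy match: invoice A-1043 vs payment ref 'A1043-B'"
  agentConfidence: number; // 0..1, how sure the agent is
  proposedAction: string;  // what the agent will do if approved
}

interface WorkSummary {
  state: string;           // "Reconciliation complete: 1,000 transactions"
  delta: ReviewItem[];     // only the items that need the arbiter's judgment
}

// The supervision surface renders the delta first; the state is one line of context.
function renderForArbiter(summary: WorkSummary): string[] {
  return [
    summary.state,
    ...summary.delta.map(
      (item) =>
        `NEEDS JUDGMENT (${Math.round(item.agentConfidence * 100)}% confident): ` +
        `${item.description} -> ${item.proposedAction}`
    ),
  ];
}
```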
Give Agents Agency
Agents must act. Actually do things, not suggest things.
If the agent can only recommend and the human must execute, you're back to copilot theater. The human is still the bottleneck. Agency means the agent can create, modify, delete, trigger, or submit. Whatever the real operations are in your domain, the agent can perform them.
Risk isn't managed by limiting agency. It's managed by supervision and control. An agent that can't act is safe but useless. An agent that acts without supervision is dangerous. The goal is supervised agency.
Give Users Live Visibility
The user sees what the agent is doing in real time. Not a summary afterward.
Live visibility is the supervision surface. The user knows what the agent is doing now, what it has done, and what it is about to do. The interface surfaces this front and center, not buried in logs.
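One way to think about the data behind that surface, as a sketch. The three-tense structure (has done, is doing now, is about to do) comes from the principle above; the event shape and names are assumptions.

```typescript
// Illustrative event shape for a live supervision feed. The three statuses map
// to "has done", "is doing now", and "is about to do".
type ActivityStatus = "completed" | "in_progress" | "planned";

interface AgentActivity {
  agentId: string;
  action: string;          // e.g. "draft_reply", "submit_filing"
  target: string;          // the object being acted on
  status: ActivityStatus;
  timestamp: string;       // ISO-8601
}

// The UI subscribes to a stream of these events rather than digging through logs.
type ActivityListener = (event: AgentActivity) => void;

class ActivityFeed {
  private listeners: ActivityListener[] = [];

  subscribe(listener: ActivityListener): void {
    this.listeners.push(listener);
  }

  publish(event: AgentActivity): void {
    for (const listener of this.listeners) listener(event);
  }
}
```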
Agency without visibility is dangerous. Visibility without agency is theater.
The Trust Gradient
All actions start requiring approval. The agent proposes, the user approves, the agent executes. Full supervision.
This is how trust develops. The user watches the agent work. They see what it does well, where it stumbles, and what kinds of decisions it makes. Over time, they selectively loosen the reins. These kinds of actions, just do them. These others, still ask me.
Trust isn't declared up front. It's earned through observed behavior and granted incrementally. This matches how you'd manage a new employee. You check their work. As they prove reliable in specific areas, you stop checking those. Other areas you keep checking indefinitely, maybe forever. The employee earns autonomy through demonstrated competence, not through seniority or credentials.
The product needs to support this gradient. Default to approval required. Let users grant autonomy to specific action types. Remember those grants per user, probably per context. Make it easy to revoke when trust breaks down.
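A sketch of what a trust-gradient policy store could look like under those requirements: approval is the default, grants are scoped to an action type and user (optionally a context), and revocation is as easy as granting. All names are illustrative.

```typescript
// Sketch of a trust-gradient policy store. Everything requires approval until
// a user explicitly grants autonomy for a specific action type.
type ActionType = string; // e.g. "send_email", "update_record"

interface AutonomyGrant {
  userId: string;
  actionType: ActionType;
  context?: string;        // e.g. a project, account, or environment
  grantedAt: string;
}

class TrustPolicy {
  private grants: AutonomyGrant[] = [];

  // Default: approval required unless a matching grant exists.
  requiresApproval(userId: string, actionType: ActionType, context?: string): boolean {
    return !this.grants.some(
      (g) =>
        g.userId === userId &&
        g.actionType === actionType &&
        (g.context === undefined || g.context === context)
    );
  }

  grant(userId: string, actionType: ActionType, context?: string): void {
    this.grants.push({ userId, actionType, context, grantedAt: new Date().toISOString() });
  }

  // Revocation must be as easy as granting.
  revoke(userId: string, actionType: ActionType, context?: string): void {
    this.grants = this.grants.filter(
      (g) => !(g.userId === userId && g.actionType === actionType && g.context === context)
    );
  }
}
```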
Early use is slow. The user is approving everything. That's how trust builds. The product should make approval fast and observation easy.
The agent must provide leverage during the supervision phase, not just the autonomous phase. If supervising the agent is harder than doing the work, the user will never reach the point of granting autonomy. Even at full supervision, the agent should structure decisions, surface relevant context, and tee up actions. The user approves, but the agent has already done work worth approving.
The trust gradient only works because constraints make behavior predictable. More on that shortly.
Give Users Control
The user can intervene. Stop, adjust, approve, reject, and take over.
The goal is not perfect output. It's cheap course correction. If correction is cheap, users intervene early. If it's expensive, they defer until a restart is the only option.
Control shouldn't feel like re-specifying from scratch. It should feel like sculpting. The user reaches in, makes an adjustment, and the agent continues with the correction incorporated.
If the agent produces a ten-step plan, the user can reorder steps, remove one, adjust a parameter, all without starting over. The structure is malleable. The user shapes it rather than accepting or rejecting it wholesale.
Corrections are inputs, not stop signals. The best agents treat a user adjustment as a new constraint. "I see you changed X. I've updated Y and Z to match." The user stays in the supervisory role. Work continues.
The weak version is an agent that stops and waits for complete re-instruction whenever the user touches anything. Now the user is back to being an operator. Every intervention resets the collaboration to zero. That's micromanagement with extra steps.
One more requirement for products with consequences: revert is distinct from intervene. Intervention happens before commit. But what happens after the agent sends money, ships code, or files the document? Sculpting is not enough. You need a time machine. Agentive products require compensatory action capability. If an agent makes a consequential mistake, the user needs a path to roll back the state, not just fix the output going forward.
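One way to make compensatory action concrete is a ledger in the spirit of the saga pattern: every consequential operation registers how to undo or compensate for itself at the moment it commits. A sketch, with illustrative names:

```typescript
// Sketch of compensatory-action support. Each committed operation carries its
// own rollback, so "revert" is a first-class capability rather than an
// afterthought.
interface CommittedAction {
  id: string;
  description: string;             // e.g. "Filed document"
  compensate: () => Promise<void>; // e.g. withdraw the filing, refund the payment
}

class ActionLedger {
  private committed: CommittedAction[] = [];

  record(action: CommittedAction): void {
    this.committed.push(action);
  }

  // Roll back the most recent n consequential actions, newest first.
  async revertLast(n: number): Promise<void> {
    const toRevert = this.committed.splice(-n).reverse();
    for (const action of toRevert) {
      await action.compensate();
    }
  }
}
```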
Design for Multiple Lines of Action
If agents act and users supervise, a single user can supervise multiple agents or workstreams. The job becomes attention allocation. Where do I need to look? What needs me right now? What can I ignore?
Your existing UI was built for single-threaded human work. That's not the model anymore.
The arbiter interface asks different questions. What's happening across all active work? What needs attention right now? Where can I drill in when something requires closer inspection? How do I get back to the overview when I'm done?
This is a supervision surface, not a task interface. The old UI is still available when the user needs to go deep on a specific action or workflow. But it's not the primary surface anymore. The primary surface is the one that lets a single human keep track of many things at once.
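A sketch of the core question that surface answers: across all active workstreams, which ones need a human right now? The attention states and names are assumptions.

```typescript
// Sketch of attention allocation across workstreams. The states are illustrative.
type Attention = "needs_decision" | "needs_review" | "running" | "idle";

interface Workstream {
  id: string;
  title: string;
  attention: Attention;
  lastEvent: string;
}

// Order workstreams so the ones demanding human judgment surface first.
const priority: Record<Attention, number> = {
  needs_decision: 0,
  needs_review: 1,
  running: 2,
  idle: 3,
};

function overview(streams: Workstream[]): Workstream[] {
  return [...streams].sort((a, b) => priority[a.attention] - priority[b.attention]);
}
```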
Escalation Is a Feature
Agents should know when they're uncertain. They should halt, flag, and ask rather than proceeding with false confidence.
This isn't failure. Silent failure is failure. Escalation is the system working correctly. An agent that says "I'm not sure how to handle this case, here are the options I see" is more valuable than an agent that guesses and gets it wrong.
Good escalation has specific properties. The agent stops before acting on uncertainty, not after. The question to the user is specific, not open-ended. The user can answer and resume, or take over entirely.
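A sketch of what an escalation could carry: a specific question, enumerated options, and an answer path that either resumes the agent or hands the work to the human. Names are illustrative.

```typescript
// Sketch of an escalation. The agent halts before acting on uncertainty, asks a
// specific question with concrete options, and blocks until answered.
interface Escalation {
  workstreamId: string;
  question: string;    // specific, not "what should I do?"
  options: string[];   // the concrete choices the agent sees
  recommended?: number; // index of the agent's preferred option, if any
}

type EscalationAnswer =
  | { kind: "choose"; optionIndex: number } // answer and resume
  | { kind: "take_over" };                  // human takes over entirely

// The agent awaits this; no action is taken while the question is open.
async function escalate(
  e: Escalation,
  askUser: (e: Escalation) => Promise<EscalationAnswer>
): Promise<EscalationAnswer> {
  return askUser(e);
}
```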
If agents only escalate after mistakes, users spend their time fixing problems rather than preventing them. If agents escalate before mistakes, you've built a collaboration. Users spend their time making decisions at the moments that matter.
Constraints Build Trust
Agents are more useful when they are more constrained. The temptation is to show capability. Look how much our agent can do. That's demo thinking.
Real value comes from narrow scope, predictable behavior, and repeatable outcomes. The user learns what the agent will and won't do. Trust builds through consistency. An agent that can do anything is an agent you have to watch constantly. You never know what it might try. An agent with clear boundaries is one you can supervise efficiently. You know its scope, so you know what to watch for.
Constraints also enable the trust gradient. Users can only grant autonomy to actions they understand. If the agent's capabilities are unbounded and unpredictable, the user can't reason about what they're approving. Narrow scope makes trust decisions tractable.
Architectural Consequences
The UI is no longer the source of truth.
To deliver agency, live visibility, and control, you need to build up your service layer. Your APIs are the agent's user interface. The agent doesn't click buttons or fill forms. It calls operations. If your product doesn't have an internal API that can handle concurrent actors, you cannot build an agentive product. This isn't a UX consideration; it's an architectural prerequisite.
Product managers often think the UI is the product. In an agentive world, the API is the product. The UI is just a view layer for the human supervisor. The agent calls operations. The user sees and can invoke those same operations when they need to intervene. Your existing UI is chrome on top of those operations, but it's not the supervision surface.
This isn't the starting point. It's a consequence. But it's real. Agentive products often require exposing an operational layer that was previously internal. Both agent and human work through it. The agent isn't using your UI. The agent is calling your services. And now the user needs access to those services too, in a form they can understand and operate.
You're also likely building real-time state synchronization. If the agent is acting while the user watches, changes need to appear live. The agent is a concurrent actor, like a collaborator in a multiplayer document. If your architecture assumes a single actor, you have work to do.
One rule is absolute. In any race condition between silicon and biology, biology wins. When the user and agent act on the same thing simultaneously, the user's action takes precedence. Always. The agent yields, incorporates the user's action as a rigid constraint, and adapts. No exceptions, no race conditions where the agent got there first. If you can't guarantee human precedence, you don't have an arbiter system. You have two competing operators.
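A sketch of that rule as a conflict-resolution function: human edits always beat agent edits, and the losing agent edit becomes something to replan around rather than retry. The shapes are assumptions.

```typescript
// Sketch of human precedence. When a human edit and an agent edit target the
// same field, the human edit wins regardless of timing.
interface Edit {
  objectId: string;
  field: string;
  value: unknown;
  actor: "human" | "agent";
  at: number; // timestamp, ms
}

function resolve(a: Edit, b: Edit): Edit {
  if (a.actor === "human" && b.actor === "agent") return a;
  if (b.actor === "human" && a.actor === "agent") return b;
  // Same kind of actor: fall back to last-write-wins.
  return a.at >= b.at ? a : b;
}

// The losing agent edit is not silently retried; the agent replans with the
// human's value treated as a fixed constraint.
```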
The Litmus Tests
Two questions to evaluate whether you've built an agentive product correctly.
First, governability. When the agent does something wrong, does the user know where to look and what to do, without leaving the surface where work is happening? The failure mode: the user delegates to the agent, then scrambles to figure out what happened when things go sideways.
Second, leverage. Is the cost of fixing the agent's mistake less than the cost of doing the task manually from the start? The failure mode: the agent creates work rather than eliminating it.
A useful way to quantify this: even when the agent fails, does the human taking over get at least a 50% head start? If the correction time exceeds the manual time, your product is a liability. The goal is that a failed agentic run still leaves the human better off than if they'd started from scratch. Partial progress, structured context, decisions already surfaced. That's leverage even in failure.
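The head-start test as arithmetic, assuming you can estimate both the manual time and the correction time:

```typescript
// Sketch of the head-start metric. 0.5 means a 50% head start; negative means
// the failed run left the human worse off than starting from scratch.
function headStart(manualMinutes: number, correctionMinutes: number): number {
  return 1 - correctionMinutes / manualMinutes;
}

// Example: the manual task takes 60 minutes and fixing the agent's failed
// attempt takes 25 minutes. headStart(60, 25) ≈ 0.58, so the failed run still
// left the human ahead.
```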
You need both. Governability without leverage means you've built a system that's easy to supervise but not worth supervising. Leverage without governability means you've built a system that's powerful but ungovernable. The agent runs ahead, the user can't keep up, and eventually something breaks badly enough that trust collapses entirely.
The products that get both right are the ones that will last. The rest are impressive demos waiting to be rolled back.
We are moving from software as a tool to software as a teammate. You wouldn't hire a teammate who refuses to tell you what they're doing, ignores your corrections, and has no clear job description. Don't build your AI products that way either.