Agents Are Not Accountable
Engineering is an outcome-focused discipline, and the outcome has not changed in the entire history of the profession. It is not changing now.
What changes, and has always changed, is the tooling. Assemblers replaced hand-written machine code. High-level languages replaced assembly. Version control replaced tarballs. CI replaced manual testing. Code review replaced solo authorship. Each transition forced a new discipline to keep the outcome stable, because each new tool moved some part of the work to a new place and exposed a gap the old discipline had quietly filled. Engineers who were good at the outcome adopted the new tool and built the new discipline. Engineers who confused the tool with the outcome got left behind, in both directions. Some refused the new tool and shipped slower work in a steady decline. Some adopted the new tool without the new discipline and shipped faster work that did not hold up.
Agents are the next tool. The industry is currently sorting itself into two camps, and both are making the same mistake that previous transitions produced, in opposite directions.
The first camp says agents replace engineers. Not augment. Replace. The loop closes. Intent goes in one end, production systems come out the other, and the humans who used to sit in the middle go out with the rest of the slack. Engineering as a job category disappears the way switchboard operators disappeared.
This camp is loudest among people who do not engineer for a living, but the serious version comes from inside the profession. Engineers, often very good ones, who have looked at the trajectory and concluded the work they spent twenty years getting good at is ending. They are not wrong about what they are seeing. The ratio of code-typed to code-shipped is collapsing. Skills that took years to develop are being commoditized in months. They are wrong about what it means. They have confused the tool with the outcome. The tool is changing. The outcome is the same.
The second camp says agents are a hazard. The output is unreliable. The dependence is corrosive. The craft is being eroded by people who took the shortcut. Each charge is true in some cases and overstated in others, but the charges are not the position. Underneath the argument is a fear that is itself reasonable, that something the speaker valued is being taken away by a force they did not consent to and cannot stop. The fear is real, but fear is not a strategy. It produces a stance of refusal that has the texture of integrity and the function of a holding action against a tide that is not receding.
This camp has made a symmetric mistake. They believe the discipline cannot survive the tool change. It can. It has, every time before. It will this time, but only if engineers do the work of building it.
Most working engineers are in neither camp. They are oscillating, excited by what agents can do this week and uneasy about what shipping under their name has started to mean. This essay is for them.
The pragmatic position is the one engineers have always taken across tool transitions. The outcome is the invariant. The tool is the variable. The discipline is what closes the gap between them, and it has to be rebuilt every time the tool changes by the people who care about the outcome.
This essay is about what that discipline looks like for one specific corner of the engineering job. The pass over code before it ships. The corner has a name now. Self-review.
What authorship bundled
For the entire history of the profession, the person who wrote the code was the person responsible for it. Authorship was the mechanism by which responsibility attached. You typed it, so you read it. You read it, so you thought about it. You thought about it, so you would defend it. Authorship was a single act that produced both the artifact and the accountability, and the profession built its review processes on top of that act, assuming it would always come bundled.
Agents unbundle it. The agent produces the artifact. You produce the accountability, or you don't, and your name goes on something you have not read.
I am writing this from the position of someone who has done exactly that. More than once. In the last year, engineers on my team have brought my code back to me, code that I had shipped under my name, and shown me what was wrong with it. They were right. I had not read carefully enough before I pushed the PR.
I want to say two things about that, because both matter for what this essay is about.
The first is that I am proud of the engineers who brought it back. That is not what happens in most organizations. In many organizations the CTO ships something embarrassing and the engineers route around it, because the political cost of telling the CTO he is wrong is higher than the technical cost of working around him. My engineers told me. By name. Knowing my name was on the work. That is the culture I want to be running. I am proud of the team, and the fact that the embarrassing moment exists at all is evidence that the culture is functioning. The day they stop bringing it back is the day I start to worry.
The second is that the slip mattered to me in a way that is worth being explicit about. I came up in teams where review was a thing you took seriously, where shipping something that needed a second pass was a thing you remembered for years afterward. That standard is the one I measure myself against. It is also the standard the rest of this essay assumes, and it is the standard that registered the slip as a wake-up call rather than a routine correction. The mechanism I want to describe is what happens when an engineer with that standard, in an organization with that culture, still ends up shipping work he had not read. That is not a story about negligence. It is a story about the gradient.
The gradient is this. The agent made shipping cheap. Reading stayed expensive. An engineer's day has a fixed number of hours. The discipline that used to be automatic, because typing the code was itself an act of reading, became a discipline that required deliberate effort and time I had not budgeted for. The bend in my behavior was small, week over week. It became visible only in aggregate. By the end of a quarter, I had shipped things I would not have shipped a year earlier and was telling myself the agent had handled them. The agent had not handled them. I had stopped reading, and I had not built the tool that would have made reading sustainable, and the work was going out the door under my name in a state I would have caught immediately if I had been typing it.
The slip is not unique to me. The mechanism is structural, and any engineer who does not actively defend against it will hit the same gradient. The economics moved. The obligations did not. Your name is still on the commit. Your name is still on the postmortem. Your name is still on the production system that wakes someone up at three in the morning. The agent cannot be paged. You can.
The part of authorship that mattered was never the keystrokes. It was the reading. It was the thinking. It was the willingness to defend what shipped under your name. That part of the job did not move when the typing moved. It still has to be done, by the person whose name is on the work, before the work goes anywhere.
It needs a name, because unnamed work does not get tooled, untooled work does not get scheduled, and unscheduled work does not get done. Call it self-review.
What self-review demands
Self-review is not PR review done to yourself. It is not a glance at the diff before you push. It is not the vibe check that has quietly replaced review at the shops claiming 7-10x productivity. It is the deliberate pass over agent-generated code by the person whose name will be on it, with the specific goal of taking the accountability the agent cannot.
The pass has to do several things at once.
It has to be careful. The self-reviewer reads as the author would have read if the author had typed it. Other lines of defense exist, PR reviewers and CI and linters and downstream engineers, and they will catch what they catch. None of them discharge the accountability that goes with the name on the commit. That accountability is the reviewer's, and the only way to take it on honestly is to have read the code carefully enough to defend it.
It has to be structured. Agent-driven work is stack-shaped. The reviewer needs a way to walk the stack, comment at the right granularity, distinguish line-level concerns from change-level concerns from stack-level concerns, and resume the next day where they left off. Without structure, the pass collapses into scrolling, and scrolling is the failure mode that produces the vibe check.
It has to be re-runnable. The output of the review pass is not approval or rejection. It is comments. The agent reads the comments and edits the code. The reviewer reads the new code and either accepts the edits or comments again. The cycle runs until the reviewer is willing to put their name on what shipped. A review tool that assumes one pass and a binary outcome is solving a different problem.
It has to be local. The reviewer is doing a private pass, not collaborating with anyone. The comments are working notes, not communication. They never need to leave the workstation. They never need to be visible to anyone else. The review is between the engineer and their own conscience, mediated by the code.
And it has to feed PR review, not replace it. After self-review, the code goes through whatever PR process the team uses. The PR reviewer has always been the second pass, never the first. The first pass was authorship, and authorship used to do that pass automatically because typing the code was itself an act of reading it. With agents, the first pass does not happen unless somebody chooses to make it happen, and the somebody is the person whose name is going on the work. Self-review is what the first pass becomes when the typing leaves. PR review goes on doing what it always did.
None of these are demands a tool can satisfy on its own. They are demands on the organization that adopts the tool. The tool has to exist, but the organization has to budget the time, protect the discipline, and treat the pass as part of the work rather than overhead on top of it. The shops claiming 7-10x productivity are mostly not doing this work. The review bottleneck is the hardest problem in the agentic transition, harder than the tooling, harder than the model choice, harder than the cultural shift. I have not seen it solved. I am skeptical of anyone claiming to have solved it on the timeline the throughput numbers imply, because the work it would take to actually solve it is identifiable when you look, and most of the shops citing the numbers are not doing that work. What is more likely is that review has been redefined to mean a glance, or quietly dropped on the bet that consequences arrive slowly enough or diffusely enough that the people making the bet will have moved on by the time anyone connects them.
The mechanism is the same one I described in my own behavior, applied at the level of the organization. The gradient that bent my reading week over week bends entire teams quarter over quarter. Throughput numbers go on slides. Promotion narratives form around the engineers who shipped the most. The work that does not show up on the throughput chart, the careful reading and the deliberate review, gets quietly deprioritized, because nobody is rewarded for it and the cost of doing it is paid by the engineer doing it while the credit for skipping it accrues to everyone. The organizational gradient is the personal gradient with a budget and a quarterly review attached.
The throughput numbers are real. The question the numbers do not answer is what was traded for them.
jjr
I built jjr because I was the problem it solves. The name is short for jujutsu-review. It is a tool for self-review. It is not the answer. It is a worked attempt at the question, offered as evidence the question is buildable.
The first decision was to build on jujutsu rather than git. Done right, agentic development produces stacks of small commits that get reshaped before they ship. Split this one, squash these two, drop that experiment, fix the description. The git workflow for that is interactive rebase, which is slow, error-prone, and easy to bail on. The friction tax that interactive rebase imposes on the cycle is the reason most teams stop running the loop. Jujutsu makes history rewriting first-class. The agent reshapes its own stack when told to. Conflicts get recorded as first-class objects in commits rather than halting the rebase, and descendants reparent automatically. The cycle stays cheap enough to run as many times as the work needs. Adopting jujutsu is a real cost for a git-native team, and I am not minimizing it. The cost is worth paying, because the alternative is paying the rebase tax every cycle and slowly giving up on running the loop, which is the failure mode the rest of this essay is about.
jjr walks a stack of changes oldest to newest. Arrow keys scan a diff. n and p move between changes. Tab cycles files. Enter on a line opens a comment. Comments come in four scopes. Line comments anchor to a specific line in a specific file. Change comments anchor to a whole commit. Description comments anchor to a line in the commit message. Stack comments anchor to the entire review unit. Each comment carries a severity. Required means the agent addresses it by editing. Suggestion means the agent addresses it if safe and consistent with intent. Note means informational, the agent does not act.
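The four scopes and three severities form a small data model. A minimal sketch of that model follows; the type and field names are illustrative stand-ins, not jjr's actual internals.

```rust
// Hypothetical sketch of the comment model described above. The names
// are illustrative, not taken from jjr's source.

/// Where a comment anchors within the review unit.
#[allow(dead_code)]
enum Scope {
    Line { file: String, line: u32 },          // a specific line in a file
    Change { commit: String },                 // a whole commit
    Description { commit: String, line: u32 }, // a line in the commit message
    Stack,                                     // the entire review unit
}

/// What the agent is obliged to do with the comment.
#[allow(dead_code)]
enum Severity {
    Required,   // the agent addresses it by editing
    Suggestion, // addressed if safe and consistent with intent
    Note,       // informational; the agent does not act
}

#[allow(dead_code)]
struct Comment {
    scope: Scope,
    severity: Severity,
    body: String,
}

/// Only required comments always oblige an edit.
fn must_address(c: &Comment) -> bool {
    matches!(c.severity, Severity::Required)
}

fn main() {
    let c = Comment {
        scope: Scope::Line { file: "src/main.rs".into(), line: 42 },
        severity: Severity::Required,
        body: "handle the empty-input case".into(),
    };
    assert!(must_address(&c));
}
```

The severity split is the part that carries the accountability model: the reviewer, not the agent, decides up front which comments are binding.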
There is no channel where the agent declines a comment with an explanation. The agent does not argue. It edits or it does not. The diff is the response. If the reviewer flagged something and the next cycle's diff does not address it, that is the conversation. The reviewer reads the diff and adjudicates. The agent can also produce an edit that looks like it addressed a comment but did not. That is a real failure mode, and the defense against it is the same defense the rest of self-review depends on: the reviewer reads the diff carefully enough to tell. This is a deliberate choice. A channel where the agent can argue with the reviewer is a channel that lets accountability drift toward the agent, and the entire premise of self-review is that accountability does not shift. The reviewer's name is on the work. The reviewer makes the call.
When the reviewer is done with a pass, they hand the comments to Claude. The agent reads the prompt, opens the changes, edits the code in place. The stack reshapes. jjr resumes automatically when Claude exits. Comments that re-anchor cleanly show inline. Comments that cannot re-anchor surface as stale, with the old and new text shown for adjudication. The reviewer either accepts the edits, edits stale comments to re-anchor, or comments again. The cycle runs until the stack is something the reviewer is willing to put their name on.
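The shape of that cycle can be sketched as a loop. The sketch below uses hypothetical stand-ins for the reviewer and the agent; it illustrates the structure, not jjr's actual code.

```rust
// Minimal sketch of the re-runnable review cycle, with hypothetical
// stand-ins for the reviewer's pass and the agent's edit step.

#[allow(dead_code)]
struct Comment(String);

/// Run review passes until a pass produces no comments, i.e. until the
/// reviewer is willing to put their name on the stack. The output of a
/// pass is comments, not approval; the agent's edit is the reply.
fn review_loop(
    mut review_pass: impl FnMut(&str) -> Vec<Comment>,
    mut agent_edit: impl FnMut(&str, &[Comment]) -> String,
    mut stack: String,
) -> String {
    loop {
        let comments = review_pass(&stack);
        if comments.is_empty() {
            return stack; // nothing left to address; the reviewer accepts
        }
        stack = agent_edit(&stack, &comments); // the diff is the response
    }
}

fn main() {
    // Simulated reviewer: flags any remaining TODO. Simulated agent: fixes it.
    let shipped = review_loop(
        |s| {
            if s.contains("TODO") {
                vec![Comment("resolve the TODO before shipping".into())]
            } else {
                vec![]
            }
        },
        |s, _comments| s.replace("TODO", "handled"),
        "fn handler() { /* TODO */ }".to_string(),
    );
    assert!(!shipped.contains("TODO"));
}
```

Note what the loop has no branch for: the agent declining. Termination is decided entirely by the reviewer's pass producing no comments.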
The whole tool is local. Comments live in .jj-review/ at the repo root, ignored by git and jj. They never get committed and never get shared. The review is private. The artifact that goes to PR review is the code, not the conversation that produced it.
jjr is available now for engineers who recognize this gradient in their own work. It needs a Jujutsu repository. It installs through Nix, Cargo, or Homebrew.
nix profile install github:ericbmerritt/jujutsu-review

Or:

cargo install jjr

Or:

brew install ericbmerritt/jjr/jjr

From inside any jj-tracked repository:

jjr

That opens the stack. Walk it. Comment on what needs to change. Press C to send the comments to Claude. The agent edits. jjr resumes automatically when Claude exits. Repeat until the stack is something you would defend. Then jj git push.
Closing
I have been doing this for thirty years. The tools have changed several times. The outcome has not. I shipped work this year I would not have shipped a decade ago, because the discipline that used to be automatic stopped being automatic and I had not yet rebuilt it. jjr is what I built to rebuild it. The discipline is mine to keep. The tool just makes keeping it cheaper than not keeping it, which is the bar every previous tool transition had to clear, and the bar this one has to clear too.
The point is not that every team needs jjr. The point is that every team using agents needs something that makes accountability cheaper than evasion. The agent makes shipping cheap. Reading stays expensive. An organization that does not close that gap will stop reading, not all at once but on a gradient slow enough that the throughput chart will not register the change and steady enough that by the time the consequences arrive, the people who made the bet will be elsewhere. That is not productivity. It is deferred risk.