
The Factory Floor

My median approval-to-merge is 9 minutes. Open-to-first-review is 16 hours. The machine produces in hours; the human queue costs a day. Software factories failed for sixty years because there was nothing to instrument. Agents changed the physics. Andon, takt, and pull, running on a real line.
Photo by Lalit Kumar / Unsplash

Back in January, I wrote a piece called The Cockpit Has to Change Too. It ended honestly: I couldn't tell you whether the cockpit was the destination or a waypoint. It was a waypoint. Five months later, the editor on my screen has been demoted, because production is no longer the scarce act, and my desk looks like a factory floor.

Look at the desk

I used to stare at an editor all day because I was the machine. The code came out of my head and through my hands, so the workspace was built around them.

Now my screens show three things, and the names mean nothing outside our walls, so they're worth a sentence each.

The stack is a tool we wrote that manages my lines of development. Right now, there are twelve running at once. The number varies. Each line produces work as stacks of commits, all on jj (the Jujutsu version control system), flowing downstream toward review. In factory terms, it's not one machine but a bank of them, and I'm the one operator walking the bank. That arrangement has a pedigree: Sakichi Toyoda's automatic loom stopped itself when a thread broke, which is what let one worker tend dozens of looms instead of one. The stack is how one engineer tends twelve lines instead of one.

TRM is what's coming into the organization and what I'm helping move. It's where review queues, blocked work, and stalled stacks live, the work that needs human judgment before it can flow. If the stack is my outbound side, TRM is the line's inbound side, and it's where the andon view lives, which we'll get to.

Project telemetry is the gauges. Throughput, flow time, batch size, review signal, for me and for every engineer in the organization. How the line is running, in numbers.

The editor is still there, one station among several. A bank of machines, a production board, gauges on the wall. A plant manager from 1985 would recognize my workspace before he'd recognize my job title.

The theory, in two minutes

The body of thought we're drawing on comes from two places, and you can carry both in your pocket.

The first is the Toyota Production System, built by Taiichi Ohno at Toyota after the war. Its core move is pull instead of push. A station makes only what the next station can consume, signaled by kanban cards. Work moves in small batches. Problems stop the line instead of being passed downstream. Ohno named seven wastes and called overproduction the deadliest, because making things faster than they can be consumed buries every other problem under a pile of inventory.

The second is Eliyahu Goldratt's Theory of Constraints, laid out in The Goal in 1984. Every system has one constraint, and the throughput of the whole system is the throughput of that constraint. Improving anything else feels like progress and changes nothing. The discipline is to find the constraint, subordinate everything else to it, and pace the entire line to it.
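A minimal sketch of that claim, assuming a simple serial line. The station names and rates below are made up for illustration, not measurements from any real line: the system's throughput is the rate of its slowest station, so speeding up anything else leaves the output unchanged.

```python
def line_throughput(rates_per_day: dict[str, float]) -> float:
    """Throughput of a serial line is the rate of its slowest station."""
    return min(rates_per_day.values())


def constraint(rates_per_day: dict[str, float]) -> str:
    """Name of the station that limits the whole line."""
    return min(rates_per_day, key=rates_per_day.get)


# Hypothetical rates, in units of work per day.
line = {"produce": 40.0, "review": 6.0, "merge": 50.0}
faster_production = {**line, "produce": 80.0}
faster_review = {**line, "review": 12.0}

print(constraint(line))                    # review
print(line_throughput(line))               # 6.0
print(line_throughput(faster_production))  # 6.0, improving a non-constraint changes nothing
print(line_throughput(faster_review))      # 12.0, improving the constraint moves the line
```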

Pull controls overproduction. Constraints determine throughput. Agentic coding breaks both unless the review system becomes the line.

Why Lean kept failing in software

This isn't even the first attempt. Japan ran literal software factories starting in the late 1960s. Hitachi, Toshiba, NEC, and Fujitsu built them, Michael Cusumano wrote the definitive book on them in 1991, and Microsoft revived the idea as its Software Factories initiative in the mid-2000s. All of it fizzled. The broader industry has been borrowing Toyota's vocabulary for 20 years on top of that. Kanban boards, value streams, the whole lexicon. Most of it never stuck, and I think the reason is simple. There was nothing to instrument. The production step was a person's thinking. You can't put an andon cord on cognition. So the boards were fiction, updated by hand, lagging reality, ignored. The factories were metaphors enforced by management. We had the words and none of the physics.

The idea wasn't wrong. It was sixty years early.

Agentic development changed the physics. The production step is now mechanical. Not mechanical because it's deterministic. It isn't. Mechanical because it emits observable units of work at a rate that the rest of the system must absorb. Humans supervise the line rather than work the station, and the work became legible: PRs, pushes, review cycles, and pickup times all fall out of the tooling as exhaust. Nobody has to maintain the gauges by lying to Jira. The factory metaphor didn't suddenly become apt. The factory finally became observable.

The bottleneck didn't move, so we stopped pretending it would

Our problem has been the PR bottleneck for a long time. We haven't solved it. Agents made it worse because production capacity exploded, and review capacity didn't. An agent generating code faster than a human can review it isn't velocity. It's overproduction, Ohno's deadliest waste, piling up as inventory between stations.

I can't take credit for what came next. Jon Edwards, a principal engineer on our team, spent years building software for manufacturing, and he's the one who started pulling these ideas across the gap. He built the original andon screen. He taught me the word takt. My contribution was recognizing the line he was drawing and leaning the organization into it.

Manufacturing has a word for the right pace: takt. Takt time is the drumbeat of the line, the rate at which the customer actually pulls finished product. Produce slower than takt and you starve demand. Produce faster, and you bury yourself in inventory. In our shop, takt is the rate at which the organization can absorb reviewed, merged PRs. Anything produced faster than that isn't speed. It's pile.

So we started experimenting with work limits on outflow. If you hit your limit, you're not allowed to produce more code. Not discouraged. Not allowed.

Most software teams put WIP limits on a board, and everyone ignores them, because nothing physically stops you from starting more work. Ours have teeth, and the teeth are what drove us to andon.

If you haven't lived in the manufacturing world, andon is Toyota's signal system. A cord runs above the line. Any worker who spots a defect pulls it. The line stops, a light comes on, and people swarm the problem instead of letting bad work flow downstream. The genius is a culture where stopping the line is cheap, expected, and everyone's job.

Ours is a view in TRM. The andon view is the queue of everything currently stopping the line: reviews waiting for pickup, stacks blocked on a decision, checks that failed and need a human. So if you hit your work limit and can't produce, what can you do? You open the andon view and work the constraint. You review. You unblock. You stop the line and fix the line.
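The rule reads naturally as a small decision function. What follows is a sketch under stated assumptions, not TRM's actual data model, which isn't described here: `AndonItem`, `StopReason`, and `next_action` are hypothetical names, and the oldest-item-first ordering is an assumed policy chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class StopReason(Enum):
    REVIEW_WAITING = "review waiting for pickup"
    BLOCKED_ON_DECISION = "stack blocked on a decision"
    CHECK_FAILED = "check failed, needs a human"


@dataclass(frozen=True)
class AndonItem:
    ref: str              # hypothetical identifier for a PR or stack
    reason: StopReason
    waiting_hours: float


def next_action(open_outflow: int, outflow_limit: int, andon: list[AndonItem]) -> str:
    """Decide what an engineer may do next under a hard outflow limit."""
    if open_outflow < outflow_limit:
        return "produce"
    if not andon:
        return "wait"
    # Assumed policy: work whatever has been stopping the line the longest.
    oldest = max(andon, key=lambda item: item.waiting_hours)
    return f"work andon: {oldest.ref} ({oldest.reason.value})"


queue = [
    AndonItem("pr-1", StopReason.REVIEW_WAITING, 3.0),
    AndonItem("pr-2", StopReason.BLOCKED_ON_DECISION, 20.5),
]
print(next_action(open_outflow=2, outflow_limit=3, andon=queue))  # produce
print(next_action(open_outflow=3, outflow_limit=3, andon=queue))
# work andon: pr-2 (stack blocked on a decision)
```

The point of the sketch is that the limit is a gate, not a suggestion: once `open_outflow` reaches the limit, "produce" is simply not a possible answer.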

Notice that both pocket ideas are running here. The outflow limit is pull with teeth. The andon redirect is subordination to the constraint, enforced by tooling instead of willpower.

We have a name for where this is heading. The limits, the andon view, the gauges, the standard work that defines each task: together they're our software manufacturing system.

And I want to be honest about tense. None of this is working yet, not fully. The outflow limits are an experiment, weeks old. The andon habit is forming, not formed. The gauges are real, but we're still learning to read them. What I can tell you is that the direction keeps checking out. Every piece we've put in place has pointed the same way, and nothing we've tried from this body of thought has bounced off. That's not victory. It's a heading.

This was unthinkable in the human-typing era because idle engineering capacity was the most expensive thing in the building. Now production capacity is cheap, and judgment capacity is scarce. The economics finally let us do what Toyota was telling us all along.

What the gauges say

Every engineer can see their own metrics and everyone else's. Here's what mine show, and here's the part that matters. My median time from approval to merge is 9 minutes. Commit to merge is about 5 hours. Open to first review is 16 hours and 40 minutes.

Read that decomposition again. The machine produces in hours. The human queue costs a day. Pickup time is two-thirds of my headline cycle time. That's the entire thesis of this transformation in three numbers. The bottleneck isn't production, and it never will be again. It's the queue in front of human judgment.
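For readers who want to run the same decomposition on their own data, here is a minimal sketch. It assumes each PR record carries four timestamps. The `PullRequest` fields, the `decompose` helper, and the sample data are all hypothetical, and the exact event definitions behind the dashboard's numbers may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class PullRequest:
    opened: datetime
    first_review: datetime
    approved: datetime
    merged: datetime


def median_hours(durations: list[timedelta]) -> float:
    """Median of a list of durations, expressed in hours."""
    return median(d.total_seconds() for d in durations) / 3600


def decompose(prs: list[PullRequest]) -> dict[str, float]:
    """Split cycle time into the human queue and the mechanical tail."""
    pickup = median_hours([p.first_review - p.opened for p in prs])
    approval_to_merge = median_hours([p.merged - p.approved for p in prs])
    open_to_merge = median_hours([p.merged - p.opened for p in prs])
    return {
        "pickup_h": pickup,
        "approval_to_merge_h": approval_to_merge,
        "open_to_merge_h": open_to_merge,
        "pickup_share": pickup / open_to_merge,
    }


def sample(pickup_h: int, merge_min: int) -> PullRequest:
    """Build an illustrative PR: picked up after pickup_h, approved 2h later."""
    t0 = datetime(2025, 1, 6, 9, 0)  # arbitrary sample timestamp
    reviewed = t0 + timedelta(hours=pickup_h)
    approved = reviewed + timedelta(hours=2)
    return PullRequest(t0, reviewed, approved, approved + timedelta(minutes=merge_min))


report = decompose([sample(10, 6), sample(16, 9), sample(20, 12)])
print(report)
```

One caveat when reading output like this: the stages are summarized by medians, and medians of stages don't have to add up to the median of the whole.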

What the gauges admit we haven't fixed

The dashboard is also honest about what's still broken, and I'd rather tell you that than claim victory.

My zero-comment approval rate is 38.7%. Some of that is genuinely clean, stacked work. Some of it is probably rubber-stamping under load. I don't know the split yet, and I want to.

My p90 batch size is over 4,000 lines per PR. In lean terms, that's a giant batch hitting a constrained station, and I'd bet money it correlates with the slow pickups. Big batches are exactly what one-piece flow exists to kill. One-piece flow is the lean ideal of moving work through the line in the smallest possible units, one piece at a time, instead of in batches that clog every station they touch. We're not there.

One reviewer carried 102 reviews in the window, and our top three reviewers handle over a quarter of all reviews. Toyota has a word for this, too: heijunka, the practice of leveling load so no station gets buried while others sit idle. We don't have it. The load isn't balanced across the inspection capacity, and an unleveled line whips and stalls.
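The three gauges above are simple to compute once review events are recorded. A minimal sketch, assuming a hypothetical `Review` record and a nearest-rank percentile; the dashboard's real definitions (what counts as a comment, how the percentile is interpolated) aren't stated here and may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewer: str
    approved: bool
    comment_count: int


def zero_comment_approval_rate(reviews: list[Review]) -> float:
    """Share of approvals that carried no comments at all."""
    approvals = [r for r in reviews if r.approved]
    if not approvals:
        return 0.0
    return sum(r.comment_count == 0 for r in approvals) / len(approvals)


def percentile_nearest_rank(values: list[int], pct: int) -> int:
    """Nearest-rank percentile, e.g. pct=90 for p90 lines changed per PR."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def top_reviewer_share(reviews: list[Review], top_n: int = 3) -> float:
    """Fraction of all reviews handled by the top_n busiest reviewers."""
    counts = Counter(r.reviewer for r in reviews)
    if not counts:
        return 0.0
    return sum(count for _, count in counts.most_common(top_n)) / len(reviews)


# Illustrative data only.
reviews = [
    Review("a", True, 0), Review("a", True, 2), Review("b", True, 0),
    Review("c", False, 3), Review("d", True, 0),
]
pr_sizes = [50, 120, 80, 4200, 300, 60, 90, 150, 200, 5000]

print(zero_comment_approval_rate(reviews))    # 0.75
print(percentile_nearest_rank(pr_sizes, 90))  # 4200
print(top_reviewer_share(reviews, top_n=1))   # 0.4
```

None of these numbers can tell clean work from rubber-stamping on its own. They only show where to look.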

These aren't footnotes. They're the next year of work.

The gauges belong to the line

A word on transparency, because someone will read "everyone sees everyone's numbers" and hear surveillance. The lean answer is that the line worker always sees the line's numbers. The data belongs to the people doing the work, not to management as a scorecard. We made one design choice deliberately: operating rate is measured against your own 90-day baseline, faster or slower than yourself, not ranked against your peers. The question the dashboard asks is whether your station is healthy, not whether you're beating the engineer next to you.
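To make the self-baseline idea concrete, here is one way it could be computed. This is an assumption-heavy sketch: it treats "operating rate" as merged PRs per day, compares a recent 7-day window against the trailing 90 days, and includes the recent window in the baseline. The `operating_rate` function and its parameters are hypothetical, not the dashboard's definition.

```python
from datetime import date, timedelta
from typing import Optional


def operating_rate(
    merge_dates: list[date],
    today: date,
    window_days: int = 7,
    baseline_days: int = 90,
) -> Optional[float]:
    """Recent merges per day divided by the same engineer's baseline rate.

    Above 1.0 means faster than your own baseline, below 1.0 means slower.
    Returns None when there is no baseline activity to compare against.
    """

    def per_day(days: int) -> float:
        start = today - timedelta(days=days)
        return sum(start <= d < today for d in merge_dates) / days

    baseline = per_day(baseline_days)
    if baseline == 0:
        return None
    return per_day(window_days) / baseline


today = date(2025, 6, 1)  # arbitrary sample date
steady = [today - timedelta(days=d) for d in range(1, 91)]      # one merge a day
busy = steady + [today - timedelta(days=d) for d in range(1, 8)]  # a doubled last week

print(operating_rate(steady, today))  # 1.0
print(operating_rate(busy, today))    # greater than 1.0
```

Nothing in the function takes another engineer's data as input, which is the design choice in miniature: the only comparison available is against yourself.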

What doesn't transfer

Not everything from the factory survives the trip. Demand for features can't be leveled the way demand for widgets can. Heijunka works inside the plant, but product priorities lurch, and no amount of leveling fixes a roadmap that changes because a competitor shipped something on Tuesday. The variability lives upstream of the line, in the market, and the line has to absorb it rather than smooth it.

That's the boundary, and it's worth drawing precisely. The production layer manufactures. The demand layer doesn't. The software manufacturing system names the layer where the physics hold, and it stops at the plant door. Anyone selling you a factory for the demand side is selling a framework.

The craft moved

Which brings me to the objection I can hear already: this kills the craft. It doesn't. The craft moved.

In January, my reference point was the trading floor. Stimulus, reaction, decision latency, screens everywhere. That was the right model for the wild west phase, when the job was reacting fast to whatever the agents threw at you. The factory is what comes after you impose discipline on the chaos. Trading floors optimize reaction. Factories optimize flow. I don't want engineers heroically reacting faster. I want takt, work limits, andon, and leveled review load, so that reaction speed stops being the thing that determines throughput.

The human work now is line design, standard work, and judgment at the inspection points. That's where the skill lives. The cockpit post asked how one human keeps up with compounding agent output. The factory answers: you don't keep up. You design the line so you don't have to.

Is the software manufacturing system the destination or another waypoint? Same honest answer as last time. I don't know. But the gauges are real now, and they'll tell us.