The Cockpit Has to Change Too
We're coming up on one month of running AI coding agents in production, and I keep bumping into the same realization: we optimized our entire work environment for humans writing code. Now we need to optimize for humans orchestrating agents. The cockpit has to change.
The trading floor
I used to work in the Board of Trade building in Chicago in the late 2000s, developing a high-frequency trading platform for an options firm. I remember walking onto the floor and seeing traders surrounded by monitors. Six, eight, ten screens per desk.
It looked excessive, but it served a purpose.
Traders respond to stimuli. The market shifts, and they shift with it. Their job isn't to predict, it's to react, quickly and correctly. The more visibility they had into market changes, the faster they could respond.
The same physics
We're rediscovering this now.
When you're orchestrating multiple AI agents, your job becomes reactive. You're monitoring code flooding past. You're watching for things that look wrong. Agents are asking you questions; you're answering them and getting back to monitoring. There's this constant stream of information, and your value is in the quality and speed of your judgment calls.
The inputs are different, but the physics are the same: fast-moving signals, limited human attention, and outcomes that depend on timely judgment.
This is the arbiter role I've written about before: engineers shifting from authors of code to arbiters of agent output. But I want to be more direct about what that means: this isn't "still engineering" in the traditional sense. The work is moving from production to supervision. From construction to control systems. From implementation to risk management.
The closest analogies aren't other programming jobs. They're air traffic control, trading desks, site reliability command centers. Roles where the job is situational awareness and fast, high-stakes judgment.
And I hadn't fully appreciated what that looks like physically. You can't do this work effectively on a laptop screen. You need to see everything the agent is doing.
Some of the team have ultra-wide monitors, but adoption has been piecemeal. Ultra-wides look great, but they're expensive and a new CapEx line item. Multiple standard monitors are at least as good: cheaper, more flexible, and better for terminal-heavy work, where text clarity matters when you're scanning code all day.
Voice
The input side is changing too. We're finding that talking to agents is faster than typing to them.
My use of Wispr Flow has gone through the roof this month. The LLM acts as a translation layer: you speak naturally and quickly, then move on to the next thing. You don't have to compose precise written instructions when you're in reactive mode. You just talk.
When you're responding to three agents and monitoring two others, typing becomes a bottleneck. Voice lets you stay in flow. Tools like this have moved from nice-to-have to core infrastructure. That's OpEx rather than CapEx, but it's still a line item nobody budgeted for.
The monitoring problem
The hard part isn't generating code anymore. It's managing the human decision queue that comes with it.
We'll eventually get better tooling for this. We're not there yet.
I need visibility into three things: intent (what I asked for), execution (what the agent is doing right now), and result (what changed in the system). And I'm tracking this across multiple lines of development simultaneously. That's a lot to hold.
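To make that concrete, here's a minimal sketch of how that state might be represented. The types and names are hypothetical, for illustration only, not a description of any tool we actually run.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    RUNNING = "running"   # agent working; watch, don't touch
    BLOCKED = "blocked"   # agent asked a question; human judgment needed
    DONE = "done"         # result ready for review

@dataclass
class Workstream:
    """One line of agent development: intent, execution, result."""
    intent: str                 # what I asked for
    execution: str              # what the agent is doing right now
    result: str | None = None   # what changed in the system, once known
    status: Status = Status.RUNNING

def decision_queue(streams: list[Workstream]) -> list[Workstream]:
    # The human decision queue: everything blocked on your judgment.
    return [s for s in streams if s.status is Status.BLOCKED]
```

Even this toy version makes the point: the scarce resource isn't agent capacity, it's the length of that queue and how fast you drain it.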
Right now, we're solving the PR problem. Pull requests are flooding in from agents working in parallel. But PR management involves human concerns: things you need to respond to, things you're waiting on, decisions that require context. Those need to be visible in the same view, surfaced alongside everything else you're tracking.
We're building tooling internally to address this. Terminal Review Monitor (TRM) is our attempt to make the PR layer as reactive as the rest of the system, part of a broader control plane for agent-assisted development. Decision latency is what matters. The faster you can move from stimulus to judgment to action, the more leverage you have.
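For a feel of what that layer does, here's a minimal sketch of the kind of polling loop such a tool might start from, assuming the GitHub CLI (`gh`) is installed and authenticated. This is not TRM's actual code; it just shows the triage idea: split open PRs by what they demand of the human.

```python
#!/usr/bin/env python3
"""Hypothetical PR decision-queue poller (illustrative sketch, not TRM)."""
import json
import subprocess

def open_prs():
    # `gh pr list` returns open PRs for the current repo as JSON.
    out = subprocess.run(
        ["gh", "pr", "list", "--limit", "50",
         "--json", "number,title,author,reviewDecision,isDraft"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

def triage(prs):
    # "respond" = needs your judgment now; "waiting" = blocked on others.
    respond, waiting = [], []
    for pr in prs:
        if pr["isDraft"]:
            continue  # agent still working; no decision needed yet
        if (pr.get("reviewDecision") or "") in ("", "REVIEW_REQUIRED"):
            respond.append(pr)
        else:
            waiting.append(pr)
    return respond, waiting

if __name__ == "__main__":
    respond, waiting = triage(open_prs())
    print(f"NEEDS YOUR JUDGMENT ({len(respond)}):")
    for pr in respond:
        print(f"  #{pr['number']} {pr['title']} ({pr['author']['login']})")
    print(f"WAITING ON OTHERS ({len(waiting)}):")
    for pr in waiting:
        print(f"  #{pr['number']} {pr['title']}")
```

The design choice that matters is the split itself: a flat PR list hides which items are on your clock, and that's exactly the latency you're trying to shrink.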
Why this matters beyond ergonomics
This isn't about nicer desks.
If agent output compounds but human response doesn't, the system bottlenecks at judgment. And bottlenecks define throughput. Whoever shortens human decision latency will outcompete everyone else.
Cycle time. Deployment velocity. Competitive response. Cockpit design isn't a comfort upgrade—it's about how much leverage one engineer can safely apply to production.
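A toy calculation makes the point; the numbers are illustrative, not measurements. By Little's Law, average backlog = arrival rate × decision latency, so halving your latency halves the pile of decisions waiting on you.

```python
# Back-of-envelope, with made-up numbers: Little's Law, L = lambda * W.
agents = 5
decisions_per_agent_per_hour = 12            # questions, reviews, merges
arrival_rate = agents * decisions_per_agent_per_hour / 60.0  # per minute

for latency_min in (10, 5, 1):               # human decision latency, W
    backlog = arrival_rate * latency_min     # average decisions pending, L
    print(f"latency {latency_min:>2} min -> avg backlog {backlog:.0f}")

# latency 10 min -> avg backlog 10
# latency  5 min -> avg backlog  5
# latency  1 min -> avg backlog  1
```

Same agents, same output; the only variable is how fast the human answers. That's the leverage.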
The gap
There's a lot of talk about AI transforming work, but very little about what that transformation actually looks like.
Accenture renamed 800,000 employees "reinventors." We're debugging monitor setups and PR queues.
That gap tells me we're genuinely early. Most organizations haven't gotten far enough to discover that their cockpit has to change. They know they need to adopt agents, they're just not sure how. So adoption is happening haphazardly, team by team, without a coherent picture of what the work actually becomes.
I'll admit: I don't know if this is permanent or transient.
Right now, I need that visibility across multiple agents and multiple workstreams simultaneously. That demands screen real estate.
But maybe that changes. Maybe tooling emerges that hides the LLM work, lets agents just run, surfaces only what requires human judgment. In that world, the need for a trading floor setup might fade.
I'm not comfortable with that yet. I don't trust what I can't see. But I'm aware that my discomfort might be a transitional artifact. The same instinct that made early pilots distrust instrument flight.
We're in the wild west. The pattern so far is clear: the work has changed, and the cockpit has to change with it. What I can't tell you is whether this cockpit is the destination or just the next waypoint.