Stephan Schmidt - February 6, 2026
The Team of Two
TL;DR: From Roman ballista crews to WWII machine gun teams to modern sniper pairs, militaries keep landing on two. The same logic applies to software – one senior, one junior, master and apprentice. Two people get collaboration with almost zero coordination cost, while teams of five spend half their energy on process. The organization scales when juniors grow into seniors and get their own apprentice.
This is a plea for two-person teams.
I was watching a video of a historical reconstruction – a person rebuilding a portable Roman Manuballista, a torsion-powered bolt thrower. Heavy thing. Too heavy for one soldier to operate, but two could move it, aim it, load it, fire it, and relocate before the enemy figured out where the bolts were coming from. It occupied a very specific tactical niche: powerful enough to change a skirmish, mobile enough to keep up with the infantry. Perfectly optimized for a crew of two.
That stuck with me, because the pattern kept showing up. Machine gun teams in World War II – two men. Modern sniper teams – two. One shoots, one spots. Across thousands of years and completely different technologies, militaries keep landing on two. Not three, not a squad – just two. And I think the reason is more interesting than it seems.
The logic is surprisingly consistent. One person is mobile but lacks firepower. A single soldier carries a rifle, moves fast, decides fast. But when the problem outgrows what one pair of hands can handle – when you need sustained suppressive fire or precision at distance – the weapon gets heavier, the ammunition gets heavier, and you need a second person. Three or more give you massive capability, but now coordination eats the advantage. There’s something almost mathematical about it: the communication overhead between two people is one channel. Three people is three channels. Four is six. Five is ten.[1] Two gives you collaboration with almost zero coordination cost.
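The channel arithmetic is easy to check. Here is a minimal sketch in Python, assuming every pair of people counts as exactly one channel; the function name `channels` is only illustrative:

```python
def channels(n: int) -> int:
    """Pairwise communication channels among n people: n(n-1)/2."""
    if n < 0:
        raise ValueError("team size cannot be negative")
    # Each of the n people can talk to n-1 others; dividing by two
    # avoids counting the same pair twice.
    return n * (n - 1) // 2


# Print the channel count for team sizes one through six.
for size in range(1, 7):
    print(f"{size} people -> {channels(size)} channels")
```

Going from two people to five multiplies the channels by ten, while the headcount only grows by a factor of two and a half.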
Two people maintain the mobility of the individual while delivering the capability of a system. The Roman ballista crew, the MG team, the sniper pair – they all discovered the same division of labor independently. One focuses on the target, deep in tunnel vision. The other watches the bigger picture, managing ammunition, observing where shots land, watching the flanks.[2] Tunnel vision and situational awareness, running in parallel.
This pattern isn’t limited to the military. Batman and Robin, Butch and Sundance, Bud Spencer and Terence Hill, even Pinky and the Brain – it is everywhere once you look for it, and once you see it you can’t unsee it. But Holmes and Watson is the pair that maps most cleanly to software teams. Most people assume Watson is just the narrator, the less-brilliant sidekick who tags along. But Watson matters more than that. Holmes is brilliant, but Watson asks the obvious questions – and those obvious questions force Holmes to explain his reasoning out loud. Half of Holmes’s breakthroughs happen because Watson made him articulate what he was only intuiting. The pair works not despite the asymmetry but because of it. Different skills, different perspectives, one mission.
Which brings me to software, and I think this is where it gets interesting.
I’ve been writing software for forty-five years and building teams for twenty-five, and the best work I’ve seen has almost always come from a pair. Not pair programming in the strict XP sense[3] – two equals trading a keyboard back and forth. I mean something closer to the sniper team. One senior, one junior. Master and apprentice (you know who you are).
Why does this work so well? The senior carries the architecture. They see the design mistakes before they become technical debt, and they know which corners you can cut and which ones will cost you six months later. But here’s what’s less obvious: the senior gets better because of the junior. The junior asks questions constantly – Why are we doing it this way? What happens if this fails? Couldn’t we just…? And those questions, exactly like Watson’s, force the senior to explain and validate decisions they might otherwise make on autopilot. I myself made lazier decisions when I worked alone as a manager. Having someone ask “why?” is annoying and invaluable at the same time.
The junior, meanwhile, learns faster than any onboarding process could teach them. Not from documentation, not from a wiki nobody updates, but from watching how a senior developer actually thinks through problems – what they consider, what they dismiss, what makes them nervous. You can’t write that down. You can only absorb it by being there.
And the side effects are remarkable. Knowledge transfer happens without anyone planning it. The bus factor?[4] Solved – not through process, not through mandatory documentation sprints, but because two people worked on it together from the start. When one leaves, the other carries on. The weapon stays operational when one crew member is hit. Code review happens in real time, not forty-eight hours later in a pull request. Context is shared, not reconstructed.
I keep coming back to what makes this different from a team of three or five. A team of five needs standups, needs alignment meetings, needs someone to break work into pieces and assign it, needs pull requests and approval workflows. A team of two just talks. There is no process overhead because there’s nothing to process. You’re either working together or you’re not, and the communication channel is direct and singular. Whenever I see a team of five or six, there are at least two teams in there who want out. You can see it in every meeting – put six people in a room and watch who’s actually engaged. It’s never all six. Two or three are leaning in, the rest are waiting for their part to come up. The sub-teams are already there, you just haven’t acknowledged them yet.
There are cases where this breaks down, of course. Some problems genuinely need more people – distributed systems with many moving parts, or situations where the workload simply exceeds what two humans can handle. But I think those cases are rarer than we assume. We structure organizations into teams of six developers because we think this is what one product manager can handle. Or a team lead. We default to larger teams not because the work requires it, but because we don’t trust two people to handle it. That’s an organizational anxiety, not a technical constraint.
And the asymmetry matters for the organization too. When the junior grows – and in a good pairing, they grow fast – they become the senior. They get their own junior. This is how guilds worked for centuries. Master and apprentice. The apprentice becomes a journeyman, then a master with their own apprentice. Software reinvented this and called it “mentoring programs” – but the programs never work as well as just putting two people on the same problem. The organization scales through apprenticeship, not through trying to hire only senior engineers (good luck with that). You’re building a pipeline. Every senior was once someone’s junior, and they carry forward not just knowledge but a way of working.
From my experience coaching CTOs, the teams that struggle are usually too big or too solo. Solo developers get stuck, develop blind spots, build fragile things that work until they don’t. Large teams spend half their energy on coordination. But pairs – pairs just ship. And a pair of product engineers with different skill sets transitions nicely into the age of AI coding.
The Romans figured this out with bolt throwers two thousand years ago. If the optimal team really is two, then most of what we’ve built around software teams – the standups, the sprint planning, the alignment meetings, the elaborate PR workflows – is overhead we created by making teams too big in the first place.
[1] Fred Brooks made this observation in The Mythical Man-Month (1975). The formula is n(n-1)/2 communication channels for n people. Brooks used it to explain why adding people to a late project makes it later. Fifty years on, most organizations still haven’t internalized the math.
[2] Modern military sniper doctrine formalizes this as “shooter and spotter.” The spotter handles wind calculations, range estimation, target selection, and security – arguably the harder job. The shooter’s only task is to execute the shot. Most people get this backwards, assuming the shooter is the senior role.
[3] Kent Beck formalized pair programming in Extreme Programming Explained (1999). The XP version assumes two roughly equal programmers sharing a keyboard. What I’m describing is closer to the master-apprentice model – the asymmetry is the point, not a limitation to work around.
[4] The “bus factor” – how many people on a team can get hit by a bus before the project stalls. It’s a morbid metric, but every CTO I’ve coached has at least one system with a bus factor of one. The usual fix is documentation. The better fix is a second person who was there from the start.
About me: Hey, I'm Stephan, I help CTOs with Coaching, with 40+ years of software development and 25+ years of engineering management experience. I've coached and mentored 80+ CTOs and founders. I've founded 3 startups. 1 nice exit. I help CTOs and engineering leaders grow, scale their teams, gain clarity, lead with confidence and navigate the challenges of fast-growing companies.