The Cost of Building Fast
Building Fast != Building Well
Shamin Haky via Unsplash
AI coding assistants are remarkable, and I want to acknowledge that before I say everything else.
The barrier to entry is lower than it has ever been. A person with a well-articulated idea and access to the right models can ship something real—something functional—in a fraction of the time it would have taken three years ago. I work in infrastructure. I am not romantic about inefficiency. I understand what it means to compress the iteration cycle.
For engineers who have a solid grasp of system design, who understand why decisions get made the way they do, who have debugged enough production incidents to know what "this will become a problem" looks like—agentic workflows are extraordinary. The cost of building well has come down for those people. Meaningfully so. They can move faster without sacrificing the judgment that makes speed safe.
But there is a prerequisite buried in that sentence, and it is not a small one.
You cannot build well with agentic workflows if you don't understand the system you're building. The tools amplify what you bring to them. For an engineer with years of accumulated intuition—about failure modes, about what makes a codebase maintainable, about the compounding consequences of architectural shortcuts—those tools are a genuine multiplier. For someone who hasn't developed that foundation, they are a way to generate a lot of code very quickly. Which is a different thing entirely.
And from the outside, the two outputs look almost identical.
This is where the confusion lives. Not in the tools, which are genuinely impressive. In who we've handed them to, and what we've decided those tools make them qualified to do.
But what exactly does it mean for something to be "built well"?
Martin Kleppmann opens Designing Data-Intensive Applications by naming three properties that any serious system must have: reliability, scalability, and maintainability. They sound like buzzwords. They are not.
Reliability means the system continues to work correctly even when things go wrong—when hardware fails, when users do unexpected things, when the software itself has bugs. A reliable system is not one that never encounters faults. It is one that has been designed to tolerate them.
Scalability means the system can handle growth. Not just more users, but more data, more load, more complexity—and handle it without requiring a complete rearchitecture every time the numbers go up. A system that works at ten thousand users and breaks at a hundred thousand was not scalable. It was just untested.
Maintainability is the most underappreciated of the three, and the one most directly under threat right now. Kleppmann breaks it into three things: operability—making it easy for teams to keep the system running smoothly; simplicity—managing complexity so that engineers who didn't write the code can understand it; and evolvability—making it possible to change the system as requirements change. A maintainable codebase is one that a new engineer can open and navigate. One that a team can modify without triggering cascading breakage. One that accumulates understanding rather than debt.
These three properties are not aspirational. They are the baseline for software that people depend on.
The current moment is failing all three. And it is failing them in sequence, in exactly the order you'd expect when you prioritize velocity above everything else.
When the barrier to shipping drops, the population of people shipping expands. Some of them are excellent engineers who are now faster. Many of them are not engineers at all, or are engineers who are bypassing the parts of the process that were, it turns out, doing a lot of load-bearing work: thoughtful design, careful review, the slow and unglamorous discipline of writing code that someone else can actually read and maintain.
Andrej Karpathy, who coined the term "vibe coding" in early 2025, described the practice as "fully giving in to the vibes, embracing exponentials, and forgetting that the code even exists." That instinct makes sense for someone who can reconstruct the code on demand, because the underlying structure is fully internalized. It is a different proposition for someone who never understood it in the first place.
The code does not forget that it exists. It runs in production. And when it breaks—which it will—someone has to understand it.
A December 2025 analysis1 found that AI co-authored code contained roughly 1.7 times more major issues than human-written code. Logic errors, misconfigurations, security vulnerabilities—all elevated. Code duplication increased2 approximately four times in volume between 2021 and 2024, and refactoring dropped from 25% of changed lines to under 10%.
We are generating more code than ever and caring for it less than we ever have.
The codebases being shipped right now are, in many cases, bloated in a very specific way. They are dense with generated logic that no human authored, that no human has a full model of, and that no human will be able to navigate without starting over. They violate Kleppmann's maintainability properties at the level of first principles. They are not simple. They are not operable. They are not evolvable. They were not designed so much as accumulated. And they are accumulating very fast.
And to make matters worse, we have introduced a new and truly inspired metric: the token count.
Meta built an internal leaderboard called "Claudeonomics"3 that ranks engineers across its 85,000-person workforce by the volume of AI tokens they consume—awarding titles like "Token Legend" and "Session Immortal."
Gergely Orosz has called the aspiration to expend more and more tokens "tokenmaxxing,"4 comparing it to the long-discredited metric of counting lines of code. The comparison is apt. Lines of code were also easy to measure, easy to game, and perfectly orthogonal to the thing you actually cared about. Token count has the same problem, with an added wrinkle: the engineers with the largest token budgets did produce the most pull requests, but only about two times the throughput at ten times the token cost.4 Per unit of output, that is roughly five times the spend. The ratio is not moving in anyone's favor.
What this metric actually incentivizes is activity. Movement. We are incentivizing the appearance of productivity without the substance of it. A significant share of engineers say5 their work is increasingly focused on satisfying investors and not falling behind the competition, rather than solving actual problems for users. Tokenmaxxing is that impulse crystallized into a dashboard. You are not being asked to build something good. You are being asked to generate a lot of something.
And so they do.
The pressure to ship—to demonstrate velocity, to stay competitive, to prove AI adoption is paying off—has created a specific and recognizable kind of engineer in 2026. Not a better one.
Many engineers have been moved onto AI teams to support fast-paced rollouts without adequate time to train, even when the technology is new to them.5 Junior developers empowered by AI may generate more code—but also introduce more bugs. More code overall means teams risk creating new technical debt faster than they can pay it down.6
The debt is not hypothetical. It is accumulating right now, in codebases that will have to be maintained by engineers who didn't write them, in languages they didn't choose, organized by logic they can only partially reconstruct. The technical debt burden globally stands at $1.31 trillion.7 That figure predates the vibe coding era.
68% of tech workers reported experiencing burnout symptoms in 2024, up from 49% just three years prior.8 Burnout, in this context, is not just a human cost. It is a systems cost. The people who understand the legacy codebase, who remember why a decision was made, who can navigate the oldest and most load-bearing parts of the infrastructure without triggering a cascade—those people are burning out and leaving. What gets hired in their place is someone faster, cheaper, and less equipped to understand what they've inherited.
I'd like to pause here and turn to what's happening to the infrastructure itself.
GitHub is, in some sense, the physical layer of software development. It is where code lives. It is where pull requests get reviewed and merged, where release pipelines trigger, where the version history of an entire industry is stored. It is supposed to be boring. It is supposed to just work.
But in April, GitHub's merge queue experienced a regression9 that affected 2,092 pull requests across 230 repositories, resulting in incorrect merge commits—with some default branches left in states that could not be repaired automatically. Four days later, the Elasticsearch subsystem collapsed under load, causing search-backed features across pull requests and issues to return empty results. In the prior month alone, GitHub's reliability dropped to two nines10—99% uptime, meaning roughly seven hours of downtime in thirty days.
For context: the industry standard is four nines. 99.99% uptime. That translates to about fifty-two minutes of downtime per year. Two nines is not a nines problem. It is a different category of problem.
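If you want to sanity-check those figures, the conversion from an availability target to allowed downtime is a single line of arithmetic. Here is a minimal sketch in Python; the function name and time windows are mine, chosen for illustration, and the percentages are the ones quoted above.

```python
def allowed_downtime_hours(availability_pct: float, window_hours: float) -> float:
    """Downtime permitted by an availability target over a given window."""
    return (1 - availability_pct / 100) * window_hours

# Four nines over a year: roughly 52 minutes.
print(allowed_downtime_hours(99.99, 365 * 24) * 60)  # ~52.6 minutes

# Two nines over thirty days: roughly 7 hours.
print(allowed_downtime_hours(99.0, 30 * 24))         # ~7.2 hours
```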
GitHub's own CTO cited "agentic development workflows" as the accelerant11—repository creation, pull request activity, and API usage have all surged sharply since late 2025, requiring a redesign to support 30x the current infrastructure scale. The growth is real. The infrastructure wasn't ready for it. And the features kept shipping.
The merge queue breaking is not a minor incident. The merge queue is the thing that ensures code lands in the right order, that changes don't collide, that what you merge is what you think you merged. When it broke, it attacked the very core of Git's trust model12—the promise that merged code stays merged. That is not a degraded experience. That is a betrayed contract.
GitHub is not alone. The Claude API has logged 99.01% uptime over the past ninety days.13 That is also, to be precise, two nines. On April 28, 2026, Claude suffered a major outage that knocked out access to Claude.ai, Claude Code, and the Anthropic API simultaneously—more than 12,000 users filed reports at peak, and the incident lasted approximately 78 minutes.14 It was not an isolated event. There had been a separate major disruption eight days earlier, and others before that.
The five-nines gold standard of digital reliability is cracking15—and the platforms cracking are not legacy systems that neglected to modernize. They are the companies at the frontier of the thing that was supposed to make everything faster and better.
This is worth sitting with for a moment. The tools that are generating all the code are themselves unreliable. The infrastructure that is supposed to hold it is straining under the load that the tools created.
The floor is shaking and we are building upward at speed.
Here's what I worry about.
We are at a moment where the standard is visibly declining—and where the decline is, for now, still surprising people. Gergely's newsletter still reads like an alarm. Engineers are still posting about GitHub outages in tones of genuine disbelief. There is still a shared understanding that this is not okay.
I worry that this window will close.
Human beings are extraordinarily good at adaptation. We normalize the things we encounter repeatedly. A generation of users who grew up with four nines of uptime knows that two nines is a degradation. A generation of users who grows up with two nines will call it normal.
This is, at its core, what Cory Doctorow means by enshittification. The process: first, platforms are good to their users; then they abuse their users to make things better for their business customers; finally, they abuse those business customers to claw back the value for themselves. The specific shape changes—Doctorow was writing about product decisions and monetization—but the structure is the same. You make the experience incrementally worse. People adjust. Then you make it worse again. The adjusted baseline becomes the new floor.
I think we are watching a version of this happen to reliability itself.
The SLO—service level objective—is the internal contract an engineering team makes with itself about how available a service will be. It is not glamorous. It is not a feature. It is a number on a dashboard that says: we have decided that this matters, and this is how much. The SLA—service level agreement—is the external version of that commitment, the one you make to your customers.
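The error budget, the share of the window the SLO allows the service to be down or wrong, is where that contract becomes a number you can spend. A minimal sketch, using hypothetical SLO values rather than any particular company's:

```python
def error_budget_minutes(slo_pct: float, window_days: int = 30) -> float:
    """Minutes of downtime the SLO permits over the window."""
    return (1 - slo_pct / 100) * window_days * 24 * 60

print(error_budget_minutes(99.9))  # ~43 minutes per month
print(error_budget_minutes(99.0))  # ~432 minutes per month, ten times the budget
```

Relax the target by nine-tenths of a percentage point and the budget grows tenfold.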
What happens when the internal standard quietly shifts? When the error budget gets widened because everyone is moving so fast that the old budget can't be maintained? When SLOs get renegotiated not because the product has changed but because the team can't hit the old ones anymore?
Nothing dramatic. The number changes on a document somewhere. The dashboard updates. The incidents get slightly less urgent, because the threshold for what counts as an incident has moved. And the users adjust, because the users always adjust, because they don't have a choice, because where else would they go.
This is not a prediction. It is already happening. GitHub stopped updating its own status page10 for a period because the availability was too poor to report honestly. A third-party status page was built to fill the gap.
We are learning to live with it.
There is a thing that happens when you hand powerful tools to people who haven't yet developed the judgment to wield them—and then measure those people by how much they produce.
You get a lot of production.
The scaffolding for assessing quality collapses under the volume, and eventually quality stops being a category you evaluate at all—it becomes a vague aspiration, something to address in a future sprint, something to fix when there's time. There is never time. There is only the next thing to ship, the next token to burn, the next PR to push to a codebase that no one fully understands and that everyone is moving too fast to stop and read.
Not everyone should be engineering.
That is not an elitist position—it is an honest one.
There are things that require expertise, and software that runs in production is one of them. The tools that have democratized the ability to generate code have not democratized the ability to reason about systems, to anticipate failure, to understand what you're deploying and what it will do when the conditions change. That understanding—the understanding that makes reliability and scalability and maintainability possible, not as aspirations but as outcomes—still takes time to build. It cannot be prompted into existence.
The cost of building has come down. The cost of building well has too—for the people who already knew how.
And we are going to be living in the thing we built long after we've forgotten how fast we built it.
Footnotes
- https://mlq.ai/news/meta-makes-internal-leaderboard-for-employee-ai-token-usage/
- https://techcrunch.com/2026/04/17/tokenmaxxing-is-making-developers-less-productive-than-they-think/
- https://www.cnbc.com/2024/05/03/ai-engineers-face-burnout-as-rat-race-to-stay-competitive-hits-tech.html
- https://jellyfish.co/blog/2025-software-engineering-management-trends/
- https://www.propelcode.ai/blog/vibe-coding-maintaining-quality-in-fast-development
- https://techotlist.com/blogs/programming-languages-and-development-trends/tech-burnout-2025-digital-overload
- https://blockchain.news/news/github-outages-scaling-improvements
- https://newsletter.pragmaticengineer.com/p/does-github-still-merit-top-git-platform
- https://github.blog/news-insights/company-news/an-update-on-github-availability/
- https://rollingout.com/2026/04/28/anthropic-claude-outage-users-locked-out/
- https://www.pymnts.com/artificial-intelligence-2/2026/anthropic-outage-shows-digital-reliability-cracking-under-ais-weight/