More than writers of code: False Velocity and the Pride Test

I shared this piece with the engineering team at Ceros recently, and wanted to re-share it here.

Most of the team have stopped writing code by hand. We are prompting Claude Code or Cursor for the heavy lifting, and to a large extent the act of writing code has been replaced. What is surprising is that we are not seeing the massive momentum gains you might expect from that shift.

For prototyping it is fantastic: spinning something up to validate a direction has never been faster. But deciding what to build, and then getting a change production-ready (does the code hold up to review? does it actually work well? is it ready for customers?) has become a bigger blocker than ever. PR review agents help, but they cannot tell us whether the architecture of a solution is right, especially in a product as nuanced as a design tool.

The team reacted really well to the piece. It sparked some great discussions, and everyone seemed to appreciate the signal that the quality of the code we are shipping still matters more than the velocity at which we generate it.

The article below is what I shared with the team, reposted here in full. I expect this to become one of a series of posts as we continue to adapt how we build software in 2026.

We have seen a massive shift in how we work lately. Most of the team is switching from writing code manually to getting Claude or Cursor to do the heavy lifting. We are adopting agentic workflows, which involve much more than just generating lines of code, but we are also seeing a growing trend: PR authors are skipping validation of their own changes.

If the job were now only about being a reviewer, it would be a gloomy prospect. But we have always been architects and product thinkers: those roles are simply becoming a much larger part of our daily lives. The actual writing of code takes up a smaller percentage of our time, but that does not mean we have time to spare. The other parts of the job have not gone away. In fact, the sudden influx of code creates a significant amount of noise.

Left to its own devices, an LLM will happily patch things until you are left with a disorganised mess. Our job is to find the signal within that noise. We must ensure the architecture of the product and our technical solutions remain sound. This is the craft of software engineering: building clean, maintainable systems that other developers want to work on, and shipping great products that our customers genuinely want to use.

This shift only works if we take individual responsibility for the output we produce. Self-validation is not an optional step. For now, the quality of that output still matters immensely. We need to ensure that both humans and LLMs have clear insight into how our systems work. Test coverage is more important than ever to validate that changes work and do not cause regressions. Documentation and spec files (e.g. openspec) dramatically improve an LLM’s chances of writing something we can keep.

Simon Willison wrote an excellent piece on this recently, titled “Your job is to deliver code you have proven to work”. He argues that, as engineers, we are still responsible for proving that our code works and is clean before we ask for a review. Think about open source. The value of a library is not just that someone wrote it for you: the value is that it is proven code. It has been tested over time by many people, or by someone with domain knowledge. That is the standard we need to maintain in our own PRs.

Beyond self-review, we must also ask: “Should I be doing this now?” If I shove a PR into the review queue for something new but not on the roadmap, I am just overloading my teammates’ context windows and distracting them from the work we have already committed to. Our current model will not scale if we treat reviews as a dumping ground for LLM output. The balance between writing and reviewing needs to stay 1:1: everyone must pull their weight on both the production and the review of code.

It is easy to fall into the trap of churning out code and pushing it immediately. I have done it myself. I once merged a PR with a Dependabot setting that Claude had completely made up. It looked valid, so I did not check it. It does not happen every time, but slips like this are becoming more frequent.

We must avoid the trap of False Velocity. It is easy to mistake code churn for progress, but merging a clever-looking LLM patch that you have not fully deconstructed is just technical debt in disguise. We should measure our value not by how quickly we prompt, but by the integrity of what we ship.

This is not a new bar, but it is one we must constantly remind ourselves of: The Pride Test. Before you move a ticket to Ready for Review, ask yourself: “If this bypassed review and merged straight to main right now, would I sleep soundly?” If the answer is not a confident “yes,” the work is not done. Use the checklist below to bridge the gap between AI output and proven engineering.

I would love to hear your feedback on this. Let me know if you have been frustrated by this trend lately, or if you have ideas on how we can better prevent these issues as a team.

Takeaway: The Author’s PR Checklist

Manual Validation: Run the code locally or in a review environment and test edge cases, such as large experiences or complex setting combinations, rather than settling for a quick pass.

Architectural Fit: Ensure the solution follows existing patterns rather than being a quick LLM “patch” job.

Quality Standards: Keep code clean and bundle sizes small. Do not accept increasing complexity just because an LLM wrote it: the LLM will not know what to do with the mess next time either.

Clear Intent: Make it obvious whether this is production-ready or simply a prototype for a direction check.

The Pride Test: Ask yourself if you are confident enough to merge this to main in its current state.