The Trap of Vibe Coding: Is There a Better Way to Manage AI-Generated Code?

Over the last two years I’ve tried most of the spec-driven frameworks (BMAD, Spec Kit, OpenSpec, Apexyard, and more) and the app generation tools (Bolt, Lovable, Replit). The pain is still the same. Change a requirement, or add an enhancement, and the code gets more complicated, drifts from the main goal, and fills up with redundant and dead paths. It stops being maintainable. With each major development iteration, rebuilding the app becomes a better choice than improving it.

We've automated the production of legacy code.

Which is funny, because rebuilding from scratch is the de facto instinct of every developer assigned to legacy code. Most AI usage today treats the model like a junior freelancer. We say “build this,” it builds something, the tests are green, we move on. I’d never run a human hire this way. No onboarding, no spec review, no verification levels. So why do I accept it from a machine?

The cost shows up in three places. First, dead code piles up after every change, because nothing tracks which code serves which requirement. There’s no traceability from a requirement down to the functions that implement it, so deleting code is never provably safe, and nothing gets deleted. Second, nobody holds the full picture of a business feature anymore. I stopped reading the code, and the AI forgets everything the moment the session ends. Third, the context tax. Each new session starts with the same ritual: re-explain the project, let the model re-analyze half the repo, burn tokens rebuilding context the codebase should already carry. The knowledge exists. It just isn’t persisted anywhere a machine can load it.

Spec-driven frameworks solve part of this, and credit where it’s due. They pin down the requirements and some of the development constraints. But they stop at the spec. There’s no concrete linkage between a requirement and the exact part of the code that reflects it. The spec says what should exist. Nothing verifies what does exist, at the level of files and functions. So spec and code drift apart quietly, and green tests can’t tell you.
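To make the gap concrete, here is a minimal sketch of a check at the level of functions. It assumes a spec that names, per requirement, the functions expected to implement it. The requirement IDs, function names, and the `check_spec` helper are all hypothetical, and none of the frameworks mentioned above are claimed to work this way. It only uses Python's standard `ast` module to compare what the spec says should exist with what the source actually defines.

```python
import ast


def defined_functions(source: str) -> set[str]:
    """Collect the names of all functions and methods defined in a Python source string."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def check_spec(spec: dict[str, list[str]], source: str) -> dict[str, list[str]]:
    """Map each requirement ID to the functions it names that are missing from the source."""
    present = defined_functions(source)
    missing: dict[str, list[str]] = {}
    for req_id, functions in spec.items():
        absent = [name for name in functions if name not in present]
        if absent:
            missing[req_id] = absent
    return missing


# Hypothetical source file and spec, for illustration only.
SOURCE = '''
def create_invoice(order):
    return {"order": order}
'''

SPEC = {
    "REQ-1": ["create_invoice"],
    "REQ-2": ["send_reminder"],
}

drift = check_spec(SPEC, SOURCE)
print(drift)
```

A check like this says nothing about whether `create_invoice` is correct. It only catches the quiet case where the spec promises something the code no longer contains, which is exactly the case green tests miss.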

Here’s the thought I can’t shake. The fix probably isn’t a smarter model. It’s constraints. A scoped, analyzed codebase where every small unit of code points back to the smallest unit of requirement it serves. If that link existed, and a tool could check it, the rot would have nowhere to hide. Dead code becomes computable: code that serves no requirement. Hallucination becomes checkable: a claim with no link to evidence.
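A rough sketch of how both checks could be computed, assuming the links already exist as data. Everything here is hypothetical: the `CodeUnit` and `Claim` shapes, the `REQ-` identifiers, and the file paths are invented for illustration, and the hard part (producing and maintaining the links) is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodeUnit:
    """One small unit of code (e.g. a function) and the requirement IDs it serves."""
    path: str
    name: str
    serves: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class Claim:
    """A statement about the codebase, with "path:name" references as evidence."""
    text: str
    evidence: tuple[str, ...] = ()


def find_dead_code(units, active_requirements):
    """Units that serve no currently active requirement."""
    return [u for u in units if not (u.serves & active_requirements)]


def find_unimplemented(units, active_requirements):
    """Active requirements that no unit claims to serve."""
    covered = set().union(*(u.serves for u in units))
    return sorted(active_requirements - covered)


def find_unsupported_claims(claims, units):
    """Claims with no evidence link that resolves to a known unit."""
    known = {f"{u.path}:{u.name}" for u in units}
    return [c for c in claims if not any(ref in known for ref in c.evidence)]


# Hypothetical project state, for illustration only.
units = [
    CodeUnit("billing/invoice.py", "create_invoice", frozenset({"REQ-1"})),
    CodeUnit("billing/invoice.py", "apply_legacy_discount", frozenset({"REQ-7"})),
    CodeUnit("billing/utils.py", "format_date"),
]
active = {"REQ-1", "REQ-2"}
claims = [
    Claim("Invoices are created per order", ("billing/invoice.py:create_invoice",)),
    Claim("Reminders are sent after 30 days", ("billing/reminders.py:send_reminder",)),
    Claim("Dates are always stored in UTC"),
]

dead = find_dead_code(units, active)
unimplemented = find_unimplemented(units, active)
unsupported = find_unsupported_claims(claims, units)

print([u.name for u in dead])
print(unimplemented)
print([c.text for c in unsupported])
```

In this toy state, `apply_legacy_discount` is flagged because its only requirement was retired, `format_date` because it was never linked to one, and two claims are flagged because their evidence points at nothing that exists. The checks are trivial once the links are there. Keeping the links honest is the real problem.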

From all the side projects I’ve burned weekends on, the goals boil down to three. Persist the knowledge. Cut the token cost. Eliminate outdated code and AI hallucinations in any AI-generated project.

Still exploring the tools. Might end up building my own.