I published that SDD didn't work. Then I had to make it work.
From criticizing Spec-Driven Development to building an Agentic Governance Framework from scratch. What works, what doesn't, and what almost nobody is saying about the developer's role when AI is writing the code.
A few weeks ago, I published an article questioning Spec-Driven Development. I raised uncomfortable points about bottlenecks, agent non-determinism, erosion of internal team capabilities, and the biased incentives of the companies selling these tools. The article sparked a lot of discussion. Then came the challenge that, in hindsight, I absolutely had coming.
“That thing you said doesn’t work? We want to implement it. And you’re responsible for making it work.”
Beautiful. Nothing like being served your own medicine at room temperature. My job is to make things happen, so I took it as both a professional and personal challenge. I went deep into research (papers, not ten-minute demos with spotless repos that have never seen production). What I found changed my perspective quite a bit, not because I was completely wrong, but because I was criticizing something I had never truly implemented.
This article is the result of that experience.
The diagnosis was right. The patient was different.
In my previous article, I argued that SDD shifts the bottleneck to whoever writes the specs: it needs your most expensive people producing highly detailed documents before the AI can do anything useful. At that point, I claimed, it would be more efficient for those same people to write the code directly.
That argument has a core flaw: it assumes developers write specs manually.
That is not how it works when the system is implemented properly. The problems I described in my previous article are vibe coding problems, which basically means telling an agent “implement this feature” and hoping the result is consistent with the rest of the system. Those are not the problems of SDD with real governance. I was criticizing the consequences of missing structure, not the consequences of the methodology itself.
The difference between vibe coding and properly implemented SDD is not about tools. It is not about using openspec, spec-kit, or similar projects. It is architectural. And that distinction is still missing in most writing on the topic.
What the Agentic Governance Framework is (and why SDD alone is not enough)
The term “Agentic Governance” already exists in technical literature, but usually in a different sense: most current frameworks treat it as a security and compliance problem, focused on controlling what agents can access or execute in production. That matters, but it is not what I mean here.
What I built during this research is more specific: an Agentic Governance Framework (AGF) for software development workflow. It is not just SDD. SDD describes how to write specs so AI can generate code. AGF defines who makes each decision, when they make it, with what information, and how that decision is recorded before a single line of code is written. SDD is one part of AGF, not the other way around.
The core premise is simple: humans govern, AI develops. The developer is no longer primarily the person building the system, but the person governing the agent that builds the system. That role shift is not cosmetic. It has concrete implications for how work is organized, what skills are needed, and how accountability is handled when something goes wrong.
The workflow, step by step
What follows is the process I managed to make work. I am describing it at the level of detail I wish I had found when I started researching.
Step 1: Foundation
Before a spec, before an architecture decision, before discussing technologies, there is a document that answers three questions: what we are building, why we are building it, and what outcomes we expect. I call it foundation.md, and it lives in the repository, not in a slide nobody will ever open again.
The process is not me sitting down and writing that document from scratch. The agent drives a conversation with me, asking whatever is needed until it has enough context to draft it. I provide the initial idea (“I want to build a CLI tool to manage microservice templates”). The agent asks follow-up questions, goes deeper, challenges whether what I am saying actually makes sense. When the conversation ends, the agent writes the document. I review it and approve it.
That foundation is the anchor for everything that comes next. Every later decision must align with it.
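To make this concrete, here is a minimal sketch of what such a foundation.md could contain for the CLI example above. The headings and wording are my own illustration, not the framework's official template:

```markdown
# Foundation

## What we are building
A CLI tool to manage microservice templates: create, list, and
update service scaffolds from a shared template repository.

## Why we are building it
New services are currently bootstrapped by copy-pasting an old
repo, which spreads outdated conventions and drifts away from
our architecture standards.

## Expected outcomes
- A new service can be scaffolded in minutes, not days.
- Every scaffold complies with the team's current ADRs.
- Template updates can be propagated to existing services.
```

Because it lives in the repository, this document is versioned, diffable, and available to the agent as context on every later decision.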
Step 2: Architecture Decision Records (ADRs)
ADRs are not new. They are documents that capture architecture decisions in a project: language, framework, database, design patterns, and (most importantly) why that choice beat the alternatives.
What AGF adds is a conversational flow to generate them. Instead of an architect writing an ADR in isolation, the process happens in three stages: first an exploration session where the agent asks questions and may challenge choices that conflict with the foundation, then draft generation that I review and iterate on, and finally formal registration of the approved ADR in the repository.
The folder structure starts to look like this:
arch/
├── adrs/
│   ├── ADR-000-language-and-runtime.md
│   ├── ADR-001-layered-architecture.md
│   ├── ADR-002-dependency-injection.md
│   └── README.md
└── drafts/
The README.md works as an index. The agent is explicitly instructed to consult that index before making any implementation decision.
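For readers who have not worked with ADRs before, a minimal one might look like the following. This is the common Nygard-style layout (context, decision, consequences); the content is my own hypothetical example matching the ADR-002 file above:

```markdown
# ADR-002: Dependency Injection

## Status
Accepted

## Context
Components currently construct their own dependencies, which
couples layers together and makes them hard to test in isolation.

## Decision
Use constructor-based dependency injection, wired manually in
main(). No DI framework.

## Consequences
- Tests can substitute fakes for any dependency.
- Wiring code grows with the project and must be kept tidy.
- Rejected alternative: a reflection-based DI container, which
  would hide the object graph from both humans and the agent.
```

The "rejected alternative" section is what makes the index valuable later: it records not just what was chosen, but what was considered and why it lost.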
Step 3: Agent Generated Decision Records (AGDRs)
This is the mechanism that surprised me the most during experimentation, and it solves one of vibe coding’s most serious problems.
During implementation, the agent will inevitably run into situations not covered by ADRs. For example, the project uses Go and has an ADR defining conventions for building a REST API client, but a specific edge case appears that the ADR does not cover. In vibe coding, the agent just makes an arbitrary call, writes code, and moves on. That decision gets buried in the code with no documentation, and the next time the agent faces a similar situation it may choose differently and create inconsistency.
With AGF, the agent is instructed to do something else: it generates a document (the AGDR) explaining what it wants to do and why, then pauses. Implementation does not continue until a human makes one of three calls: approve (the agent can proceed), reject (the agent must rethink), or promote (the decision is important enough to become a formal ADR that governs future work across the project).
arch/
├── adrs/ (human-approved decisions)
├── agdrs/ (agent-generated decisions pending review)
└── drafts/
This creates an identifiable human owner for every meaningful project decision. When something breaks in production, the decision trail exists, is readable, and has a name attached to it. That completely changes accountability dynamics inside the team.
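As an illustration, an AGDR for the REST-client edge case mentioned above could look roughly like this. The fields, the numbering, and the specific scenario are my own guess at a sensible shape, not the framework's published format:

```markdown
# AGDR-007: Retry policy for idempotent client requests

## Triggering context
ADR-004 defines REST client conventions but does not cover
transient 5xx failures on GET requests.

## Proposed decision
Retry idempotent requests up to 3 times with exponential
backoff; never retry POST or PATCH.

## Rationale
Consistent with ADR-004's reliability goals; avoids duplicate
writes on non-idempotent verbs.

## Status
Pending review (approve | reject | promote to ADR)
```

The key property is the pause: the agent writes this document and stops, so the decision is reviewable before any code depends on it.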
Step 4: Specifications (this is where classic SDD begins)
Only at this point does SDD proper begin. For each feature or change, the workflow goes through three stages:
Exploration: the agent switches to analysis mode. It asks questions about the intended change, examines it from multiple angles, and can flag conflicts with existing architecture decisions before anyone writes a line of code.
Proposal: the agent generates three documents. spec.md describes what will be built and expected behavior. design.md describes exactly how it will be built: which files will change, which new files will be created, what data structures will be defined, including code snippets for the most complex parts. tasks.md breaks everything into mechanical, sequential steps.
Implementation: in a completely new session (with no residual context from previous conversations), the agent reads the approved spec and executes. The design has already been reviewed and approved by a human. The agent no longer needs creative decisions in this phase, only faithful execution.
specs/
├── README.md
└── archive/
    ├── cmd-new/
    │   ├── spec.md
    │   ├── design.md
    │   └── tasks.md
    └── jwt-auth/
        ├── spec.md
        ├── design.md
        └── tasks.md
When implementation is complete and verified, the spec gets archived. Every completed feature ends up with a full documentary trail.
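To illustrate what "mechanical, sequential steps" means in tasks.md, a fragment for the cmd-new spec above might read as follows. The file names and step contents are hypothetical:

```markdown
# Tasks: cmd-new

- [ ] 1. Create cmd/new.go with the `new` subcommand skeleton
      described in design.md.
- [ ] 2. Add template resolution in internal/templates/resolve.go.
- [ ] 3. Wire the subcommand into the root command in main.go.
- [ ] 4. Add table-driven tests for template resolution.
- [ ] 5. Run the full test suite and update specs/README.md.
```

Each task should be executable without creative judgment; anything requiring a decision belongs back in design.md or in an AGDR.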
Why this solves the code review problem I raised before
In my earlier article, I argued that reviewing someone else’s code is harder than writing it, and that this would become the new bottleneck. I still think that is true for vibe coding.
With AGF, I am not reviewing code. I am reviewing design.md before a single line is written. That document tells me exactly which files will change, what new structures will appear, and the reasoning behind each decision. If the design aligns with ADRs and makes sense for the problem, the resulting code will be correct. If it does not, I reject it and we refine the design, which is a text document, not a 300-line diff spread across five files.
Correction costs drop dramatically because mistakes are caught at the cheapest stage of the process.
The repository is not only code. It becomes institutional knowledge.
This point appears in almost no SDD discussion, and I believe it is one of the strongest arguments for adopting this framework.
In Data Governance, the core principles are traceability, ownership, and auditability: being able to track each data point back to its origin, know who made what decision and when, and audit it at any point in time. That is exactly what AGF produces for architecture and development decisions.
When a new developer joins, they can read ADRs and understand in an afternoon why the system is built the way it is, without needing the project’s oral history. When production fails, the decision trail exists and is readable. When the team evaluates whether a technology choice makes sense, it can consult the history of similar decisions in previous projects.
And there is more that I am only beginning to explore: all that structured markdown data is highly indexable. A well-written authentication spec from one project is reusable in the next one. A team’s ADRs can be shared as a library of approved patterns across the organization. You can build a semantic index over the complete decision history and have an agent answer why the system works the way it does, without manual digging. The repository stops being only where code lives and becomes the team’s institutional memory.
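As a tiny sketch of that indexability, assuming Nygard-style ADRs whose first level-1 heading is the title, a title index over the decision history could be built like this. The adrTitle helper and the inline sample data are my own illustration; in a real repo the bodies would be read from arch/adrs/*.md:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// adrTitle extracts the first level-1 markdown heading from an
// ADR body, which by convention holds the decision title.
func adrTitle(md string) string {
	for _, line := range strings.Split(md, "\n") {
		if strings.HasPrefix(line, "# ") {
			return strings.TrimPrefix(line, "# ")
		}
	}
	return ""
}

func main() {
	// Inline stand-ins for files under arch/adrs/.
	adrs := map[string]string{
		"ADR-000-language-and-runtime.md": "# ADR-000: Language and Runtime\n\nStatus: Accepted\n",
		"ADR-001-layered-architecture.md": "# ADR-001: Layered Architecture\n\nStatus: Accepted\n",
	}

	// Sort file names so the index prints deterministically.
	files := make([]string, 0, len(adrs))
	for f := range adrs {
		files = append(files, f)
	}
	sort.Strings(files)

	for _, f := range files {
		fmt.Printf("%s -> %s\n", f, adrTitle(adrs[f]))
	}
}
```

From an index like this, it is a short step to feeding the titles and bodies into an embedding store for the semantic search described above.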
The honest trade-offs
I am not here to sell anything, so let us be direct about what this framework is not.
It is not faster. What an experienced developer can solve in five minutes by coding directly may take thirty in this workflow. For small bugs or trivial changes, AGF can feel heavy. And there is no true middle ground: for the model to work and produce a consistent decision history, there cannot be manual coding outside the process. Everything must go through exploration, design generation, and spec iterations. The value of the framework grows with system complexity and project lifespan. For a disposable script, it is overkill.
It consumes significant resources. Exploration, design generation, and spec iteration all burn tokens. I hit my daily usage cap every day. That is a real operational cost that needs to be budgeted before adopting this approach. I am already exploring ways to optimize consumption, but for now, that is reality.
Dependency risk is real. After several weeks in this workflow, going back to the classic write-code cycle feels heavy. The productivity contrast is strong enough that teams may not want to go back. That is a risk you need to manage consciously, especially if your organization may need to operate with different tools later.
What the framework does provide: an auditable decision history, documentation that grows naturally with each iteration, and code quality that improves over time because the agent accumulates project context through ADRs. Early specs produce correct code. By the tenth iteration, specs produce correct code with highly precise comments aligned with project conventions. That does not happen in vibe coding; honestly, it rarely happens with fully human teams unless there is very strong mentoring and training.
The real value of AGF is not faster deliveries. Compared with methods promising immediate speed, this framework is slower at first, by design.
What it does promise is higher value for the same effort over the same period. Week-one code is correct. Week-ten code is correct and progressively better because the agent has richer project context from ADRs. Every documented architecture decision makes later specs more precise, more coherent, and less iteration-heavy.
Normally, projects slow down as they age: technical debt accumulates, old decisions create friction with new ones, and onboarding gets more expensive. With AGF, the opposite can happen. The project gets more stable because decisions are explicit and documented. Friction drops. New developers (AI or human) understand the system in hours, not months. Productivity keeps rising even after a year of development.
Initial speed is not the same thing as sustainable productivity. This framework is built for the second.
What this changes in the developer role (and what it does not)
This is the hardest part to write because I do not have a fully finished answer yet. But I think it is the most important part of the article.
The cognitive work is still the same: research, domain understanding, decision-making, accountability for outcomes. But time distribution changes radically. Implementation, which used to consume most of the time, now happens without direct human intervention. What remains in developer hands is process governance: deciding what gets built, under what constraints, with which quality criteria, and approving every significant decision before execution.
That requires a different profile from the classic developer archetype. You do not govern an AI agent with intuition built from years of typing syntax. You govern it with architectural judgment, trade-off analysis, and enough understanding of model behavior to design instructions that produce predictable, consistent outcomes.
For CTOs and technical leaders evaluating AI adoption in their development process, the most important question is not whether AI can generate high-quality code from strong specifications (it can). The question is whether your team has people with the judgment to govern that process, not just write prompts, but make architecture decisions with real consequences, sign off on them, and own the outcome when something fails.
If the answer is yes, AGF makes sense. If the answer is “let’s see how the agent does on its own,” what you will actually get is vibe coding with decorative documentation. And there are already enough consultancies charging digital-transformation rates for that.
That is all I have for now. I will publish the prompts, commands, and full framework structure in a public repository when it is ready. But the methodology is here, and anyone who wants to start experimenting with these principles now has enough to begin.