Most “I use AI to code” articles fall into two camps: breathless evangelism or dismissive scepticism. I have been using AI to build a full-stack product since early 2023, first through chat interfaces with Claude, Gemini, and ChatGPT, then since mid-2025 with Claude Code working directly in the codebase. The shift from informal AI assistance to governed, in-codebase partnership changed everything. Not because the models got better (though they did), but because I stopped treating AI as a code generator and started treating it as a governance partner.
The Slop Era Is Over. The Governance Question Is Not.
A year ago, the criticism was fair. The 2025 Stack Overflow survey found developer trust in AI outputs at 29 per cent, down from 40 per cent the year before. Two-thirds of developers reported spending extra time fixing “almost-right” generated code. “AI slop” was a legitimate charge.
That was then. In the first half of 2026, the landscape shifted. Claude Opus 4.6 and Sonnet 4.6 score above 80 per cent on SWE-bench, the industry benchmark for real-world software engineering tasks. Claude Code has a 91 per cent satisfaction rating and was voted “most loved” by 46 per cent of developers. Adoption is near-universal: JetBrains reports that 95 per cent of developers now use AI tools at least weekly, with 75 per cent using them for more than half of their coding work. The raw capability gap that drove the 2025 trust crisis has largely closed.
So the slop problem is solved, right? Not quite. Better models did not eliminate the governance problem; they transformed it. When the AI was mediocre, its mistakes were obvious: wrong syntax, broken imports, hallucinated APIs. You caught them in minutes. When the AI is excellent, its mistakes are subtle: architecturally plausible code that contradicts a decision you made three weeks ago, a migration that follows valid patterns but not your patterns, a commit that passes every test but violates conventions nobody wrote down.
Capable tools without constraints do not produce less debt. They produce more sophisticated debt, faster. The question has shifted from “is the AI good enough?” to “who is ensuring the AI follows our engineering decisions?”
CLAUDE.md as Encoded Engineering Culture
Claude Code reads a file called CLAUDE.md at the root of your project. It is a plain text file that defines rules, conventions, and constraints the AI must follow when working in your codebase. Think of it as a combination of a code style guide, an architecture handbook, and an onboarding document, all compressed into a single file that the AI reads on every interaction.
The insight that changed how I work: your best engineering practices already exist in your head. You know the commit message format. You know which database patterns are safe. You know the testing requirements for each layer. The problem is that you forget them when you are tired, rushing, or tempted to skip steps. CLAUDE.md externalises those decisions so the AI enforces them consistently, especially when you would not.
This is not prompt engineering. This is governance. It is the same principle behind Architecture Review Boards, coding standards documents, and engineering runbooks, except it runs on every interaction instead of once a quarter.
What We Actually Encode
Our CLAUDE.md is not a vague set of preferences. It is a specific, testable set of rules that evolved over a year of daily use. Here is what it covers:
Commit message standards. Every commit follows a Problem/Solution/Impact structure with mandatory metrics: lines changed, files touched, work classification, and GitLab issue references. This sounds bureaucratic until you need to trace why a decision was made six months later. The AI enforces the format even at 2 a.m. when I would happily write “fix bug” and move on.
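The format itself is ours, but the enforcement mechanism is generic. Here is a minimal sketch of such a check in Python; the section names and trailer keys are illustrative assumptions, not our exact template:

```python
import re

# A sketch of a Problem/Solution/Impact commit-message check.
# The section names and trailer keys are illustrative, not our verbatim format.
REQUIRED_SECTIONS = ("Problem:", "Solution:", "Impact:")
REQUIRED_TRAILERS = {
    "Lines-Changed": re.compile(r"^\d+$"),
    "Files-Touched": re.compile(r"^\d+$"),
    "Issue": re.compile(r"^#\d+$"),  # GitLab issue reference
}

def check_commit_message(message: str) -> list[str]:
    """Return a list of violations; an empty list means the message passes."""
    errors = [f"missing section {s!r}" for s in REQUIRED_SECTIONS if s not in message]
    # Collect "Key: value" trailer lines that match a required key.
    trailers = {}
    for line in message.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            if key in REQUIRED_TRAILERS:
                trailers[key] = value
    for key, pattern in REQUIRED_TRAILERS.items():
        if key not in trailers:
            errors.append(f"missing trailer {key}")
        elif not pattern.match(trailers[key]):
            errors.append(f"malformed trailer {key}: {trailers[key]!r}")
    return errors
```

Wired into a commit-msg hook, a check like this turns "fix bug" from a 2 a.m. habit into a hard stop.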
Architecture Decision Records. An ADR is a short document that captures a significant technical decision along with its context and consequences. We have written more than 100 of them. The CLAUDE.md requires that significant decisions reference relevant ADRs and that new patterns get documented before implementation. The AI does not just follow ADRs; it reminds me when I am about to contradict one.
Testing requirements by layer. Database changes require database-level tests. API routes require integration tests with schema validation. Frontend components require unit tests. The AI knows which layer it is working in and applies the correct testing strategy automatically.
Yak shaving time limits. If fixing unrelated issues exceeds 30 minutes, stop and document. This rule exists because an AI assistant will happily chase a tangent for hours if you let it. The 30-minute cap forces a deliberate choice: is this worth the detour, or should we log it as technical debt and return to customer value?
Database schema conventions. Every database migration must follow our chosen schema methodology, which separates core entities from their relationships and historical changes. This architecture gives us full audit history and strict data isolation between customers. The AI generates migrations that comply with these patterns automatically because the rules are explicit, not implied.
Work classification. Every task is tagged with business value markers: MOAT (defensible advantage), SHARE (community-building), or INTERNAL. Revenue impact: PAID, FREE, or NONE. This classification happens at the start of work, not after the fact, which means every commit carries strategic context.
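To make the above concrete, here is an abbreviated sketch of what such a file can look like. The wording is illustrative, not our verbatim file:

```markdown
# CLAUDE.md (illustrative excerpt)

## Commit messages
- Use the Problem/Solution/Impact structure.
- Include metrics: lines changed, files touched, work classification.
- Reference the relevant GitLab issue.

## Architecture decisions
- Before introducing a new pattern, check existing ADRs and cite the relevant one.
- Significant new decisions require a new ADR before implementation.

## Testing by layer
- Database changes: database-level tests.
- API routes: integration tests with schema validation.
- Frontend components: unit tests.

## Yak shaving
- If an unrelated fix exceeds 30 minutes, stop, log it as technical
  debt, and return to the task.

## Work classification
- Tag every task with business value (MOAT | SHARE | INTERNAL)
  and revenue impact (PAID | FREE | NONE) before starting work.
```

The value is not the markdown; it is that every rule is explicit enough for a machine to apply without interpretation.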
Skills, Subagents, and Hooks as Quality Gates
CLAUDE.md is the foundation, but the system extends beyond a single file. Claude Code supports three extension mechanisms: skills (reusable instruction sets that activate contextually), subagents (autonomous agents that orchestrate multi-step workflows), and hooks (automated checks that run before or after tool calls). We have built specialised skills for our recurring workflows: a commit message generator that pulls metrics from work tracking files, a database development skill that enforces our schema conventions, and an API route scaffolder that generates input validation and customer data isolation automatically.
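For illustration, a skill is typically a Markdown file with YAML frontmatter whose description tells the model when to activate it. A sketch of what the commit message skill above might look like; the name, description, and steps are illustrative, not our actual skill:

```markdown
---
name: commit-message
description: Generate a Problem/Solution/Impact commit message with
  metrics pulled from the work tracking file. Use when committing.
---

# Commit message skill

1. Read the current work tracking entry for focused time and classification.
2. Summarise the change as Problem/Solution/Impact.
3. Append metrics and the GitLab issue reference as trailers.
```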
(Figure: the AI governance stack.)
Beyond skills, we use subagents that orchestrate multi-phase workflows. A database migration is not a single prompt; it is a four-phase workflow: analyse the requirement, design the schema following our conventions, generate security policies that ensure each customer can only access their own data, then produce tests. Each phase depends on the previous one. The subagent maintains state across all four.
Quality gate hooks run automatically during the AI’s lifecycle. If a database migration uses an elevated security mode without justification, the hook flags it. If a release tag does not match our documented naming convention, the hook blocks it. If implementation code is written before test code, the test-driven development hook prompts a course correction.
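As a sketch of what one such gate can look like: the check below scans a migration for an elevated security mode that lacks a justification comment. The comment convention and patterns here are hypothetical, and a real hook would wrap logic like this in a shell command; this shows the core idea only.

```python
import re

# Hypothetical convention for this sketch: any SECURITY DEFINER in a
# migration must be preceded by a "-- justification:" comment.
ELEVATED = re.compile(r"SECURITY\s+DEFINER", re.IGNORECASE)
JUSTIFIED = re.compile(r"--\s*justification:", re.IGNORECASE)

def flag_unjustified_elevation(migration_sql: str) -> list[int]:
    """Return 1-based line numbers where elevated security lacks justification."""
    lines = migration_sql.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        if ELEVATED.search(line):
            # Accept a justification comment on this line or the three before it.
            preceding = lines[max(0, i - 3): i + 1]
            if not any(JUSTIFIED.search(p) for p in preceding):
                flagged.append(i + 1)
    return flagged
```

A hook that exits non-zero when this list is non-empty blocks the migration until a human writes down why the elevation is needed, which is exactly the kind of decision that should never be implicit.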
The shift is fundamental. This is no longer “AI generates code.” This is “AI participates in an engineering process.”
A System That Evolved with the Tool
One detail that most “here is my setup” articles miss: the governance system did not arrive fully formed. It co-evolved with Claude Code itself as Anthropic shipped new capabilities.
In July 2025, we started with a single CLAUDE.md file defining commit standards and GitLab integration. When Claude Code released hooks support, we added automated quality checks. When custom skills shipped in October 2025, we built six specialised skills in a single day: work tracking, commit messages, release management, GitLab operations, wiki integration, and a meta-skill for creating new skills. When subagents became available, we orchestrated multi-phase workflows that a single prompt could not coordinate.
Each Claude Code release created a new governance surface that we could encode decisions into. Hooks let us block bad patterns automatically. Skills let us encode domain-specific conventions (database schema conventions, API scaffolding, blog publishing). Subagent frontmatter let us attach quality gate hooks scoped to specific workflows. Most recently, auto-memory lets the AI itself build up institutional context across sessions.
The result, after nine months: 114 Architecture Decision Records, 47 specialised skills, 5 subagents with quality gate hooks, and 1,946 co-authored commits, all governed by a system that grew incrementally rather than being designed upfront. Each layer built on the last. Each Claude Code release made the next layer possible.
The Surprising Result
The revelation, after a year, is counterintuitive: AI does not replace judgment. It enforces judgment you already have but forget under pressure.
The CLAUDE.md is not telling the AI what to do. It is telling future-you what past-you decided. That is the real purpose of encoded engineering culture. Every rule in that file represents a moment where I thought carefully about how something should work, then wrote it down so that neither I nor the AI would have to re-derive the answer under pressure.
The closest analogy from traditional engineering: an Architecture Review Board that runs on every commit, not once a quarter. In my previous role at Elia Group, the European transmission system operator, ARBs met periodically to review significant decisions. The process was valuable but infrequent. With an encoded governance system, every interaction is reviewed against the accumulated decisions of the project.
The Mental Model That Works
In early 2025, the common framing was “a fast junior with zero judgment.” That was accurate then. In 2026, models like Claude Opus 4.6 demonstrate genuine architectural reasoning, multi-file refactoring, and design-level thinking. The “junior” label no longer fits the capability. But the core insight still holds: the AI has encyclopaedic knowledge and no institutional context.
It knows every design pattern in the literature. It does not know which design patterns your team chose and why. It can generate a perfectly valid database migration. It cannot know that your project committed to a specific database methodology in an early architecture decision. The gap is not intelligence; it is context. And context is exactly what governance files provide.
This reframing matters because it addresses the real anxiety engineers feel without dismissing the tool’s capability. The question is not “will AI replace me?” The question is “how do I ensure this highly capable tool follows our accumulated decisions, not just general best practice?” The answer is governance: encode your standards, enforce them automatically, and spend your judgment on the decisions that actually require it.
AI Without vs. With Governance
| Dimension | Without Governance | With Governance |
|---|---|---|
| Code consistency | Varies by prompt and session | Enforced by CLAUDE.md rules |
| Architecture drift | Contradicts prior decisions | References and follows ADRs |
| Testing | Optional, inconsistent | Layer-appropriate, mandatory |
| Commit quality | “fix bug”, “update stuff” | Problem/Solution/Impact with metrics |
| Technical debt | Invisible, compounding | Tracked, time-capped, classified |
The engineers who thrive with AI coding tools are not the ones with the cleverest prompts. They are the ones who invest in process infrastructure: rules, conventions, quality gates, and automated enforcement. The prompt is ephemeral. The governance system compounds.
What Changed After 100+ ADRs and 50+ Migrations
After a year, the quantified impact is real but nuanced. Velocity increased by roughly 30 per cent for well-defined tasks with clear governance rules. For exploratory work or novel architectural decisions, the improvement is smaller because those tasks genuinely require human judgment that cannot be encoded in advance.
We can make that claim because we instrumented the process. Every commit carries structured metrics: focused time, work type, domain (database, API, frontend), business value classification, and capability references. This is not decorative metadata. A skill parses the git log into a velocity cache (a measure of how much work the team delivers per unit of time), broken down by work type and domain, and generates three-point estimates (optimistic, likely, and pessimistic) for future tasks based on our actual historical performance. When I say “30 per cent faster,” that is not a feeling. It is a comparison of median focused time per commit before and after governance encoding, drawn from 168 classified commits over a 90-day window.
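The estimation step can be sketched with standard percentile maths. The percentile choices and the sample data below are illustrative assumptions, not our actual skill:

```python
from statistics import median, quantiles

# Sketch of three-point estimation from historical focused time per commit.
# The p10/p90 choices and the sample history are illustrative assumptions.
def three_point_estimate(durations_minutes: list[float]) -> dict[str, float]:
    """Optimistic / likely / pessimistic estimates from historical durations."""
    if len(durations_minutes) < 4:
        raise ValueError("need more history for stable percentiles")
    deciles = quantiles(durations_minutes, n=10)  # nine cut points: p10..p90
    return {
        "optimistic": deciles[0],    # p10: things went smoothly
        "likely": median(durations_minutes),
        "pessimistic": deciles[-1],  # p90: things went badly
    }

# Hypothetical history: minutes of focused time per commit for one work type.
history = [25, 30, 35, 40, 45, 55, 70, 90, 120, 180]
est = three_point_estimate(history)
```

In practice the history would be filtered by work type and domain before estimating, so that a database migration is compared against past migrations rather than against frontend tweaks.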
The principle behind this is the same one that drives DORA metrics, the four key measures that high-performing engineering teams use to track delivery speed and stability. The difference is that DORA metrics are typically collected by CI/CD infrastructure. Ours live in the git log itself, which means every commit is both a code change and a data point. The governance system does not just enforce standards; it generates the evidence to evaluate whether those standards are working.
Consistency improved dramatically. Before governance encoding, commit messages varied wildly. Database migrations sometimes followed our schema conventions and sometimes did not. API routes had inconsistent validation. After encoding, these variations disappeared. Not because the AI is perfect, but because the rules are explicit and the AI follows explicit rules reliably.
What I would do differently if starting over: write the CLAUDE.md on day one. We spent over two years building with AI through chat interfaces before adopting Claude Code with formal governance in July 2025. The ungoverned period was not wasted; it taught us which conventions matter enough to encode. But the debt from inconsistent patterns took weeks to unwind once we had the tools to see it clearly. Every convention you encode early saves compounding confusion later.
The practices that transferred most directly from enterprise architecture: ADR discipline, work classification, and the principle that operational burden is a design cost. These are not startup ideas or enterprise ideas. They are engineering ideas, and they apply regardless of team size.
Why this matters beyond our codebase: every team using AI coding tools will eventually confront the governance question. The teams that answer it deliberately, by encoding their standards and enforcing them automatically, will ship faster and more consistently than the teams that treat AI as an unconstrained code generator.
Key Takeaways
- The 2025 “AI slop” era is over: models like Claude Opus 4.6 score above 80 per cent on SWE-bench, and 95 per cent of developers now use AI tools weekly. But capable tools without governance produce more sophisticated debt, not less.
- CLAUDE.md externalises engineering judgment so the AI enforces decisions you have already made, especially when you are tired or rushing.
- Specialised skills and subagents extend governance beyond a single file into multi-phase workflows with automated quality gates.
- The gap is no longer intelligence; it is institutional context. The AI has encyclopaedic knowledge but no awareness of your team’s specific decisions. Governance files bridge that gap.
- The governance system co-evolved with Claude Code’s capabilities: CLAUDE.md, then hooks, then skills, then subagents, then auto-memory. Each release created a new surface for encoding decisions.
- Write your governance file on day one. Every convention encoded early saves compounding confusion later.
Built with This Process
Numonic is the product this governance system built: an AI-first digital asset management platform with provenance tracking, compliance infrastructure, and full audit history for every asset. Every feature shipped through the process described in this article.
If you have built your own CLAUDE.md or AI governance approach, I would genuinely like to hear about it. The conversation about AI engineering discipline is just getting started.
Get in Touch