How I actually use Claude Code without maxing out my tech debt
Once, my roadmap said to build the real schema with proper migrations from day one. Claude suggested we could simplify, prototype faster, and wire it up later. I said yes.
Two weeks later I’m wasting time on database migrations that shouldn’t need to exist, trying to retrofit the foundation the plan already specified. Burning tokens on table rewrites and data transformations because I didn’t trust my gut the first time.
The roadmap was right. I negotiated myself into being wrong.
That experience crystallized something: the model is not the bottleneck anymore. Governance of the system surrounding the model is.
Optimize the system, not the prompt. This principle is well understood in manufacturing and process engineering: Toyota Production System, Lean, Six Sigma. Any failure is ultimately a system failure. Fix the system.
Shocking that it isn’t the default here. Prompt micro-optimizations decay as models improve. Structural discipline compounds.
My rule for production work: no roadmap, no work. No plan mode, no code.
The Roadmap Is Not Optional
When I skipped roadmap work, Claude re-invented project context every session. It made confident architectural guesses that contradicted decisions from last week. Everything still looked productive in the moment. Then three sessions later the codebase drift showed up everywhere.
Not a wish list. Not a Gantt chart. Milestones with clear deliverables and clear acceptance criteria. If I cannot state what “done” means, that milestone is not ready.
My baseline workflow:
- Write or update roadmap before touching code.
- Link the roadmap from CLAUDE.md. If a repo has multiple concerns, nest CLAUDE.md files in subfolders, or use separate sections linking to different roadmaps.
- Run Plan mode for each milestone, not just kickoff.
- Update roadmap as reality changes.
- Mark done aggressively when acceptance criteria are truly met.
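For reference, a milestone with workable acceptance criteria might look something like this (the feature and criteria here are hypothetical):

```markdown
## Milestone 3: Session auth

Deliverable: login/logout backed by the real user table. No mocks.

Acceptance criteria:
- [ ] `POST /login` sets a session cookie for valid credentials
- [ ] Invalid credentials return 401 and set no cookie
- [ ] Logout invalidates the session server-side
- [ ] Tests fail when any of the above breaks
```

If you can't check every box, "done" hasn't happened yet.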
When something emerges during work:
You’re testing or building and notice something the roadmap didn’t account for. Add it to the roadmap as a sub-task under the current phase, or as a future milestone if it’s bigger. Then decide: does this need Plan mode or can Claude just handle it?
I put this in my CLAUDE.md: “If you think this work needs plan mode, run plan mode. Otherwise just do it. Always plan a roadmap milestone.”
Surprisingly effective. Claude is pretty good at judging when planning adds value.
But if you personally think something deserves Plan mode, just tell Claude to plan it. Always update docs. Always write stuff down.
Claude will negotiate. It always does. “We can mock this.” “This is probably out of scope.” “Let’s simplify and come back.”
The answer is the roadmap. If it’s in scope, ship it. If it’s missing and necessary, add it. If it’s neither, reject it.
No vibes. No improvisation theater.
Plan Mode Is Mandatory
This is the part I want to overstate on purpose.
Please use Plan mode. Use it more than feels necessary. If you think “I can skip it this time,” that’s exactly when you should run it.[^1]
Plan mode isn’t overhead. It’s the control surface. Without it you get fast typing and architectural drift. With it you catch bad assumptions before they’re in the codebase.
My hard rules:
- Plan mode before first implementation pass.
- Plan mode at every milestone boundary.
- Plan mode whenever scope changes.
- Plan mode when Claude suggests “quick shortcuts.”
- Plan mode before calling work complete.
Exception for truly trivial work:
Renaming a variable across files. Fixing a typo. Updating a dependency version. Mechanical changes with zero ambiguity don’t need planning overhead.
But the moment you’re uncertain, or the change touches behavior, plan it.
Why this matters:
- Forces Claude to expose reasoning before code lands.
- Reveals hidden dependencies early.
- Makes review cheaper because intent is visible.
- Cuts rework from “confidently wrong” implementation.
Skipping plan mode feels fast for twenty minutes and expensive for two days.
CLAUDE.md Is a Living Document
People write CLAUDE.md once, add style preferences, and forget it. Then they wonder why sessions feel inconsistent.
CLAUDE.md is project memory plus operating policy.
If the project changes and CLAUDE.md doesn’t, you’re feeding stale instructions to a fast system.
That’s how expensive nonsense gets produced at scale.
Global config that follows you everywhere
I keep a system of global Claude rules, skills, documentation, and examples in ~/.claude/ symlinked on all my boxes. Every project gets the same foundation instantly.
When I update the global config, all my projects level up together. Add a new testing pattern, all projects get it. Refine a code review rule, it applies everywhere. Document a mistake, every future session knows about it.
The first version had Python and Vue examples. Testing philosophy for pytest and Vitest. I thought the “feeling” would carry over to other languages.
Super wrong.
When I started building Rust components for a log reader and parser, Claude wrote garbage tests because the philosophy didn’t transfer. E2E tests were especially bad because the Vue patterns didn’t apply. Good examples go a long way. The suite looked comprehensive and passed completely; it tested almost nothing.
Learned: language boundaries matter more than I thought. General patterns don’t auto-translate. Claude also can’t read your mind and doesn’t know your personal preferences (e.g., preferring pytest).
Now I have language-specific rule files: python.md, vue.md, rust.md, go.md.
Claude intelligently reads only the ones that matter for the current project.
When I switch languages, the right context loads automatically.
Global system. Modular rules. Updates propagate everywhere.
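Concretely, that global system looks roughly like this (file names beyond the rule files mentioned above are illustrative):

```
~/.claude/
├── CLAUDE.md      # global rules; links to everything below
├── rules/
│   ├── python.md  # pytest philosophy, typing conventions
│   ├── vue.md     # component and e2e test patterns
│   ├── rust.md
│   └── go.md
├── skills/        # reusable skills
└── examples/      # good code and good tests, per language
```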
Phase-appropriate structure at the project level
Project-specific CLAUDE.md handles what’s unique about this codebase:
- Greenfield: define what you’re building and what you’re refusing to build.
- Feature phase: give active features explicit sections and roadmap links.
- Maintenance phase: document sharp edges, historical decisions, and known traps.
Global config gives you consistent infrastructure. Project config gives you specific context. Both need maintenance.
I do periodic reviews with no coding:
“Let’s go through CLAUDE.md, rules, and skills for contradictions and drift.”
That token spend pays for itself quickly.
One superstition I still keep: I let Claude draft or revise parts of its own rules.[^2] No proof. Zero cost. Seems to help.
Claude-Sized Architecture
Post 1 framed the cognitive envelope. This is the practical implication.
Clean interfaces and compartmentalized modules aren’t just “good engineering.” They’re how agent-driven work stays coherent across sessions. If auth has a clean API, Claude can ship feature work that depends on auth without re-deriving auth internals every session. If boundaries are sloppy, Claude keeps too much state alive, starts guessing, and drift starts.
So I don’t fight the envelope. I design to it. Small modules with explicit boundaries; stable contracts between them. Local context per milestone; fewer cross-cutting changes in one run.
Side effect: cleaner codebase for humans too. Not a bonus; a direct consequence.
Delegation by Role
The insight isn’t “always use X model for Y task.” The insight is: separate architectural thinking from execution from review.
Opus does planning, architecture, and thinking. The expensive reasoning work. Spot-fixer agents on cheaper models (Haiku, Sonnet) handle mechanical multi-file changes; changing a reference across 20 files, updating imports, mass renames. I don’t want Opus reading 20 files just to change one line in each. Token-efficient, not capability-limited.
Background auditors run on Sonnet after qualifying work, without asking. Test-auditor, security-auditor, ux-auditor; catching weak tests or security regressions at creation time is much cheaper than end-of-cycle review. This is where value accumulates.
If your constraint is Opus limits (Pro plan), use Opus for high-intensity architecture and Sonnet for implementation. Same principle either way: separate the thinking from the execution.
Testing Is the Gate, Not the Ceremony
“All tests passing” is not a milestone.
If Claude wrote tests without explicit testing philosophy, expect softballs.
`assert True`, existence checks, and assertions that pass while behavior is broken are still common failure modes.
My pattern:
- Define test philosophy in project docs.
- Require meaningful failure conditions.
- Run automated suites.
- Manually click critical user flows.
- Intentionally break behavior once to validate tests can catch it.
“A failing test is a gift.”
The test-auditor enforces this. Zero tolerance for `assert True`, existence checks, assertions that pass when behavior is broken.
Claude defaults to easy tests. The auditor catches them at creation time, before they get committed. Cheaper than finding them at review or, worse, in production.
My favorite easy test I’ve seen: checking whether there’s any output at all. As long as the function returns something, the test passes, right? The auditor finds these and Claude fixes them while you’re doing architecture work elsewhere.
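To make the softball-versus-meaningful distinction concrete, here's a minimal pytest-style sketch. The `parse_log_line` function and its contract are invented for illustration:

```python
def parse_log_line(line):
    """Hypothetical function under test: split 'LEVEL: message' into parts."""
    level, _, message = line.partition(": ")
    return {"level": level, "message": message}


# Softball: passes as long as *anything* truthy comes back,
# even if the parsing is completely wrong.
def test_parse_returns_something():
    assert parse_log_line("ERROR: disk full")


# Meaningful: pins the actual behavior, so breaking the parser breaks the test.
def test_parse_extracts_level_and_message():
    result = parse_log_line("ERROR: disk full")
    assert result["level"] == "ERROR"
    assert result["message"] == "disk full"


# Meaningful: malformed input has a defined outcome instead of silent garbage.
def test_parse_handles_missing_separator():
    result = parse_log_line("no separator here")
    assert result["message"] == ""
```

Break the parser on purpose, say by swapping level and message, and the first test still passes while the second fails. That failing test is the gift.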
For frontend work, human pass is non-negotiable. Claude can run checks and browser automation, which helps a lot. It still cannot feel friction, confusion, or boredom the way users do.
Don’t Let It Get Lazy[^3]
Claude rarely says, “I’m being lazy.” It says:
- “We can simplify this.”
- “This is good enough for now.”
- “That seems out of scope.”
Each suggestion sounds reasonable in isolation. Stacked together, they produce prototype logic in production code.
I learned this the hard way building an APIv2 with fundamental backend changes: a different auth model, restructured data flow, performance improvements. I checked the session work and it looked fine. Then I dug deeper. Half the v2 endpoints were just pass-throughs to v1 functions; the agent made it “work” by reusing what was already there. Type signatures matched. Tests passed. But the whole point was changing how we did it. The old functions weren’t how we wanted to do it anymore.
That was before the governance systems. Before roadmap discipline. Before mandatory plan mode.
The failure wasn’t the agent. The failure was the system that allowed it to pattern-match incorrectly and take shortcuts. I fixed the system. Updated CLAUDE.md. Tightened the roadmap. Added better examples.
The remedy is always structure. Push back, then harden the system:
- Tighten roadmap acceptance criteria.
- Update CLAUDE.md.
- Add or refine rules and skills.
- Run Plan mode.
It’s the system. It’s always the system. Harden it and the persuasive shortcuts stop working.
The Maintenance Cost Is Real
All of this takes work to maintain. Stale governance is worse than no governance because it’s followed confidently.
The economics:
Upfront cost: roadmap maintenance, rule updates, auditor setup. Long-term savings: reduced rework, fewer regressions, no context re-derivation.
First week feels like overhead. Month three it’s obviously worth it. Eventually Claude starts writing stuff down and remembering and keeping YOU on track. The value goes both ways.
Time invested in architecture, contracts, and documentation reduces re-investigation on every future session. Time invested in micro-optimizing prompts buys less every month as baseline models improve.
The better question isn’t: “How do I squeeze more tokens out of this interaction?”
The better question is: “What system change gives me better outcomes for the next fifty interactions?”
Usually that answer is structure, not prompt poetry.
Optimize the system, not the prompt.
When you get revolutionary factory automation, you reorganize the line around it. You don’t drop it into the existing process and hope.
Same here. The constraints of the tool reshape the system that uses it.
Closing
You’re building a value stream out of yourself and your tools. You can get value out cheaply, quickly, and cleanly, or slowly and messily. The path is structure, boundaries, and explicit intent.
Roadmap discipline. Aggressive Plan mode. Living project memory. Architecture sized to model constraints. Delegation by role. Strict test gates. When Claude causes rework, treat it like a system defect. Update the roadmap, rules, or test philosophy so it doesn’t recur. Any failure is ultimately a system failure. Fix the system.[^4]
Stop treating agent work as autocomplete. Run it like a factory.
Footnotes
[^1]: You are probably smart. If you genuinely don’t need Plan mode because the task is easy, skip it. Skipping it out of laziness is where things get ungood.

[^2]: Pure superstition. Still doing it. Zero proof, zero cost.

[^3]: Yes, I’m anthropomorphizing. Yes, it’s technically wrong. No, I’m not going to say “the model’s probability distribution shifted toward suboptimal continuations” every time. If you’re reading this, you know they’re pattern-matching probability machines. Now you know I know.

[^4]: Please god don’t tell me you read this and think I mean micromanagement.