Teaching Your LLM the Codebase: CLAUDE.md and AGENTS.md

April 09, 2026 · 12 min read

The idea is teaching the LLM your conventions through files it reads on every task. Here are Greenbox’s files, line by line, and the incidents that put each line there.

Two versions of the same function

Ask an LLM to “add a Resume method to Subscription” in a repository with no brief, and you get something like this:

// file: subscription/subscription.go
func (s *Subscription) Resume() {
    s.Status = "active"
    s.PauseReason = ""
}

Exported fields. A raw string for the status. No guard against resuming a subscription that was never paused. Nothing wrong, exactly; it just belongs to some other codebase.

Ask the same thing in the Greenbox repository, where the brief exists, and you get:

// file: subscription/subscription.go
func (s *Subscription) Resume() error {
    if s.status != StatusPaused {
        return fmt.Errorf("cannot resume subscription in status %v", s.status)
    }
    s.status = StatusActive
    s.pauseReason = ""
    s.updatedAt = time.Now()
    return nil
}

Guard clause. Unexported fields. Status constant. Error returned. Same model, same prompt. The difference isn’t intelligence, it’s context.

The rest of this post is that context, file by file. The thing to notice as you read: almost every line is there because something went wrong without it. The brief isn’t a style guide written on a quiet afternoon. It’s scar tissue.

The root CLAUDE.md

This is the file at the root of the Greenbox repository, the first thing the LLM reads when it starts working:

<!-- file: CLAUDE.md -->
# Greenbox

Produce-box subscription service. Go monorepo.

## Build & Test

- `go test ./...` to run all tests
- `go vet ./...` before committing
- `golangci-lint run` for full lint check

## Project Structure

- `cmd/greenbox/` — Main application entry point
- `subscription/` — Subscription lifecycle (create, pause, resume, cancel, box size)
- `billing/` — Invoices, payment confirmation, pricing
- `delivery/` — Delivery scheduling, packing, dispatch
- `db/` — Database access and migrations

## Conventions

- Guard clauses for early returns. No deep nesting.
- Custom types for IDs and dates: `SubscriptionID`, `CustomerID`, `DeliveryDate`, not raw strings.
- Unexported struct fields. Constructor functions enforce invariants.
- Error wrapping: `fmt.Errorf("doing thing: %w", err)`
- Table-driven tests with `t.Run` subtests.
- Test names describe behaviour: `TestPausedSubscription_CannotChangeBoxSize`

## Domain Language

- "subscription" not "order"
- "box" not "product" or "package"
- "delivery day" not "shipping date"
- "subscriber" not "user" or "customer" (except in CustomerID, which is the billing reference)
- "pause" not "suspend" or "hold"

## Do Not

- No `interface{}` or `any`. Use concrete types or narrow interfaces.
- No `utils`, `helpers`, or `common` packages.
- No global state or package-level variables (except constants).

Thirty lines. Everything a developer, or an LLM, needs to write code that fits the project.

Every section has a scar

Build & Test looks too obvious to write down. It got written down the day Tom asked the LLM to verify a refactor and it ran go build, declared success, and moved on; the broken test surfaced in review twenty minutes later. Compiling isn’t passing. Now the file says what “verify” means here, and the LLM runs go test ./... because the file told it to.

Project Structure earned its place the week the LLM put new delivery-scheduling logic in a fresh top-level scheduling/ package. Perfectly reasonable, if you didn’t know delivery/ existed. For a fortnight Greenbox had two homes for the same idea. The section is a map, and the map says where new code lives.

Conventions is the section that started everything: the hour-long review where Tom couldn’t tell Priya’s style choices from behaviour choices. Guard clauses, typed IDs, table-driven tests, these are the patterns the team agreed on, and with them written down, generated code passes review faster because it already looks like the codebase.

Domain Language is the scar with Maya’s name on it. Before this section existed, the LLM generated orderID, productName, shippingDate, and every one of them needed a review comment: “We call this a subscription, not an order.” Now the LLM uses the team’s words the first time, and new developers absorb the vocabulary from the generated code before they’ve read a single design note.

Do Not is the anti-pattern list, and every entry is a repeat offence. The utils package appeared twice in one week before the prohibition went in. interface{} kept arriving as “flexibility” nobody asked for. Telling the LLM what not to do turned out to be as valuable as telling it what to do, exactly as it is with a new hire.

Package-level CLAUDE.md

The root file covers the whole project. Package-level files add context for specific packages:

<!-- file: subscription/CLAUDE.md -->
# Subscription

Manages subscription lifecycle.

## Status Transitions

- Pending → Active → Paused → Active (resume) or Cancelled
- Paused subscriptions cannot change box size.
- Cancelled subscriptions cannot be modified at all.
- `NewSubscription` starts in `StatusPending`.

## Conventions

- All mutations go through methods on `Subscription`. No direct field access from outside.
- Status is a typed constant (`StatusPending`, `StatusActive`, etc.), not a raw string.

The status transitions aren’t documentation for documentation’s sake. They’re the state machine the team mapped out around the Example Mapping table, the same one that bit them when the packing list ignored pause state. Writing it where the LLM reads it means no generated code ever has to guess what a paused subscription is allowed to do.

And for the billing package:

<!-- file: billing/CLAUDE.md -->
# Billing

Invoices, payment confirmation, pricing.

## Money

- All amounts stored in cents (int64), not dollars (float64).
- Display formatting happens at the HTTP layer, not in billing logic.
- Currency is always AUD. No multi-currency support yet.

Cents-not-dollars went in after a PR arrived storing a weekly price as a float64 and the team spent an afternoon arguing about rounding. The AUD line is Tom’s conscious shortcut from the payment work, the one with the // SHORTCUT comment, promoted to policy: it stops the LLM from helpfully adding the multi-currency support nobody has decided to build.

When the LLM works in billing, it reads both the root CLAUDE.md and the package-level one. General conventions from the root, package rules from the local file. Amounts come back in cents because the file says so, and the “should this be cents or dollars?” review comment retires.

The agent files

Where CLAUDE.md is the standing brief, roles live in .claude/agents/. Each AgentA system that wraps an LLM with tools, memory, and a loop, so it can take multi-step actions toward a goal rather than just answering one prompt. is a markdown file of its own: YAML frontmatter naming it and saying when it applies, then instructions in plain English. The Greenbox team defines two.

<!-- file: .claude/agents/test-writer.md -->
---
name: test-writer
description: Writes tests for Greenbox code following team conventions. Use when writing or updating tests.
---

You write tests for the Greenbox codebase.

Conventions:
- Use table-driven tests with t.Run subtests for any function with more than two cases.
- Test names describe behaviour, not implementation: TestPausedSubscription_CannotChangeBoxSize
- Use precise language in test names:
  - "Cannot" = hard constraint, test failure means a bug
  - "Returns" = pure output check
- Create test fixtures using constructor functions, not struct literals with exported fields.
- Prefer assertion messages that explain the business rule: "paused subscriptions cannot change box size"
- Do not use testify or other assertion libraries. Use stdlib testing only.
- Test through public methods. Never access unexported fields.

The “Cannot” versus “Returns” distinction is a scar too. A test called TestPause3 failed during the payment work and nobody could tell from the name whether the failure meant a broken business rule or a changed output format. Forty minutes of archaeology bought one naming convention.

<!-- file: .claude/agents/reviewer.md -->
---
name: reviewer
description: Reviews Greenbox code for convention drift. Use when reviewing pull requests or diffs.
---

You review pull requests for the Greenbox codebase.

Check for:
1. Exported fields that should be unexported. Structs should have unexported fields with constructors.
2. Raw strings where typed IDs should be used: SubscriptionID, CustomerID, BoxSize.
3. Deep nesting: more than two levels of if/else suggests missing guard clauses.
4. Missing error handling or unwrapped errors.
5. Tests that test implementation instead of behaviour.

Do not nitpick formatting or style. The linter handles that.

That last line is load-bearing. The first version of the reviewer flagged fourteen formatting nits on one PR, and the useful findings drowned in them. An agent that nitpicks gets ignored; the instruction tells it where its judgement is wanted and where the linter already has the job.

How the agents get used

There’s no ceremony to it. Priya writes:

> Use the test-writer agent to write tests for the box-size change handler.

Claude Code hands the task to the agent, which works with its own instructions plus everything the CLAUDE.md files already establish: the root conventions, the subscription package’s status rules, the source it needs. Often she doesn’t name the agent at all; the description line in the frontmatter is enough for the tool to pick the specialist when the task is a test-writing task.

Tom uses the reviewer the same way during code review. On its first proper outing it catches an exported Amount field that should be unexported with a constructor, and a string parameter where SubscriptionID belongs. Tom would have caught both, eventually. The agent catches them in seconds, every time, without fatigue.

AGENTS.md: the same brief for every other tool

The last file isn’t for Claude Code at all, or not only. AGENTS.md is the vendor-neutral version of the same idea: a plain markdown brief at the root of the repository that most coding agents, whoever makes them, know to look for. CLAUDE.md is what Claude Code reads; AGENTS.md is what nearly everything else reads.

Greenbox’s AGENTS.md is the root CLAUDE.md under another name. Same structure, same conventions, same domain language, same prohibitions. No special format, no schema, no registry of roles: just the brief, in the place other tools expect to find it. Two files saying the same thing is a drift risk, so Priya adds the rule that keeps the pair honest: change one, change both, in the same commit. (Some teams symlink one to the other; same effect.)

Why bother, when the whole team uses the same tool this week? Because “this week” is doing a lot of work in that sentence. The team has just watched what happens when one developer’s LLM has the brief and another’s doesn’t: an hour-long review and code from two different planets. A contractor arriving with their own setup is the same problem on a delay, and the fix costs one file.

The maintenance cycle

Priya warns the team early: “A stale CLAUDE.md is worse than no CLAUDE.md. If the file says ‘use guard clauses’ but the codebase has moved to a different pattern, the LLM generates code that doesn’t match anything.”

The team adopts a rule: when you change a convention, update the CLAUDE.md in the same commit. It’s like updating tests when you change behaviour, the documentation and the code move together.

# Tom's commit message when they adopt a new error type
git log --oneline -1
# a1b2c3d Add DomainError type, update CLAUDE.md conventions

The CLAUDE.md diff in that commit:

 ## Conventions

 - Error wrapping: `fmt.Errorf("doing thing: %w", err)`
+- Domain errors: use `DomainError{Code, Message}` for business rule violations.
+  Reserve `fmt.Errorf` for infrastructure errors (database, network).

Two lines. The LLM now generates DomainError for business rule violations and fmt.Errorf for infrastructure errors. The convention is encoded the moment it’s decided, which is also the moment the scar is freshest.

The compound effect

The team notices something over the following weeks. The CLAUDE.md doesn’t just make LLM-generated code better. It makes the whole codebase more consistent, because:

  1. New contributors read it as an onboarding doc.
  2. The LLM follows it, so generated code demonstrates the conventions.
  3. Code reviewers reference it when explaining why a pattern should change.
  4. The conventions themselves get sharper, because writing them down forces the team to resolve ambiguity. “Use typed IDs” is vague. “Use SubscriptionID not string for subscription identifiers” is precise.

Tom puts it simply: “We wrote a page of conventions for the LLM and accidentally standardised the whole team.”

Lee’s version: “The best documentation is documentation that has a reader. The LLM reads the CLAUDE.md on every task. That makes it the most-read document in the repository.”

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.