Teaching Your LLM the Codebase: CLAUDE.md and AGENTS.md

The previous post introduced the idea: teach the LLM your conventions through a file it reads on every task. This post shows the files themselves.

The root CLAUDE.md

This is the file at the root of the Greenbox repository. It’s the first thing the LLM reads when it starts working:

<!-- file: CLAUDE.md -->
# Greenbox

Produce-box subscription service. Go monorepo.

## Build & Test

- `go test ./...` to run all tests
- `go vet ./...` before committing
- `golangci-lint run` for full lint check

## Project Structure

- `cmd/greenbox/` — Main application entry point
- `subscription/` — Subscription lifecycle (create, pause, resume, cancel, box size)
- `billing/` — Invoices, payment confirmation, pricing
- `delivery/` — Delivery scheduling, packing, dispatch
- `db/` — Database access and migrations

## Conventions

- Guard clauses for early returns. No deep nesting.
- Custom types for IDs and dates: `SubscriptionID`, `CustomerID`, `DeliveryDate`, not raw strings.
- Unexported struct fields. Constructor functions enforce invariants.
- Error wrapping: `fmt.Errorf("doing thing: %w", err)`
- Table-driven tests with `t.Run` subtests.
- Test names describe behaviour: `TestPausedSubscription_CannotChangeBoxSize`

## Domain Language

- "subscription" not "order"
- "box" not "product" or "package"
- "delivery day" not "shipping date"
- "subscriber" not "user" or "customer" (except in CustomerID, which is the billing reference)
- "pause" not "suspend" or "hold"

## Do Not

- No `interface{}` or `any`. Use concrete types or narrow interfaces.
- No `utils`, `helpers`, or `common` packages.
- No global state or package-level variables (except constants).

About thirty lines of content. Everything a developer – or an LLM – needs to write code that fits the project. The conventions section is the most valuable: it prevents the style drift that Tom and Priya discovered when their LLM-generated code looked like it came from different teams.

Why each section matters

Build & Test seems obvious, but LLMs use it. When asked to verify a change, the LLM runs go test ./... because the file told it to. Without this section, it might run go build and call it done, or guess at a test command that doesn’t exist.

Project Structure tells the LLM where to put new code. When asked to add a delivery feature, it goes to delivery/, not a new top-level package. The structure section is a map.

Conventions is the style guide. Guard clauses, typed IDs, table-driven tests – these are the patterns the team agreed on. Without this section, the LLM generates valid Go that doesn’t match the team’s Go. With it, generated code passes review faster because it already looks like the codebase.
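
As a rough illustration of what these conventions produce together, here is a minimal sketch that combines typed IDs, unexported fields behind a constructor, guard clauses, and error wrapping. The struct fields, the `ID` accessor, the `register` helper, and the example ID values are assumptions made for illustration, not Greenbox's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// SubscriptionID and CustomerID are distinct types, so the compiler
// rejects passing one where the other is expected.
type SubscriptionID string
type CustomerID string

// Subscription keeps its fields unexported; the only way to build one
// is through NewSubscription, which enforces the invariants.
type Subscription struct {
	id         SubscriptionID
	customerID CustomerID
}

// NewSubscription uses guard clauses: each invalid case returns early,
// leaving the happy path unindented at the bottom.
func NewSubscription(id SubscriptionID, customerID CustomerID) (*Subscription, error) {
	if id == "" {
		return nil, errors.New("empty subscription ID")
	}
	if customerID == "" {
		return nil, errors.New("empty customer ID")
	}
	return &Subscription{id: id, customerID: customerID}, nil
}

// ID exposes the identifier read-only.
func (s *Subscription) ID() SubscriptionID { return s.id }

// register is a hypothetical caller showing the error-wrapping convention.
func register(id SubscriptionID, customerID CustomerID) (*Subscription, error) {
	sub, err := NewSubscription(id, customerID)
	if err != nil {
		return nil, fmt.Errorf("registering subscription: %w", err)
	}
	return sub, nil
}

func main() {
	sub, err := register("sub-123", "cus-456")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(sub.ID()) // sub-123

	_, err = register("", "cus-456")
	fmt.Println(err) // registering subscription: empty subscription ID
}
```

With distinct ID types, swapping the two arguments of `NewSubscription` is a compile error when the values are already typed, which is the kind of mistake a raw `string` signature lets through.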

Domain Language is subtle but powerful. Before this section existed, the LLM would generate variable names like orderID, productName, shippingDate. Each one required a review comment: “We call this a subscription, not an order.” Now the LLM uses the right words the first time. This also helps new developers absorb the team’s vocabulary – they see it in the generated code before they’ve read every file.

Do Not is the anti-pattern list. It heads off the LLM’s most common bad habits. Left unprompted, LLMs writing Go tend to create utils packages, reach for interface{} for flexibility, and introduce package-level variables. The explicit prohibition stops these before they start.
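
To make the alternatives concrete, the sketch below shows one way to stay inside these rules: a one-method interface in place of `interface{}`, and a dependency passed through a constructor in place of a package-level variable. `PauseNotifier`, `PauseService`, and `recordingNotifier` are hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

type SubscriptionID string

// PauseNotifier is a narrow interface: one method with typed arguments,
// instead of a catch-all interface{} parameter.
type PauseNotifier interface {
	NotifyPaused(id SubscriptionID) error
}

// recordingNotifier is a stand-in implementation that remembers calls.
type recordingNotifier struct {
	notified []SubscriptionID
}

func (r *recordingNotifier) NotifyPaused(id SubscriptionID) error {
	r.notified = append(r.notified, id)
	return nil
}

// PauseService receives its dependency through the constructor rather
// than reading a package-level variable.
type PauseService struct {
	notifier PauseNotifier
}

func NewPauseService(n PauseNotifier) *PauseService {
	return &PauseService{notifier: n}
}

// Pause validates the ID with a guard clause, then notifies.
func (p *PauseService) Pause(id SubscriptionID) error {
	if id == "" {
		return errors.New("empty subscription ID")
	}
	if err := p.notifier.NotifyPaused(id); err != nil {
		return fmt.Errorf("notifying pause of %s: %w", id, err)
	}
	return nil
}

func main() {
	rec := &recordingNotifier{}
	svc := NewPauseService(rec)
	if err := svc.Pause("sub-123"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(rec.notified) // [sub-123]
}
```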

Package-level CLAUDE.md

The root file covers the whole project. Package-level files add context for specific packages:

<!-- file: subscription/CLAUDE.md -->
# Subscription

Manages subscription lifecycle.

## Status Transitions

- Pending → Active → Paused → Active (resume) or Cancelled
- Paused subscriptions cannot change box size.
- Cancelled subscriptions cannot be modified at all.
- `NewSubscription` starts in `StatusPending`.

## Conventions

- All mutations go through methods on `Subscription`. No direct field access from outside.
- Status is a typed constant (`StatusPending`, `StatusActive`, etc.), not a raw string.
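
In Go, the rules in this file might translate into something like the following sketch. It is an illustration under assumptions: `StatusCancelled`, the `BoxSize` type, the `Activate`, `Pause`, and `ChangeBoxSize` method names, and the integer representation of `Status` are all guesses, and the sketch assumes a pending subscription may still change its box size, which the file does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a typed constant rather than a raw string.
type Status int

const (
	StatusPending Status = iota
	StatusActive
	StatusPaused
	StatusCancelled
)

// BoxSize is a hypothetical typed box size.
type BoxSize int

// Subscription hides its fields; all mutations go through methods.
type Subscription struct {
	status  Status
	boxSize BoxSize
}

// NewSubscription starts in StatusPending, as the file requires.
func NewSubscription(size BoxSize) *Subscription {
	return &Subscription{status: StatusPending, boxSize: size}
}

// Activate moves Pending -> Active.
func (s *Subscription) Activate() error {
	if s.status != StatusPending {
		return errors.New("only pending subscriptions can be activated")
	}
	s.status = StatusActive
	return nil
}

// Pause moves Active -> Paused.
func (s *Subscription) Pause() error {
	if s.status != StatusActive {
		return errors.New("only active subscriptions can be paused")
	}
	s.status = StatusPaused
	return nil
}

// ChangeBoxSize enforces the two written rules with guard clauses.
func (s *Subscription) ChangeBoxSize(size BoxSize) error {
	if s.status == StatusCancelled {
		return errors.New("cancelled subscriptions cannot be modified")
	}
	if s.status == StatusPaused {
		return errors.New("paused subscriptions cannot change box size")
	}
	s.boxSize = size
	return nil
}

func main() {
	sub := NewSubscription(2)
	// Errors are ignored here only to keep the demo short.
	_ = sub.Activate()
	_ = sub.Pause()
	fmt.Println(sub.ChangeBoxSize(3)) // paused subscriptions cannot change box size
}
```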

And for the billing package:

<!-- file: billing/CLAUDE.md -->
# Billing

Invoices, payment confirmation, pricing.

## Money

- All amounts stored in cents (int64), not dollars (float64).
- Display formatting happens at the HTTP layer, not in billing logic.
- Currency is always AUD. No multi-currency support yet.

When the LLM works in the billing package, it reads both the root CLAUDE.md and the package-level one. The root provides general conventions. The package file provides package-specific rules. The LLM stores amounts in cents because the file says so – no more pull request comments asking “should this be cents or dollars?”
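
A minimal sketch of what the Money rules imply in code, assuming a hypothetical `Cents` type and a `FormatAUD` helper (neither name comes from the Greenbox codebase). Arithmetic stays in integer cents; the conversion to a dollar string happens once, in a function that would live at the HTTP layer.

```go
package main

import "fmt"

// Cents is a hypothetical typed amount in Australian cents, backed by int64.
type Cents int64

// Times multiplies an amount by a whole-number quantity without
// leaving integer arithmetic.
func (c Cents) Times(n int64) Cents { return c * Cents(n) }

// FormatAUD renders an amount for display. Per billing/CLAUDE.md this
// belongs at the HTTP layer, not inside billing logic.
func FormatAUD(c Cents) string {
	sign := ""
	if c < 0 {
		sign = "-"
		c = -c
	}
	return fmt.Sprintf("%s$%d.%02d", sign, c/100, c%100)
}

func main() {
	weekly := Cents(3495)                   // $34.95 stored as 3495
	fmt.Println(FormatAUD(weekly.Times(4))) // $139.80
}
```

Keeping amounts as integers avoids the rounding surprises of binary floating point, which is the usual motivation for a rule like this one.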

AGENTS.md: specialised roles

Where CLAUDE.md is the general brief, AGENTS.md defines specialised roles – agents the LLM can adopt for specific tasks. The Greenbox team defines two:

# file: AGENTS.md

[[agents]]
name = "test-writer"
description = "Writes tests for Greenbox code following team conventions"

[agents.instructions]
text = """
You write tests for the Greenbox codebase.

Conventions:
- Use table-driven tests with t.Run subtests for any function with more than two cases.
- Test names describe behaviour, not implementation: TestPausedSubscription_CannotChangeBoxSize
- Use precise language in test names:
  - "Cannot" = hard constraint, test failure means a bug
  - "Returns" = pure output check
- Create test fixtures using constructor functions, not struct literals with exported fields.
- Prefer assertion messages that explain the business rule: "paused subscriptions cannot change box size"
- Do not use testify or other assertion libraries. Use stdlib testing only.
- Test through public methods. Never access unexported fields.
"""

[[agents]]
name = "reviewer"
description = "Reviews code for convention drift"

[agents.instructions]
text = """
You review pull requests for the Greenbox codebase.

Check for:
1. Exported fields that should be unexported. Structs should have unexported fields with constructors.
2. Raw strings where custom types should be used: SubscriptionID, CustomerID, BoxSize.
3. Deep nesting: more than two levels of if/else suggests missing guard clauses.
4. Missing error handling or unwrapped errors.
5. Tests that test implementation instead of behaviour.

Do not nitpick formatting or style — the linter handles that.
"""

Each agent encodes expertise the team has built up. The test writer knows about precise naming because the team keeps finding that vague test names make failures harder to diagnose. The reviewer catches the convention drift that slips through when everyone’s moving fast – exported fields, raw strings where typed IDs belong, nested conditionals that should be guard clauses.

How agents are invoked

When Priya asks the LLM to write tests, she invokes the test-writer agent:

> /test-writer Write tests for the new pause subscription handler

# The agent reads:
# 1. Root CLAUDE.md (general conventions)
# 2. subscription/CLAUDE.md (package-specific rules)
# 3. The test-writer agent instructions from AGENTS.md
# 4. The relevant source files

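The generated tests use table-driven structure, descriptive names, and stdlib assertions – because the agent’s instructions specify all of that. Without the agent, the LLM would still generate table-driven tests with descriptive names (the root CLAUDE.md asks for both), but the agent adds the stricter rules: precise “Cannot” and “Returns” naming, fixtures built through constructors, stdlib-only assertions, and testing through public methods alone.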

Tom uses the reviewer agent during code review:

> /reviewer Review this PR for billing/invoices.go

# The agent checks:
# - Invoice struct has unexported fields
# - Amounts stored in cents, not dollars
# - Typed IDs used instead of raw strings
# - No deep nesting

The reviewer catches an exported Amount field that should be unexported with a constructor, and a string parameter where SubscriptionID should be used. Tom would have caught these too – eventually. The agent catches them in seconds, every time, without fatigue.

The maintenance cycle

Priya warns the team early: “A stale CLAUDE.md is worse than no CLAUDE.md. If the file says ‘use guard clauses’ but the codebase has moved to a different pattern, the LLM generates code that doesn’t match anything.”

The team adopts a rule: when you change a convention, update the CLAUDE.md in the same commit. It’s like updating tests when you change behaviour – the documentation and the code move together.

# Tom's commit message when they adopt a new error type
git log --oneline -1
# a1b2c3d Add DomainError type, update CLAUDE.md conventions

The CLAUDE.md diff in that commit:

 ## Conventions

 - Error wrapping: `fmt.Errorf("doing thing: %w", err)`
+- Domain errors: use `DomainError{Code, Message}` for business rule violations.
+  Reserve `fmt.Errorf` for infrastructure errors (database, network).

Two lines. The LLM now generates DomainError for business rule violations and fmt.Errorf for infrastructure errors. The convention is encoded the moment it’s decided.
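
The diff names the type and its two fields but not its definition, so the following is only a sketch of how such a split could look. It assumes `Code` and `Message` are strings; the `subscription_paused` code, the `changeBoxSize` and `saveSubscription` functions, and the `isDomainError` helper are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// DomainError marks a business rule violation. The field types are an
// assumption; the diff only names the fields.
type DomainError struct {
	Code    string
	Message string
}

// Error makes DomainError satisfy the error interface.
func (e DomainError) Error() string { return e.Code + ": " + e.Message }

// changeBoxSize is a hypothetical business operation: a rule violation
// comes back as a DomainError.
func changeBoxSize(paused bool) error {
	if paused {
		return DomainError{
			Code:    "subscription_paused",
			Message: "paused subscriptions cannot change box size",
		}
	}
	return nil
}

// saveSubscription is a hypothetical infrastructure call: failures are
// wrapped with fmt.Errorf and %w.
func saveSubscription(dbDown bool) error {
	if dbDown {
		return fmt.Errorf("saving subscription: %w", errors.New("connection refused"))
	}
	return nil
}

// isDomainError reports whether err is, or wraps, a DomainError.
func isDomainError(err error) bool {
	var de DomainError
	return errors.As(err, &de)
}

func main() {
	fmt.Println(isDomainError(changeBoxSize(true)))    // true
	fmt.Println(isDomainError(saveSubscription(true))) // false
}
```

A caller such as an HTTP handler could use a check like `isDomainError` to decide between a client-facing message and a generic server error.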

Before and after

The clearest proof is in the generated code. Here’s what the LLM generates for “add a Resume method to Subscription” – first without CLAUDE.md, then with it.

Without CLAUDE.md:

// file: subscription/subscription.go
func (s *Subscription) Resume() {
    s.Status = "active"
    s.PauseReason = ""
}

Exported fields. String status. No error handling. No guard clause.

With CLAUDE.md:

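// file: subscription/subscription.go

// Resume reactivates a paused subscription. It returns an error if the
// subscription is in any other status.
func (s *Subscription) Resume() error {
    // Guard clause: only paused subscriptions can be resumed.
    if s.status != StatusPaused {
        return fmt.Errorf("cannot resume subscription in status %v", s.status)
    }
    s.status = StatusActive
    s.pauseReason = ""
    s.updatedAt = time.Now()
    return nil
}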

Guard clause. Unexported fields. Status constant. Error returned. The code matches the codebase because the LLM read the brief.

The difference isn’t intelligence – it’s context. The LLM is equally capable in both cases; the CLAUDE.md points that capability at this codebase’s conventions.

The compound effect

The team notices something over the following months. The CLAUDE.md doesn’t just make LLM-generated code better. It makes the whole codebase more consistent, because:

New developers read it as an onboarding doc.
The LLM follows it, so generated code demonstrates the conventions.
Code reviewers reference it when explaining why a pattern should change.
The conventions themselves get sharper, because writing them down forces the team to resolve ambiguity. “Use typed IDs” is vague. “Use SubscriptionID not string for subscription identifiers” is precise.

Tom puts it simply: “We wrote a page of conventions for the LLM and accidentally standardised the whole team.”

Lee’s version: “The best documentation is documentation that has a reader. The LLM reads the CLAUDE.md on every task. That makes it the most-read document in the repository.”