Technical Migration: The Strangler Fig in Go

Tom sends the new Melbourne developer the strangler repository, the flag store, and a link to the postmortem. “Start with dual reads. Don’t skip steps.” This is what those templates look like.

The previous post told the story of two migrations: the one that nearly double-charged two thousand subscribers, and the boring one that replaced it. The strangler fig pattern (dual reads, dual writes, flip the primary, retire the old) is a sequence of code patterns. This is what they look like in Go.

The feature flag

The migration is controlled by three feature flags. In Go, a feature flag needs a value that can change at runtime without a redeploy, and that every instance of the service can see:

// file: migration/flags.go
package migration

// FlagStore reads and writes feature flags. The in-memory
// implementation works for tests and single-process deployments.
// Production uses a shared store like Redis so that flipping
// a flag takes effect across every instance.
type FlagStore interface {
	Get(name string) bool
	Set(name string, value bool)
}

The interface is small on purpose. A flag store does two things: read a flag and write a flag. The in-memory implementation is enough for tests:

// file: migration/flags.go
import "sync"

type InMemoryFlagStore struct {
	mu    sync.RWMutex
	flags map[string]bool
}

func NewInMemoryFlagStore() *InMemoryFlagStore {
	return &InMemoryFlagStore{flags: make(map[string]bool)}
}

func (s *InMemoryFlagStore) Get(name string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.flags[name]
}

func (s *InMemoryFlagStore) Set(name string, value bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.flags[name] = value
}

In production, Tom swaps this for a Redis-backed implementation: same interface, shared state. Flipping a flag in Redis takes effect on every instance at its next read of that flag. The migration code doesn’t change.
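A shared store adds a failure mode the in-memory one doesn't have: the lookup itself can fail. FlagStore.Get returns only a bool, so one reasonable choice is to read any error as "flag off", which falls back to pre-migration behaviour. The sketch below assumes a hypothetical KV interface standing in for whichever Redis client you use; KV, SharedFlagStore, and mapKV are illustrative names, not part of the original code.

```go
package main

import (
	"errors"
	"fmt"
)

// KV is a hypothetical stand-in for a Redis client: string keys,
// string values, and calls that can fail.
type KV interface {
	Get(key string) (string, error)
	Set(key, value string) error
}

// SharedFlagStore adapts a KV to the FlagStore interface
// (Get(name) bool, Set(name, value bool)).
type SharedFlagStore struct {
	kv     KV
	prefix string
}

func NewSharedFlagStore(kv KV, prefix string) *SharedFlagStore {
	return &SharedFlagStore{kv: kv, prefix: prefix}
}

// Get reports whether the flag is on. A missing key and a store
// error both read as false, i.e. "behave as before the migration".
func (s *SharedFlagStore) Get(name string) bool {
	v, err := s.kv.Get(s.prefix + name)
	if err != nil {
		return false
	}
	return v == "1"
}

// Set writes the flag. The interface has no error return, so a
// failed write is dropped here; real code would at least log it.
func (s *SharedFlagStore) Set(name string, value bool) {
	v := "0"
	if value {
		v = "1"
	}
	_ = s.kv.Set(s.prefix+name, v)
}

var (
	errDown    = errors.New("kv: unavailable")
	errMissing = errors.New("kv: no such key")
)

// mapKV is an in-memory KV for the demo; down simulates an outage.
type mapKV struct {
	data map[string]string
	down bool
}

func (m *mapKV) Get(key string) (string, error) {
	if m.down {
		return "", errDown
	}
	v, ok := m.data[key]
	if !ok {
		return "", errMissing
	}
	return v, nil
}

func (m *mapKV) Set(key, value string) error {
	if m.down {
		return errDown
	}
	m.data[key] = value
	return nil
}

func main() {
	kv := &mapKV{data: map[string]string{}}
	store := NewSharedFlagStore(kv, "migration:")

	store.Set("dual_read", true)
	fmt.Println(store.Get("dual_read")) // true

	kv.down = true
	fmt.Println(store.Get("dual_read")) // false: outage reads as "off"
}
```

Whether "off" is the right answer during an outage depends on the phase. Early in the migration it is the safe default; once the new database has been primary for a while, keeping the last known value may be the better trade-off.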

The three migration flags:

// file: migration/flags.go
const (
	FlagDualRead   = "dual_read"
	FlagDualWrite  = "dual_write"
	FlagNewPrimary = "new_primary"
)

type MigrationFlags struct {
	store FlagStore
}

func NewMigrationFlags(store FlagStore) *MigrationFlags {
	return &MigrationFlags{store: store}
}

func (f *MigrationFlags) DualRead() bool   { return f.store.Get(FlagDualRead) }
func (f *MigrationFlags) DualWrite() bool  { return f.store.Get(FlagDualWrite) }
func (f *MigrationFlags) NewPrimary() bool { return f.store.Get(FlagNewPrimary) }

func (f *MigrationFlags) SetDualRead(v bool)   { f.store.Set(FlagDualRead, v) }
func (f *MigrationFlags) SetDualWrite(v bool)  { f.store.Set(FlagDualWrite, v) }
func (f *MigrationFlags) SetNewPrimary(v bool) { f.store.Set(FlagNewPrimary, v) }

All three start disabled. Tom enables them one at a time, days apart. Each flag is independent: you can have dual reads without dual writes, or dual writes without switching the primary. The flags form a progression, not a single switch.

The strangler repository

The subscription repository from the DDD posts defines the interface the domain needs:

type Repository interface {
	Save(ctx context.Context, sub *Subscription) error
	FindByID(ctx context.Context, id SubscriptionID) (*Subscription, error)
	FindByCustomerID(ctx context.Context, id CustomerID) ([]*Subscription, error)
}

The strangler repository wraps two implementations of this interface, old and new, and compares their results:

// file: migration/strangler.go
package migration

import (
	"context"
	"log/slog"
	"sync"

	"greenbox/subscription"
)

type StranglerRepository struct {
	old   subscription.Repository
	new   subscription.Repository
	flags *MigrationFlags
	log   *slog.Logger
	wg    sync.WaitGroup
}

func NewStranglerRepository(
	old, new subscription.Repository,
	flags *MigrationFlags,
	log *slog.Logger,
) *StranglerRepository {
	return &StranglerRepository{
		old: old, new: new, flags: flags, log: log,
	}
}

// Wait blocks until all background comparisons have finished.
// Production code never calls this. Tests do.
func (r *StranglerRepository) Wait() {
	r.wg.Wait()
}

The StranglerRepository implements subscription.Repository. The subscription service receives it and has no idea it’s talking to a migration wrapper; it just sees a repository. When the migration is done, you swap it for postgres.NewSubscriptionRepository(newConn) and delete the migration code.

Wait() exists for tests. In production, the background comparisons run asynchronously and nobody waits for them. In tests, calling Wait() guarantees the comparison has finished before you check the logs. No sleeps. No timing assumptions.

The FindByID method is where the pattern lives:

// file: migration/strangler.go
func (r *StranglerRepository) FindByID(
	ctx context.Context,
	id subscription.SubscriptionID,
) (*subscription.Subscription, error) {
	if r.flags.NewPrimary() {
		return r.readFromNewWithShadow(ctx, id)
	}
	if r.flags.DualRead() {
		return r.readFromOldWithComparison(ctx, id)
	}
	return r.old.FindByID(ctx, id)
}

Three paths, controlled by flags. When no flags are enabled, only the old database is used and the system behaves exactly as it did before the migration started. The fallback is always the previous state.

Every method on subscription.Repository (FindByCustomerID, Save, any future query methods) follows the same structure. Read from the primary. Compare asynchronously with the shadow. Log discrepancies. The pattern is the same; only the method signature changes. This is why the strangler repository is concrete, not generic: it implements the full subscription.Repository interface, including every domain-specific query method. A generic wrapper could only cover the methods it knows about.
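For a list query such as FindByCustomerID, the one part that changes is the comparison: two result sets have to be matched row by row, and the two databases may not return rows in the same order. A minimal sketch of that step, using a simplified Sub struct in place of subscription.Subscription (Sub and diffByID are illustrative names):

```go
package main

import (
	"fmt"
	"sort"
)

// Sub is a simplified stand-in for subscription.Subscription.
type Sub struct {
	ID      string
	BoxSize string
}

// diffByID compares two result sets from a list query. Rows are
// matched by ID, so a difference in ordering between the two
// databases is not reported as a discrepancy.
func diffByID(primary, shadow []Sub) []string {
	shadowByID := make(map[string]Sub, len(shadow))
	for _, s := range shadow {
		shadowByID[s.ID] = s
	}

	var diffs []string
	for _, p := range primary {
		s, ok := shadowByID[p.ID]
		switch {
		case !ok:
			diffs = append(diffs, p.ID+": missing in shadow")
		case s.BoxSize != p.BoxSize:
			diffs = append(diffs,
				fmt.Sprintf("%s: box_size %s != %s", p.ID, p.BoxSize, s.BoxSize))
		}
		delete(shadowByID, p.ID)
	}

	// Anything left was returned by the shadow only.
	for id := range shadowByID {
		diffs = append(diffs, id+": missing in primary")
	}

	sort.Strings(diffs) // stable output for logs and tests
	return diffs
}

func main() {
	oldRows := []Sub{{"sub-1", "medium"}, {"sub-2", "small"}}
	newRows := []Sub{{"sub-2", "large"}, {"sub-3", "small"}}

	for _, d := range diffByID(oldRows, newRows) {
		fmt.Println(d)
	}
}
```

The rest of the method would keep the shape of FindByID: return the primary's slice to the caller and run the diff in the background goroutine.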

Phase 1: Dual reads

When DualRead is enabled, every read hits both databases. The old database is the source of truth. The new database is checked for comparison:

// file: migration/strangler.go
func (r *StranglerRepository) readFromOldWithComparison(
	ctx context.Context,
	id subscription.SubscriptionID,
) (*subscription.Subscription, error) {
	oldSub, err := r.old.FindByID(ctx, id)
	if err != nil {
		return nil, err
	}

	// Compare asynchronously -- don't slow down the request
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		r.compareWithNew(context.Background(), id, oldSub)
	}()

	return oldSub, nil
}

func (r *StranglerRepository) compareWithNew(
	ctx context.Context,
	id subscription.SubscriptionID,
	oldSub *subscription.Subscription,
) {
	newSub, err := r.new.FindByID(ctx, id)
	if err != nil {
		r.log.Warn("dual read: new database error",
			"subscription_id", id,
			"error", err,
		)
		return
	}

	diffs := compare(oldSub, newSub)
	if len(diffs) > 0 {
		r.log.Error("dual read: discrepancy detected",
			"subscription_id", id,
			"differences", diffs,
		)
	}
}

The comparison runs in a goroutine. The request returns immediately from the old database. If the new database is slow, or down, or wrong, it doesn’t affect the response. The caller never knows the comparison happened.

This is the key design decision: the dual read never degrades the primary path. The old database serves the request. The new database is checked asynchronously. Discrepancies are logged but never block.
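One thing the goroutine above leaves open is how long a comparison may run. With context.Background() and no deadline, a hung shadow database holds its goroutine until the driver gives up. A possible refinement, sketched here with a hypothetical shadowRunner helper, detaches the context from the request's cancellation with context.WithoutCancel (Go 1.21+) and then bounds it with a timeout:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// shadowRunner launches background comparisons with a deadline.
// It is an illustrative helper, not part of the original code.
type shadowRunner struct {
	wg      sync.WaitGroup
	timeout time.Duration
}

// Go runs fn in a goroutine. The context passed to fn keeps the
// request's values but not its cancellation (the request usually
// finishes first), and it expires after r.timeout so a hung
// shadow database cannot hold goroutines indefinitely.
func (r *shadowRunner) Go(reqCtx context.Context, fn func(context.Context)) {
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		ctx, cancel := context.WithTimeout(context.WithoutCancel(reqCtx), r.timeout)
		defer cancel()
		fn(ctx)
	}()
}

// Wait blocks until all background work has finished.
func (r *shadowRunner) Wait() { r.wg.Wait() }

// hungFind simulates a shadow query that never answers on its
// own; it returns only when its context ends.
func hungFind(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// timeoutDemo runs one hung comparison and returns its error.
func timeoutDemo() error {
	r := &shadowRunner{timeout: 50 * time.Millisecond}
	reqCtx, cancelReq := context.WithCancel(context.Background())

	var got error
	r.Go(reqCtx, func(ctx context.Context) { got = hungFind(ctx) })

	cancelReq() // the request finishing does not cancel the comparison
	r.Wait()    // the WaitGroup makes the write to got visible here
	return got
}

func main() {
	fmt.Println(timeoutDemo()) // context deadline exceeded
}
```

The 50 ms value is only for the demo. In practice the timeout would sit somewhat above the shadow database's normal read latency.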

The comparison function checks field by field:

// file: migration/compare.go
package migration

import (
	"fmt"

	"greenbox/subscription"
)

type Discrepancy struct {
	Field    string
	OldValue string
	NewValue string
}

func compare(old, new *subscription.Subscription) []Discrepancy {
	var diffs []Discrepancy

	if old.ID() != new.ID() {
		diffs = append(diffs, Discrepancy{"id", string(old.ID()), string(new.ID())})
	}
	if old.BoxSize() != new.BoxSize() {
		diffs = append(diffs, Discrepancy{"box_size", old.BoxSize().String(), new.BoxSize().String()})
	}
	if old.IsActive() != new.IsActive() {
		diffs = append(diffs, Discrepancy{
			"active",
			fmt.Sprintf("%v", old.IsActive()),
			fmt.Sprintf("%v", new.IsActive()),
		})
	}
	if old.CustomerID() != new.CustomerID() {
		diffs = append(diffs, Discrepancy{
			"customer_id",
			string(old.CustomerID()),
			string(new.CustomerID()),
		})
	}

	return diffs
}

Every field that matters gets compared. When Tom enabled dual reads on Monday morning, 247 discrepancies surfaced on day one, all from the reconciliation job’s direct database query. The comparison function told Lina exactly which subscriptions diverged and which fields were wrong. She found the fallback query in an hour.

Without dual reads, that reconciliation job would have been a landmine. With dual reads, it was a log line.
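One detail worth deciding early: compareWithNew logs every error from the new database as a warning, including "row not found". A row that exists in the old database but not in the new one is arguably a discrepancy rather than an infrastructure error. A small sketch of that classification, assuming a hypothetical ErrNotFound sentinel (the real repository may signal a missing row differently):

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel for a missing row.
var ErrNotFound = errors.New("subscription not found")

// errConn stands in for any infrastructure failure.
var errConn = errors.New("connection refused")

type outcome string

const (
	outcomeMatch       outcome = "match"
	outcomeDiscrepancy outcome = "discrepancy"
	outcomeShadowError outcome = "shadow_error"
)

// classify decides how one shadow read should be counted.
// diffs is the number of field differences found by compare and
// is only meaningful when err is nil.
func classify(err error, diffs int) outcome {
	switch {
	case errors.Is(err, ErrNotFound):
		return outcomeDiscrepancy // row exists in the primary only
	case err != nil:
		return outcomeShadowError // timeout, connection failure, ...
	case diffs > 0:
		return outcomeDiscrepancy
	default:
		return outcomeMatch
	}
}

func main() {
	fmt.Println(classify(nil, 0))                                 // match
	fmt.Println(classify(nil, 2))                                 // discrepancy
	fmt.Println(classify(fmt.Errorf("find: %w", ErrNotFound), 0)) // discrepancy
	fmt.Println(classify(errConn, 0))                             // shadow_error
}
```

Keeping the two apart means a spike in shadow errors points at the new database's availability, while a spike in discrepancies points at its data.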

Phase 2: Dual writes

When DualWrite is enabled, every write goes to both databases:

// file: migration/strangler.go
func (r *StranglerRepository) Save(
	ctx context.Context,
	sub *subscription.Subscription,
) error {
	// Always write to the current primary
	primary, shadow := r.old, r.new
	if r.flags.NewPrimary() {
		primary, shadow = r.new, r.old
	}

	if err := primary.Save(ctx, sub); err != nil {
		return err
	}

	if r.flags.DualWrite() || r.flags.NewPrimary() {
		if err := shadow.Save(ctx, sub); err != nil {
			r.log.Error("dual write: shadow write failed",
				"subscription_id", sub.ID(),
				"error", err,
			)
			// Don't return the error -- the primary succeeded
		}
	}

	return nil
}

The primary write must succeed. The shadow write is best-effort. If the shadow fails, it’s logged but the operation succeeds. This means the shadow might fall behind, and the dual reads will catch the discrepancy, which tells the team there’s a problem to investigate.

Notice the pattern: the primary write is never conditional. The if blocks only control the shadow. If all flags are disabled, Save writes to the old database and returns. No shadow. No comparison. The system behaves as if the migration code doesn’t exist.

This is what Charlotte means by “reversible at every step.” Disabling the DualWrite flag doesn’t roll back data. It stops new writes from going to the shadow (unless NewPrimary is on, in which case Save writes to both databases regardless of DualWrite). The primary database is never affected by a flag change.
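The best-effort rule is easy to state and easy to break in a refactor, so it is worth pinning down in a test of its own. A self-contained sketch of that scenario, using simplified stand-ins (saver, memSaver, failSaver, and dualSave are illustrative, not the real repository types):

```go
package main

import (
	"errors"
	"fmt"
)

// saver is a one-method stand-in for subscription.Repository.
type saver interface {
	Save(id string) error
}

// memSaver records every ID it is asked to save.
type memSaver struct{ saved []string }

func (m *memSaver) Save(id string) error {
	m.saved = append(m.saved, id)
	return nil
}

var errWrite = errors.New("write failed")

// failSaver rejects every write.
type failSaver struct{}

func (failSaver) Save(string) error { return errWrite }

// dualSave mirrors the shape of StranglerRepository.Save: the
// primary write must succeed; the shadow write is best-effort and
// its error is handed to onShadowErr instead of being returned.
func dualSave(primary, shadow saver, dualWrite bool, id string, onShadowErr func(error)) error {
	if err := primary.Save(id); err != nil {
		return err
	}
	if dualWrite {
		if err := shadow.Save(id); err != nil {
			onShadowErr(err)
		}
	}
	return nil
}

func main() {
	primary := &memSaver{}
	err := dualSave(primary, failSaver{}, true, "sub-1", func(e error) {
		fmt.Println("shadow write failed:", e)
	})
	fmt.Println(err, primary.saved)
}
```

The reverse case matters just as much: when the primary fails, the shadow must not be written at all, otherwise the shadow ends up holding data the primary never accepted.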

Phase 3: Flip the primary

When NewPrimary is enabled, the roles reverse:

// file: migration/strangler.go
func (r *StranglerRepository) readFromNewWithShadow(
	ctx context.Context,
	id subscription.SubscriptionID,
) (*subscription.Subscription, error) {
	newSub, err := r.new.FindByID(ctx, id)
	if err != nil {
		return nil, err
	}

	// Shadow-read the old database for comparison
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		r.compareWithOld(context.Background(), id, newSub)
	}()

	return newSub, nil
}

func (r *StranglerRepository) compareWithOld(
	ctx context.Context,
	id subscription.SubscriptionID,
	newSub *subscription.Subscription,
) {
	oldSub, err := r.old.FindByID(ctx, id)
	if err != nil {
		r.log.Warn("shadow read: old database error",
			"subscription_id", id,
			"error", err,
		)
		return
	}

	diffs := compare(newSub, oldSub)
	if len(diffs) > 0 {
		r.log.Error("shadow read: discrepancy detected",
			"subscription_id", id,
			"differences", diffs,
		)
	}
}

The structure is identical to phase 1, just swapped. The new database serves requests. The old database is the shadow. If discrepancies appear, Tom disables NewPrimary and the system reverts to the old database in the same second. No rollback script. No reconciliation. One flag.
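With all three phases in place, the branching in FindByID and Save can be restated as one pure function from flag state to routing. This is only a summary of the code above for reasoning and documentation; Routing and route are illustrative names.

```go
package main

import "fmt"

// Routing describes where a request goes for a given flag state.
type Routing struct {
	ReadFrom    string // database that serves reads
	CompareRead bool   // whether reads are compared with the other database
	WriteTo     string // database whose write must succeed
	ShadowWrite bool   // whether the other database also receives the write
}

// route restates the flag checks in FindByID and Save.
func route(dualRead, dualWrite, newPrimary bool) Routing {
	if newPrimary {
		// FindByID checks NewPrimary first, and Save shadow-writes
		// when DualWrite OR NewPrimary is set, so the other two
		// flags no longer change anything in this state.
		return Routing{ReadFrom: "new", CompareRead: true, WriteTo: "new", ShadowWrite: true}
	}
	return Routing{ReadFrom: "old", CompareRead: dualRead, WriteTo: "old", ShadowWrite: dualWrite}
}

func main() {
	steps := []struct {
		name       string
		dr, dw, np bool
	}{
		{"before", false, false, false},
		{"dual read", true, false, false},
		{"dual write", true, true, false},
		{"new primary", true, true, true},
	}
	for _, s := range steps {
		fmt.Printf("%-12s %+v\n", s.name, route(s.dr, s.dw, s.np))
	}
}
```

One consequence is visible in the first branch: with NewPrimary on, comparison and shadow writes are implied, so turning DualRead or DualWrite off has no effect until NewPrimary is turned off too.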

Wiring it up

In main.go, the strangler repository replaces the original:

// file: main.go
package main

func main() {
	// Use a shared flag store so flipping a flag
	// affects every instance of the service.
	flagStore := redis.NewFlagStore(redisClient)
	flags := migration.NewMigrationFlags(flagStore)

	oldDB := postgres.NewSubscriptionRepository(oldConn)
	newDB := postgres.NewSubscriptionRepository(newConn)

	repo := migration.NewStranglerRepository(
		oldDB, newDB, flags, slog.Default(),
	)

	subService := subscription.NewService(repo, eventBus)

	// Expose flag controls via admin API
	adminMux := http.NewServeMux()
	adminMux.HandleFunc("POST /flags/dual-read/enable",
		func(w http.ResponseWriter, r *http.Request) {
			flags.SetDualRead(true)
			w.WriteHeader(http.StatusOK)
		})
	adminMux.HandleFunc("POST /flags/dual-read/disable",
		func(w http.ResponseWriter, r *http.Request) {
			flags.SetDualRead(false)
			w.WriteHeader(http.StatusOK)
		})
	// ... same for dual-write and new-primary
}

The subscription service doesn’t know about the migration. It receives a subscription.Repository and uses it. The strangler repository is infrastructure: it implements the domain’s repository interface with migration behaviour. When the migration is done, you replace migration.NewStranglerRepository(...) with postgres.NewSubscriptionRepository(newConn) and delete the migration code. The domain code never changed.

This is the anti-corruption layer (ACL) pattern applied to infrastructure. The migration is an adapter that the domain doesn’t know about.

The discrepancy dashboard

Priya’s dashboard, “two lines tracking each other perfectly”, needs the same treatment as the flags: an interface that can be backed by in-memory counters for tests or a proper metrics system in production:

// file: migration/stats.go
package migration

import "sync/atomic"

// StatsRecorder tracks migration metrics. The in-memory
// implementation works for tests. Production uses Prometheus
// counters, CloudWatch, or whichever metrics backend the
// team already has.
type StatsRecorder interface {
	RecordComparison()
	RecordDiscrepancy()
	RecordShadowError()
}

type InMemoryStats struct {
	Comparisons   atomic.Int64
	Discrepancies atomic.Int64
	ShadowErrors  atomic.Int64
}

func (s *InMemoryStats) RecordComparison()  { s.Comparisons.Add(1) }
func (s *InMemoryStats) RecordDiscrepancy() { s.Discrepancies.Add(1) }
func (s *InMemoryStats) RecordShadowError() { s.ShadowErrors.Add(1) }

With the stats recorder added to the StranglerRepository, the comparison methods track what they find:

// file: migration/strangler.go
func (r *StranglerRepository) compareWithNew(
	ctx context.Context,
	id subscription.SubscriptionID,
	oldSub *subscription.Subscription,
) {
	r.stats.RecordComparison()

	newSub, err := r.new.FindByID(ctx, id)
	if err != nil {
		r.stats.RecordShadowError()
		r.log.Warn("dual read: new database error",
			"subscription_id", id,
			"error", err,
		)
		return
	}

	diffs := compare(oldSub, newSub)
	if len(diffs) > 0 {
		r.stats.RecordDiscrepancy()
		r.log.Error("dual read: discrepancy detected",
			"subscription_id", id,
			"differences", diffs,
		)
	}
}

An HTTP endpoint exposes the in-memory stats for local development; in production, the dashboard reads from whichever metrics backend the StatsRecorder writes to:

// file: main.go
stats := &migration.InMemoryStats{}
adminMux.HandleFunc("GET /migration/stats",
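	func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		err := json.NewEncoder(w).Encode(map[string]int64{
			"comparisons":   stats.Comparisons.Load(),
			"discrepancies": stats.Discrepancies.Load(),
			"shadow_errors": stats.ShadowErrors.Load(),
		})
		if err != nil {
			slog.Error("migration stats: encode failed", "error", err)
		}
	})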

When the discrepancy count is zero for two weeks, the migration is done. The dashboard proves it. Not a gut feeling. Not a spot check. Continuous comparison across every read for fourteen days.
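The "zero for two weeks" rule can be checked mechanically if the recorder also remembers when it last saw a discrepancy. A minimal sketch under that assumption; cleanTracker, day, and twoWeeks are illustrative names, and time is passed in explicitly so the logic can be tested without waiting:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const twoWeeks = 14 * 24 * time.Hour

// cleanTracker remembers when comparisons started and when the
// most recent discrepancy was recorded.
type cleanTracker struct {
	mu        sync.Mutex
	started   time.Time
	lastDirty time.Time // zero value: no discrepancy seen yet
}

func newCleanTracker(start time.Time) *cleanTracker {
	return &cleanTracker{started: start}
}

// RecordDiscrepancy notes a discrepancy observed at the given time.
func (c *cleanTracker) RecordDiscrepancy(at time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if at.After(c.lastDirty) {
		c.lastDirty = at
	}
}

// CleanFor reports whether, as of now, at least window has passed
// since the later of the start time and the last discrepancy.
func (c *cleanTracker) CleanFor(now time.Time, window time.Duration) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	since := c.started
	if c.lastDirty.After(since) {
		since = c.lastDirty
	}
	return now.Sub(since) >= window
}

// day returns a fixed demo date plus n days.
func day(n int) time.Time {
	base := time.Date(2024, time.January, 1, 0, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(n) * 24 * time.Hour)
}

func main() {
	c := newCleanTracker(day(0))
	fmt.Println(c.CleanFor(day(13), twoWeeks)) // false: not long enough yet

	c.RecordDiscrepancy(day(10))
	fmt.Println(c.CleanFor(day(20), twoWeeks)) // false: clock restarted on day 10
	fmt.Println(c.CleanFor(day(24), twoWeeks)) // true: 14 clean days since day 10
}
```

A clean window only means something if comparisons were actually running, so a real check would also require the comparison counter to have kept increasing over the same period.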

Testing the migration

The tests exercise the flag transitions, the exact sequence Tom runs in production:

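// file: migration/strangler_test.go
package migration_test

import (
	"bytes"
	"context"
	"io"
	"log/slog"
	"strings"
	"testing"

	"greenbox/migration" // import path assumed from the greenbox/subscription import above
	"greenbox/subscription"
)
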
func TestDualReadDetectsDiscrepancy(t *testing.T) {
	oldRepo := subscription.NewInMemoryRepository()
	newRepo := subscription.NewInMemoryRepository()
	store := migration.NewInMemoryFlagStore()
	flags := migration.NewMigrationFlags(store)
	var buf bytes.Buffer
	log := slog.New(slog.NewTextHandler(&buf, nil))

	repo := migration.NewStranglerRepository(oldRepo, newRepo, flags, log)

	// Save to old only -- simulates pre-migration state
	sub := subscription.NewSubscription("sub-1", "cust-1", subscription.BoxSizeMedium)
	sub.Activate()
	oldRepo.Save(context.Background(), sub)

	// New database has stale data
	staleSub := subscription.NewSubscription("sub-1", "cust-1", subscription.BoxSizeSmall)
	staleSub.Activate()
	newRepo.Save(context.Background(), staleSub)

	// Enable dual reads
	flags.SetDualRead(true)

	// Read returns old (correct) data
	result, err := repo.FindByID(context.Background(), "sub-1")
	if err != nil {
		t.Fatal(err)
	}
	if result.BoxSize() != subscription.BoxSizeMedium {
		t.Errorf("expected medium from primary, got %v", result.BoxSize())
	}

	// Wait for the background comparison to finish
	repo.Wait()

	// Discrepancy should be logged
	if !strings.Contains(buf.String(), "discrepancy") {
		t.Error("expected discrepancy log entry")
	}
}

func TestDualWriteKeepsDatabasesInSync(t *testing.T) {
	oldRepo := subscription.NewInMemoryRepository()
	newRepo := subscription.NewInMemoryRepository()
	store := migration.NewInMemoryFlagStore()
	flags := migration.NewMigrationFlags(store)
	log := slog.New(slog.NewTextHandler(io.Discard, nil))

	repo := migration.NewStranglerRepository(oldRepo, newRepo, flags, log)

	flags.SetDualWrite(true)

	sub := subscription.NewSubscription("sub-1", "cust-1", subscription.BoxSizeLarge)
	sub.Activate()
	err := repo.Save(context.Background(), sub)
	if err != nil {
		t.Fatal(err)
	}

	// Both databases should have the subscription
	oldSub, err := oldRepo.FindByID(context.Background(), "sub-1")
	if err != nil {
		t.Fatalf("old repo: %v", err)
	}
	newSub, err := newRepo.FindByID(context.Background(), "sub-1")
	if err != nil {
		t.Fatalf("new repo: %v", err)
	}

	if oldSub.BoxSize() != newSub.BoxSize() {
		t.Errorf("box sizes diverged: old=%v new=%v",
			oldSub.BoxSize(), newSub.BoxSize())
	}
}

func TestFlagDisabledReturnsToOldBehaviour(t *testing.T) {
	oldRepo := subscription.NewInMemoryRepository()
	newRepo := subscription.NewInMemoryRepository()
	store := migration.NewInMemoryFlagStore()
	flags := migration.NewMigrationFlags(store)
	log := slog.New(slog.NewTextHandler(io.Discard, nil))

	repo := migration.NewStranglerRepository(oldRepo, newRepo, flags, log)

	sub := subscription.NewSubscription("sub-1", "cust-1", subscription.BoxSizeMedium)
	sub.Activate()
	oldRepo.Save(context.Background(), sub)

	// No flags enabled -- should only read from old
	result, err := repo.FindByID(context.Background(), "sub-1")
	if err != nil {
		t.Fatal(err)
	}
	if result.BoxSize() != subscription.BoxSizeMedium {
		t.Errorf("expected medium, got %v", result.BoxSize())
	}

	// Write with no flags -- should only write to old
	sub2 := subscription.NewSubscription("sub-2", "cust-2", subscription.BoxSizeSmall)
	sub2.Activate()
	if err := repo.Save(context.Background(), sub2); err != nil {
		t.Fatal(err)
	}

	_, err = newRepo.FindByID(context.Background(), "sub-2")
	if err == nil {
		t.Error("expected sub-2 to not exist in new repo when dual write is disabled")
	}
}

The tests prove the safety properties:

Dual reads detect discrepancies without affecting the response.
Dual writes keep both databases in sync.
Disabling all flags reverts to the old behaviour with no side effects.

Each test describes a scenario from the migration narrative. TestDualReadDetectsDiscrepancy is the 247 discrepancies Tom saw on day one. TestFlagDisabledReturnsToOldBehaviour is the rollback path that Tom’s first migration didn’t have. And none of them use time.Sleep: repo.Wait() waits for the actual work to finish, not for an arbitrary timer to expire.

What Tom sends the Melbourne developer

When the new developer asks “what’s the pattern for adding a new database?”, Tom sends the strangler repository code as a template. The Melbourne developer will write their own version implementing their own domain’s repository interface, with the same structure but different types and methods. The MigrationFlags and FlagStore go across unchanged because they’re genuinely reusable: same three flags, same progression, same shared store. The StatsRecorder interface comes with them, hooked to the same dashboard pattern. The test suite goes as a template, with the three scenarios adapted to the new repository. And the postmortem from the first migration goes too: the 6 AM story that explains why the boring pattern exists in the first place.

The Melbourne migration follows the same pattern with different types. Three weeks. Zero incidents. The code isn’t shared: each domain’s StranglerRepository implements its own repository interface, with its own methods and its own comparison function. What’s shared is the shape: wrap old and new, read from the primary, compare with the shadow, log discrepancies, control everything with flags. That’s the template.

“The migration code is boring,” the Melbourne developer tells Tom.

“That’s the highest compliment it can receive,” Tom replies.