Rules, Grammars, and Regex

May 30, 2026 · 21 min read

A regulator emails the compliance team: every customer email mentioning a competitor must be flagged for review within ten minutes of receipt, with an audit trail of why each one was flagged. The team starts designing an LLM pipeline. Six weeks in, the regulator wants to see the model card and asks why the system flagged a borderline message yesterday at 3:47pm. Nobody can answer.

A list of competitor names in a regex would have shipped on day one and answered the regulator’s question in three seconds.

This post is about the AI that isn’t AI: deterministic, hand-written rules. They have no learning, no embeddings, no probabilistic outputs. They’re often dismissed as “primitive,” and they’re often the correct answer.

In the previous post we covered the classical statistical baselines that beat fancy models on small problems. This post covers the rule-based systems that beat statistical models on problems where you actually know the answer.

The case for rules

A rule-based system is one where every decision is dictated by code a human wrote, not weights a machine learned. Three properties make rules valuable:

They’re deterministic. Same input, same output, every time. No drift, no hallucination, no surprise.
They’re auditable. Every decision can be traced to a specific line of code. You can explain to a regulator, an auditor, or a customer exactly why the system did what it did.
They’re effectively free at inference time. A regex match typically runs in microseconds. A well-optimised finite-state transducer can reach gigabytes per second. There’s no per-call fee, no rate limit, no GPU.

In return for those properties, rules are inflexible. They handle exactly what you wrote down, and nothing else. The first time the world produces an input you didn’t anticipate, your rule fails, silently or loudly, depending on the system.

It all comes down to the shape of the input space. When it’s bounded and the rules are knowable, hand-written rules are unbeatable. When it’s open-ended and full of paraphrase and ambiguity, hand-written rules are useless and you need a learning system.

Regular expressions: the workhorse

You know regular expressions. They’re a small language for describing patterns in text. They came out of theoretical computer science in the 1950s and have been quietly running production systems ever since.

Things you can do with regular expressions and shouldn’t reach for an LLM for:

Validating that an email address looks plausible.
Extracting phone numbers, postcodes, dates, ABNs, account numbers, IP addresses.
Tokenising structured logs. Every webserver log, syslog entry, and audit trail is parseable by regex.
Recognising fixed product codes in customer support tickets (“KB-2847-FATAL”) for routing.
Stripping HTML, normalising whitespace, redacting sensitive fields.
Implementing a basic spam filter for known bad strings.
Anything where the pattern is well-defined, even if there’s some surface variation.
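As a rough illustration of the log-tokenising case, here is a minimal sketch that splits one line in the widely used Common Log Format into named fields. The sample line and the names LOG_LINE and parse_log_line are made up for this example, and real logs will usually need more defensive handling.

```python
import re

# Pattern for the Common Log Format; named groups make each field explicit.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

# Invented sample line (203.0.113.0/24 is a documentation-only address range).
sample = '203.0.113.7 - - [30/May/2026:15:47:02 +0000] "GET /pricing HTTP/1.1" 200 5123'
fields = parse_log_line(sample)
print(fields["ip"], fields["status"], fields["path"])
```

Returning None for a non-matching line, rather than raising, keeps the failure visible to the caller without stopping a batch job.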

If you can write a regex that matches your target with high precision and recall, you don’t need a model. You’re done. Ship the regex.
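The competitor-flagging scenario from the opening could look something like the sketch below. The competitor names, the record fields, and the function name flag_email are all hypothetical; the point is only that the match itself doubles as the audit trail.

```python
import re
from datetime import datetime, timezone

# Hypothetical competitor names; a real list would come from the compliance team.
COMPETITORS = ["Acme Corp", "Globex", "Initech"]

# One case-insensitive alternation, with each name escaped and word-bounded.
COMPETITOR_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in COMPETITORS) + r")\b",
    re.IGNORECASE,
)

def flag_email(message_id, body):
    """Return an audit record listing which competitor names matched and where."""
    hits = [
        {"matched_text": m.group(0), "start": m.start(), "end": m.end()}
        for m in COMPETITOR_RE.finditer(body)
    ]
    return {
        "message_id": message_id,
        "flagged": bool(hits),
        "hits": hits,
        "pattern": COMPETITOR_RE.pattern,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }

record = flag_email("msg-001", "We are also talking to globex about pricing.")
print(record["flagged"], record["hits"])
```

Storing the pattern and the character offsets alongside the decision means the question “why was this flagged?” can be answered from the record alone.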

The known dangers

Regular expressions have a well-earned reputation for sharp edges, and the relevant ones are:

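Catastrophic backtracking. In a backtracking engine, a poorly written regex can take exponential time on adversarial inputs. The re2 library (Google) sidesteps this with an automata-based engine that guarantees matching time linear in the input length, at the cost of features such as backreferences and lookaround. If you’re processing untrusted input, use re2.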
Unicode is harder than ASCII. \w doesn’t always mean what you think it means once you leave ASCII land.
The “more is more” trap. A regex that grows past 200 characters is usually a sign you should be writing a parser, not a pattern.
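The Unicode point is easy to see in Python's built-in re module, where \w on a str pattern follows Unicode by default and narrows to ASCII only when asked. This is a small sketch of that one behaviour; other engines and languages make different default choices.

```python
import re

text = "caf\u00e9 na\u00efve \u6771\u4eac"  # "café naïve 東京"

# Default for str patterns in Python 3: \w covers Unicode word characters.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented words get split.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['café', 'naïve', '東京']
print(ascii_words)    # ['caf', 'na', 've']
```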

The discipline that pays off is treating regex as a focused tool. Use it when the pattern is genuinely regular. Reach for something else when it isn’t.

Finite-state transducers: regex with structure

A finite-state transducer (FST) is a regex with two important upgrades: it can produce output, and it can be composed with other FSTs.

An FST is a state machine that consumes input symbols and emits output symbols based on its current state. The classic use is morphological analysis, mapping walked to walk + PAST or mice to mouse + PLURAL. The transducer encodes the rules of a language’s morphology directly.
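To make the state-machine description concrete, here is a minimal pure-Python sketch of a deterministic transducer for a single verb. It is a toy and not how a production FST library represents things: the transition table, the transduce function, and the 3SG tag are invented for this example.

```python
# (state, input_symbol) -> (next_state, output_string)
TRANSITIONS = {
    (0, "w"): (1, "w"),
    (1, "a"): (2, "a"),
    (2, "l"): (3, "l"),
    (3, "k"): (4, "k"),
    (4, "e"): (5, ""),          # consume "e", emit nothing yet
    (5, "d"): (6, " + PAST"),   # "ed" suffix becomes a tense tag
    (4, "s"): (7, " + 3SG"),    # hypothetical tag for third-person singular
}
FINAL_STATES = {4, 6, 7}

def transduce(word):
    """Run the transducer; return the emitted analysis, or None if rejected."""
    state, emitted = 0, []
    for symbol in word:
        step = TRANSITIONS.get((state, symbol))
        if step is None:
            return None  # no transition for this symbol: input rejected
        state, output = step
        emitted.append(output)
    return "".join(emitted) if state in FINAL_STATES else None

print(transduce("walked"))  # walk + PAST
print(transduce("walks"))   # walk + 3SG
print(transduce("walke"))   # None
```

A real morphological analyser would be built from a lexicon and rewrite rules and then compiled, rather than written out state by state, but the consume-and-emit loop is the same idea.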

FSTs are the workhorse of:

Speech recognition lexicons, mapping phoneme sequences to words.
Computational morphology for low-resource languages.
Spell-checkers and stemmers for languages with rich inflection.
Pre-processing pipelines for NLP in production search systems.
Machine translation grammars, particularly rule-based MT for language pairs without enough parallel corpus for neural translation.

The dominant tool here is OpenFst (a C++ library from Google Research and NYU, descended from AT&T’s earlier FSM Library). The pynini Python wrapper is the practitioner’s way in. For most people building a normal application this is overkill, but if you’re working on production search, speech, or non-English NLP, FSTs are part of the toolkit and they’re not going away.

Context-free grammars: parsing structured language

Beyond regular languages live context-free grammars (CFGs). They are the tool you reach for when a language has structure that a regex can’t capture: nested brackets, recursive expressions, anything where the validity of one part depends on another part you haven’t seen yet.

CFGs are the foundation of programming-language compilers. They’re also production tools for:

Parsing structured user input: query languages, formula syntax, search expressions.
Validating semi-structured documents such as LaTeX, JSON, and XML, whose syntax is described by grammars.
Information extraction from templated text: forms, contracts, regulatory filings.
Implementing natural-language interfaces with bounded vocabularies, such as voice command systems where the user can only say a fixed set of patterns.

You write the grammar, you run a parser-generator (ANTLR, Bison, Lark for Python), you get a parser. The result is fast, deterministic, and tells you exactly which production rule matched.

For most application code, CFGs are overkill and regex handles it. But the moment you find yourself writing nested-condition regex with manual depth tracking, stop and reach for a grammar.
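For a sense of what a grammar buys you, here is a small sketch of a hand-written recursive-descent parser for nested search expressions. It is written by hand rather than generated by a tool like ANTLR or Lark so that it runs with the standard library alone; the three-rule grammar in the comments and all function names are assumptions made for this example.

```python
import re

# Grammar (hypothetical):
#   expr   := term ("OR" term)*
#   term   := factor ("AND" factor)*
#   factor := WORD | "(" expr ")"
TOKEN = re.compile(r"\s*(\(|\)|AND\b|OR\b|\w+)")

def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_expr(tokens, pos):
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "OR":
        right, pos = parse_term(tokens, pos + 1)
        node = ("OR", node, right)
    return node, pos

def parse_term(tokens, pos):
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "AND":
        right, pos = parse_factor(tokens, pos + 1)
        node = ("AND", node, right)
    return node, pos

def parse_factor(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[pos]
    if token == "(":
        # Recursion handles arbitrary nesting depth, which a regex cannot.
        node, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing closing parenthesis")
        return node, pos + 1
    if token in (")", "AND", "OR"):
        raise ValueError(f"unexpected token {token!r}")
    return token, pos + 1

def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("refund AND (late OR damaged)"))
# ('AND', 'refund', ('OR', 'late', 'damaged'))
```

Each function corresponds to one production rule, so a parse error points at the rule that failed instead of a silent non-match.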

Decision trees and rule lists

A decision tree is a sequence of if-then rules organised as a tree. Each internal node tests a feature; each leaf is a decision. They sit on the boundary between rules and ML: you can hand-write a decision tree (it’s just a flowchart) or learn one from data (sklearn.tree.DecisionTreeClassifier).

Hand-written decision trees are the correct answer for:

Eligibility logic. “Customer is eligible for the discount if they’ve been with us for more than 12 months AND have spent more than $500 AND haven’t used a discount in the last 90 days.”
Triage and routing. Support ticket routing, document workflows, customer-service escalation.
Compliance gating. Regulatory rules that must be applied exactly as written.
Game logic. Rules in a turn-based game, transitions in a state machine.

The advantage of a decision tree (or its equivalent: a sequence of if-elif statements in code) is that the entire decision process is visible and reviewable. The disadvantage: it doesn’t generalise. If a new condition shows up that wasn’t in the rules, the tree has no opinion.
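The discount-eligibility rule quoted above translates almost line for line into code. This is a sketch under assumptions: the function and parameter names are invented, and treating a discount used exactly 90 days ago as still inside the window is a guess at a boundary the rule doesn't spell out.

```python
def discount_eligibility(tenure_months, total_spend, days_since_last_discount):
    """Return (eligible, reason).

    days_since_last_discount is None if the customer has never used a discount.
    """
    if tenure_months <= 12:
        return False, "tenure is 12 months or less"
    if total_spend <= 500:
        return False, "total spend is $500 or less"
    # Assumption: exactly 90 days ago still counts as "in the last 90 days".
    if days_since_last_discount is not None and days_since_last_discount <= 90:
        return False, "a discount was used in the last 90 days"
    return True, "all three conditions met"

print(discount_eligibility(18, 720.0, None))  # (True, 'all three conditions met')
print(discount_eligibility(18, 720.0, 30))    # (False, 'a discount was used in the last 90 days')
```

Returning the reason with the decision keeps the audit trail next to the logic that produced it.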

The hybrid pattern that pays off: start with a hand-written decision tree, instrument it for the cases it handles badly, then either add rules or switch to a learned model when the rule list gets unmaintainable. Many production systems live in this hybrid space for years.

Expert systems: the ancestor

A rule-based expert system is a large collection of if-then rules with an inference engine that chains them together. They were the fashionable AI of the 1970s and 1980s: MYCIN for medical diagnosis, DENDRAL for chemistry, XCON for configuring DEC computers.

The expert-system winter, when these projects mostly disappointed, gave rule-based AI a bad name in popular memory. But the practical lessons remain:

Rules work well for stable, well-understood domains.
Maintaining a rule base of more than a few thousand rules is hard. Conflicts emerge. Edge cases pile up. The system becomes brittle.
Combining rules with statistical methods, using rules for the cases you understand and ML for the rest, is often more practical than picking one.

Modern descendants live in:

Drools and similar business-rules engines, used in insurance underwriting, banking compliance, and benefits administration.
Prolog and other logic-programming systems, mostly in academia but still used commercially in some niches.
Datalog for analytic and policy reasoning over relational data.

When you hear “rule-based system” today, it’s usually a Drools-style production rule engine making decisions in a regulated domain.
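The chaining that an inference engine performs can be sketched in a few lines of forward chaining: keep firing rules whose conditions are all known facts until nothing new is derived. This is a toy under stated assumptions, not how Drools or any specific engine is implemented, and the insurance-flavoured facts and rules are invented.

```python
# Each rule: (set of required facts, fact to conclude). All names hypothetical.
RULES = [
    ({"claim_over_limit", "policy_under_one_year"}, "needs_manual_review"),
    ({"needs_manual_review", "prior_fraud_flag"}, "escalate_to_investigator"),
]

def forward_chain(facts, rules):
    """Derive every reachable fact; return (known_facts, trace of fired rules)."""
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Fire a rule once, when all of its conditions are already known.
            if conclusion not in known and conditions <= known:
                known.add(conclusion)
                trace.append((sorted(conditions), conclusion))
                changed = True
    return known, trace

known, trace = forward_chain(
    {"claim_over_limit", "policy_under_one_year", "prior_fraud_flag"}, RULES
)
for conditions, conclusion in trace:
    print(" AND ".join(conditions), "=>", conclusion)
```

The trace is what gives these systems their appeal in regulated domains: each derived fact lists the conditions that produced it.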

When to use rules: a triage

Property of your problem	Lean toward rules	Lean toward ML
Auditability requirement	Strong (regulatory, legal, safety)	Weak (best-effort relevance)
Latency budget	Microseconds	Milliseconds or seconds OK
Per-call cost tolerance	Must be near-zero	Some cost is fine
Input variability	Bounded (formats, codes, structured)	Open-ended (natural language)
Domain expert availability	Yes, can write down the rules	No, has to be learned from data
Drift	Slow (years)	Fast (weeks/months)
Volume of labelled data	Zero, rules are the labels	Thousands of examples
Acceptance of failures	Failures must be debuggable	Probabilistic failures OK

The hybrid pattern

The best production systems usually mix rules and learning. The pattern, in rough form:

Rules at the edges. Fast pre-filters that reject obvious garbage and recognise unambiguous cases.
Statistical models in the middle. ML for the genuinely ambiguous cases the rules can’t handle.
Rules at the edges again. Post-filters that catch known-bad model outputs and apply business logic.

A spam filter does this: regex catches the obvious phishing patterns; a statistical model handles the borderline cases; a final rule layer applies user preferences and explicit allowlists. A search relevance system does this: lexical matching with rules first; semantic ranking with embeddings; business-rule re-ranking last.
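The three-layer shape of the spam-filter example can be sketched as follows. Everything here is illustrative: the patterns and allowlist are made up, and model_score is a keyword-counting stand-in for whatever trained classifier would sit in the middle.

```python
import re

# Hypothetical pre-filter patterns and allowlist, for illustration only.
OBVIOUS_PHISHING = re.compile(
    r"verify your account|wire transfer urgently", re.IGNORECASE
)
ALLOWLIST = {"billing@example.com"}

def model_score(body):
    """Stand-in for a trained classifier: returns a spam score in [0, 1]."""
    suspicious = ("free", "winner", "click here")
    hits = sum(phrase in body.lower() for phrase in suspicious)
    return hits / len(suspicious)

def classify(sender, body, threshold=0.5):
    """Return (label, stage) so each decision records which layer made it."""
    # Layer 1: rules at the edge catch the unambiguous cases.
    if OBVIOUS_PHISHING.search(body):
        return "spam", "pre-filter rule"
    # Layer 2: the model handles whatever the rules could not decide.
    label = "spam" if model_score(body) >= threshold else "ham"
    # Layer 3: rules again, overriding the model with explicit policy.
    if label == "spam" and sender in ALLOWLIST:
        return "ham", "post-filter allowlist"
    return label, "model"

print(classify("a@b.example", "Please verify your account now"))
print(classify("billing@example.com", "You are a winner, click here"))
print(classify("a@b.example", "Lunch tomorrow?"))
```

Tagging each result with the stage that decided it preserves some auditability even for the model-decided cases.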

The reason the hybrid wins is that rules and ML have inverse strengths and weaknesses. Rules are precise but rigid; ML is flexible but fuzzy. Used together, they cover each other’s gaps.

Rules are AI’s quiet underclass. They run more production systems than transformers do, and most teams forget about them until the regulator emails or the latency budget collapses. Regex handles patterns that are genuinely regular. Finite-state transducers extend that into composition and structured output for speech and morphology. Context-free grammars take over when nesting and recursion show up. Hand-written decision trees encode the business logic nobody wants buried in a model. Production rule engines like Drools still run the parts of insurance, banking, and compliance where every decision needs a trace.

The version that wins in production is rarely all-rules or all-learning. It’s rules at the edges, fast pre-filters that catch the obvious cases and post-filters that apply business policy, with statistical models in the middle handling the genuinely ambiguous inputs. Rules and learning have inverse strengths. Used together, they cover each other’s gaps. The next four posts in the series leave the language-and-text neighbourhood and pick up the rest of the classical AI textbook: search and planning, logic, constraints, probability.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.