Series
The AI Field Guide
Large language models are the loudest part of AI, not the only part. This series covers the rest of the field -- diffusion, encoder-only and encoder-decoder transformers, classical NLP, search and planning, logic, constraints, probabilistic reasoning -- so that picking the right tool for a job stops being a guessing game. Part of Under the Hood.
Under the Hood · The AI Field Guide
How LLMs Actually Work
Tokens, transformers, attention, and the training pipeline: what large language models actually do when they 'predict the next token', why they hallucinate, and why they're so good at code.
To LLMs… and Beyond!
LLMs are one corner of a much larger field. Diffusion models, reasoning models, multimodal systems, open-weight vs closed -- what they are, how they differ, and how to choose.
The Other Transformers
BERT and T5 are transformers too, but they aren't trying to be ChatGPT. They're trying to be the boring layer underneath -- classifiers, embeddings, structured transformations -- and they're often a better answer than an LLM.
The Reranker You Didn't Know You Needed
RAG explanations stop at 'embed the query, look up the nearest documents, hand them to the LLM.' That's the demo. In production, there's a second pass between the lookup and the LLM, and it's the one that actually makes retrieval work.
After the Transformer
Transformers have ruled language modelling for nearly a decade. They have a known weakness -- attention's quadratic cost in sequence length -- and several research lines are trying to replace them. Mamba, RWKV, RetNet, Hyena, diffusion-for-text -- what they are, what they fix, and which ones are likely to matter.
Before the Transformer
n-grams. HMMs. CRFs. The language models and sequence taggers that ran the internet before deep learning, and that quietly still do, in autocomplete, spam filters, biomedical NER, speech recognition. What they are, why they still ship, and when they're the correct answer.
The Boring Baseline That Wins
TF-IDF, logistic regression, naive Bayes, k-means, LDA. The fifty lines of scikit-learn that beat your fancy model on the small problem you actually have. Why these baselines still win, and why the correct starting point in 2026 is often the same as it was in 2006.
Coming soon