Optimising RAG retrieval: beyond naive top-K
A practical note on retrieval depth, ranking tradeoffs, and how candidate selection shapes latency, recall, and answer quality in production RAG systems.
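The candidate-selection tradeoff described above can be sketched as a two-stage retrieve-then-rerank pipeline: a cheap first-stage score over the whole corpus selects a candidate pool of configurable depth, and an expensive scorer reranks only that pool. All names here (`retrieve_then_rerank`, the `quality` field, the toy corpus) are illustrative assumptions, not an API from the article — a minimal sketch of the pattern, not a production implementation.

```python
import math

def cosine(a, b):
    # Cheap first-stage score: cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_then_rerank(query_vec, corpus, rerank_fn, depth=50, k=5):
    """Stage 1: cheap similarity over the whole corpus, keep top-`depth`.
    Stage 2: expensive `rerank_fn` over that pool only, return top-`k`.
    `depth` controls the recall/latency tradeoff: documents outside the
    candidate pool can never be recovered by the reranker."""
    candidates = sorted(
        corpus,
        key=lambda d: cosine(query_vec, d["vec"]),
        reverse=True,
    )[:depth]
    reranked = sorted(candidates, key=rerank_fn, reverse=True)
    return reranked[:k]

# Toy corpus (hypothetical): "quality" stands in for an expensive
# reranker signal such as a cross-encoder score.
corpus = [
    {"id": "a", "vec": [1.0, 0.0], "quality": 0.2},
    {"id": "b", "vec": [0.9, 0.1], "quality": 0.9},
    {"id": "c", "vec": [0.0, 1.0], "quality": 1.0},
]

# With depth=2, document "c" never reaches the reranker despite having
# the best rerank score — a recall loss caused purely by shallow depth.
shallow = retrieve_then_rerank([1.0, 0.0], corpus, lambda d: d["quality"], depth=2, k=1)
deep = retrieve_then_rerank([1.0, 0.0], corpus, lambda d: d["quality"], depth=3, k=1)
print(shallow[0]["id"])  # → b
print(deep[0]["id"])     # → c
```

The shallow and deep runs disagree on the top result, which is the core point: retrieval depth bounds reranker recall, while larger depths trade latency for candidate coverage.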