Optimising RAG retrieval: beyond naive top-K
A practical note on retrieval depth, ranking tradeoffs, and how candidate selection shapes latency, recall, and answer quality in production RAG systems.
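The candidate-selection tradeoff described above can be sketched as a two-stage retrieve-then-rerank pipeline: a cheap first-stage score over the whole corpus selects a candidate pool of configurable depth, and an expensive scorer reranks only that pool. All names here (`retrieve_then_rerank`, the `quality` field, the toy corpus) are illustrative assumptions, not an API from the article — a minimal sketch of the pattern, not a production implementation.

```python
import math

def cosine(a, b):
    # Cheap first-stage score: cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_then_rerank(query_vec, corpus, rerank_fn, depth=50, k=5):
    """Stage 1: cheap similarity over the whole corpus, keep top-`depth`.
    Stage 2: expensive `rerank_fn` over that pool only, return top-`k`.
    `depth` controls the recall/latency tradeoff: documents outside the
    candidate pool can never be recovered by the reranker."""
    candidates = sorted(
        corpus,
        key=lambda d: cosine(query_vec, d["vec"]),
        reverse=True,
    )[:depth]
    reranked = sorted(candidates, key=rerank_fn, reverse=True)
    return reranked[:k]

# Toy corpus (hypothetical): "quality" stands in for an expensive
# reranker signal such as a cross-encoder score.
corpus = [
    {"id": "a", "vec": [1.0, 0.0], "quality": 0.2},
    {"id": "b", "vec": [0.9, 0.1], "quality": 0.9},
    {"id": "c", "vec": [0.0, 1.0], "quality": 1.0},
]

# With depth=2, document "c" never reaches the reranker despite having
# the best rerank score — a recall loss caused purely by shallow depth.
shallow = retrieve_then_rerank([1.0, 0.0], corpus, lambda d: d["quality"], depth=2, k=1)
deep = retrieve_then_rerank([1.0, 0.0], corpus, lambda d: d["quality"], depth=3, k=1)
print(shallow[0]["id"])  # → b
print(deep[0]["id"])     # → c
```

The shallow and deep runs disagree on the top result, which is the core point: retrieval depth bounds reranker recall, while larger depths trade latency for candidate coverage.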