About

I'm an AI engineer focused on retrieval systems and LLM-powered applications. I build things, measure whether they work, and write about what I learn.

What I work on

  • RAG systems: chunking strategies, embedding models, retrieval methods, reranking
  • Evaluation pipelines: synthetic QA generation, retrieval metrics (MRR, NDCG, Recall@K)
  • LLM applications: structured outputs, tool use, prompt engineering
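For readers unfamiliar with the retrieval metrics named above, here is a minimal sketch of how they can be computed. It assumes binary relevance (a document is either relevant or not) and string document IDs. The function names are illustrative and do not come from any particular library or project.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """MRR over a list of (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant_ids)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant
    )
    # Ideal ranking places every relevant document at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Hypothetical example: one query, four retrieved documents, two relevant.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(ranked, relevant, k=3))
print(reciprocal_rank(ranked, relevant))
print(ndcg_at_k(ranked, relevant, k=4))
```

Graded relevance (for example, 0 to 3 labels) would change the gain term in NDCG; the binary version is used here only to keep the sketch short.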

How I approach problems

Every project starts with the same questions:

  1. What does success look like? (Define the metric)
  2. Where are we now? (Measure the baseline)
  3. What changed? (Prove the delta)

This isn't revolutionary. It's just discipline. But it's the difference between "I think this works" and "here's the evidence."
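The three questions above can be sketched in a few lines. This is one possible reading, not a prescribed method: it assumes a per-query score (such as Recall@K) has already been computed for a baseline and a candidate system on the same queries, and it uses a paired bootstrap to put a rough interval around the difference. The function name `paired_delta` and its parameters are hypothetical.

```python
import random
from statistics import mean

def paired_delta(baseline_scores, candidate_scores, n_resamples=2000, seed=0):
    """Compare two systems scored on the same queries.

    Returns the mean of each system, the mean per-query delta, and an
    approximate 95% percentile-bootstrap interval for that delta.
    """
    if len(baseline_scores) != len(candidate_scores):
        raise ValueError("scores must be paired: one per query for each system")
    if not baseline_scores:
        raise ValueError("need at least one query")

    # Per-query differences keep the comparison paired.
    deltas = [c - b for b, c in zip(baseline_scores, candidate_scores)]

    rng = random.Random(seed)  # fixed seed so the report is reproducible
    n = len(deltas)
    resampled = sorted(
        mean(rng.choices(deltas, k=n)) for _ in range(n_resamples)
    )
    low = resampled[int(0.025 * n_resamples)]
    high = resampled[int(0.975 * n_resamples) - 1]

    return {
        "baseline": mean(baseline_scores),    # where are we now?
        "candidate": mean(candidate_scores),
        "delta": mean(deltas),                # what changed?
        "ci95": (low, high),                  # how sure are we?
    }

# Hypothetical per-query Recall@K values for four queries.
report = paired_delta([0.5, 0.0, 1.0, 0.5], [1.0, 0.5, 1.0, 0.5])
print(report)
```

With only a handful of queries the interval will be wide, which is itself useful evidence: it shows when a claimed improvement is not yet distinguishable from noise.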

Background

I spent 25 years in enterprise consulting (MarkLogic, Oracle, RightNow), leading technical delivery across APAC and North America. I translated business requirements into working systems for banks, telcos, and government agencies, and managed international teams doing the same.

Now I'm applying that delivery discipline to AI engineering. I build evaluation pipelines, RAG systems, and synthetic data workflows: the infrastructure that tells you whether an LLM application actually works before it reaches production. My projects emphasize measurable outcomes over demos: retrieval accuracy metrics, structured evaluation frameworks, and failure-mode analysis.

I'm not coming from a research background. I'm coming from delivery, where things need to work reliably, at scale, for real users. In my experience, that is exactly what many LLM deployments are missing.