About

I'm an AI engineer focused on retrieval systems and LLM-powered applications. I build things, measure whether they work, and write about what I learn.

What I work on

  • RAG systems — chunking strategies, embedding models, retrieval methods, reranking
  • Evaluation pipelines — synthetic QA generation, retrieval metrics (MRR, NDCG, Recall@K)
  • LLM applications — structured outputs, tool use, prompt engineering
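The retrieval metrics named above (MRR, NDCG, Recall@K) are simple to compute with binary relevance judgments. A minimal sketch, with function names and the doc-ID inputs being my own illustration:

```python
import math

def recall_at_k(relevant: set, ranked: list, k: int) -> float:
    """Fraction of the relevant docs that appear in the top-k results."""
    return len(relevant & set(ranked[:k])) / len(relevant)

def mrr(relevant: set, ranked: list) -> float:
    """Reciprocal rank of the first relevant result (0.0 if none retrieved)."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def ndcg_at_k(relevant: set, ranked: list, k: int) -> float:
    """NDCG with binary relevance: discounted gain over the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

# Example: first relevant doc at rank 2, one of two relevant docs in top 3.
# mrr({"d1", "d5"}, ["d3", "d1", "d7"])          -> 0.5
# recall_at_k({"d1", "d5"}, ["d3", "d1", "d7"], 3) -> 0.5
```

Real pipelines usually average these per-query scores over a full evaluation set; the per-query versions are the building blocks.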

How I approach problems

Every project starts with the same questions:

  1. What does success look like? (Define the metric)
  2. Where are we now? (Measure the baseline)
  3. What changed? (Prove the delta)

This isn't revolutionary — it's just discipline. But it's the difference between "I think this works" and "here's the evidence."
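The three questions map directly onto a tiny evaluation harness: fix the metric up front, score the baseline, score the change, report the delta. A hypothetical sketch (the `metric`, `baseline`, and `candidate` names are my own, standing in for whatever systems are under test):

```python
def compare(metric, baseline, candidate, eval_set):
    """Score two system versions with the same fixed metric and report the delta.

    metric:    callable(system, example) -> float, the success criterion
    baseline:  the system as it stands today
    candidate: the system with the proposed change
    eval_set:  a fixed list of evaluation examples
    """
    base = sum(metric(baseline, ex) for ex in eval_set) / len(eval_set)
    cand = sum(metric(candidate, ex) for ex in eval_set) / len(eval_set)
    return {"baseline": base, "candidate": cand, "delta": cand - base}
```

Holding the metric and eval set constant across both runs is what turns "I think this works" into a number you can defend.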

Background

I spent 25 years in enterprise consulting — MarkLogic, Oracle, RightNow — leading technical delivery across APAC and North America. I translated business requirements into working systems for banks, telcos, and government agencies, and managed international teams doing the same.

Now I'm applying that lens to AI engineering. I build evaluation pipelines, RAG systems, and synthetic data workflows — the infrastructure that tells you whether an LLM actually works before it reaches production. My projects emphasize measurable outcomes over demos: retrieval accuracy metrics, structured evaluation frameworks, failure-mode analysis.

I'm not coming from a research background. I'm coming from delivery — where things need to work reliably, at scale, for real users. That turns out to be exactly what's missing in most LLM deployments.