I'm Salim Jordan,
an AI Engineer building
RAG systems that work.

I build, measure what matters, and share what I learn. No "thought-leadership speak" — just findings with numbers.

Portrait of Salim Jordan

What I do

RAG systems

I design and tune retrieval pipelines end-to-end: chunking, embeddings, hybrid search, and reranking.
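One way to combine hybrid search results, as a minimal sketch: reciprocal rank fusion (RRF) merges ranked lists from different retrievers (e.g. keyword and vector search) by rank position alone. The document lists hybrid search and reranking but doesn't specify the fusion method, so the function, doc IDs, and the conventional constant `k=60` here are illustrative.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: merge ranked doc-ID lists from several
    retrievers into one list, scoring each doc by sum of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative ranked lists from two retrievers
bm25_hits = ["d3", "d1", "d2"]   # keyword search order
vector_hits = ["d1", "d4", "d3"] # embedding search order
fused = rrf_fuse([bm25_hits, vector_hits])
# "d1" ranks first: it appears near the top of both lists
```

RRF needs no score calibration between retrievers, which is why it is a common default before a learned reranker.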

Evaluation pipelines

I build measurement workflows with retrieval metrics like MRR, NDCG, and Recall@K to prove what actually improved.
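Two of the metrics named above are small enough to sketch directly. Assuming the common single-relevant-document setup (where Recall@K reduces to a top-K hit rate), the relevance judgements below are made up for illustration:

```python
def mrr(results):
    """Mean reciprocal rank: results is a list of queries, each a list
    of booleans marking whether the doc at that rank is relevant."""
    total = 0.0
    for ranked in results:
        for i, relevant in enumerate(ranked, start=1):
            if relevant:
                total += 1.0 / i  # reciprocal rank of first hit
                break
    return total / len(results)

def recall_at_k(results, k):
    """Fraction of queries with at least one relevant doc in the top k."""
    hits = sum(1 for ranked in results if any(ranked[:k]))
    return hits / len(results)

# Illustrative judgements for three queries
runs = [
    [False, True, False],   # first relevant doc at rank 2
    [True, False, False],   # rank 1
    [False, False, False],  # no relevant doc retrieved
]
score_mrr = mrr(runs)          # (1/2 + 1 + 0) / 3 = 0.5
score_r1 = recall_at_k(runs, 1)  # 1 of 3 queries hits at rank 1
```

Running both before and after a pipeline change is what turns "feels better" into a measured improvement.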

LLM products

I ship LLM-powered features with structured outputs, tool use, and practical reliability constraints.

Who I am

I spent 25 years in enterprise consulting — MarkLogic, Oracle, RightNow — leading technical delivery for banks, telcos, and government agencies across APAC and North America. Now I'm applying that lens to AI engineering: evaluation pipelines, RAG systems, and the infrastructure that tells you whether an LLM actually works before it ships.

Background: Technical leadership and delivery — where things need to work reliably, at scale, for real users.

Focus: Measurable outcomes over demos. Retrieval metrics, failure-mode analysis, production-minded AI.

How I work

1

Define success

What metric matters? What's the target? What does "good enough" look like?

2

Measure the baseline

Before optimizing, know where you are. You can't claim improvement without a starting point.

3

Iterate with evidence

Each experiment answers a question. The data tells you what to try next.

What I've built

Recent findings

The Model Wasn't Random — It Was Backwards

I expected the model to be neutral. Instead, it scored incompatible pairs higher than compatible ones. Understanding why made fine-tuning more meaningful.

Baseline AUC

0.40

Fine-tuned AUC

0.91

Read the full analysis →
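Why a "backwards" model is better news than a random one: AUC measures the probability that a positive example outscores a negative one, so a score well below 0.5 means the ranking is consistently inverted, and negating the scores yields 1 − AUC. The pairwise implementation and the scores below are illustrative, not taken from the analysis above:

```python
def auc(pos_scores, neg_scores):
    """Pairwise form of ROC AUC: probability that a random positive
    outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Made-up scores where the model prefers incompatible (negative) pairs
pos = [0.4, 0.7, 0.2]
neg = [0.8, 0.3, 0.6]
base = auc(pos, neg)  # well below 0.5: the ranking is inverted
flipped = auc([-p for p in pos], [-n for n in neg])
# flipped == 1 - base: an inverted model still carries usable signal
```

A random model gives you nothing to work with; an inverted one tells you the features are informative and the sign of the relationship is what needs fixing.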