Projects

Systems I've built and evaluated. Each project includes methodology, metrics, and honest limitations.

Tags: RAG, Hybrid Retrieval, Cross-Encoder, Evaluation

PaperSearch

Academic Paper Research Assistant

RAG system that retrieves relevant passages from 1,000 academic papers and generates cited answers. Validated against the Open RAG Benchmark with 3,045 human-authored queries.

Key findings

  • Hybrid retrieval (dense + BM25) dominated: every top-scoring configuration used it
  • MiniLM matched mpnet quality at 5× the speed
  • Reranking improved MRR by 7.6% (unlike the financial system)
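One common way to fuse dense and BM25 rankings is reciprocal rank fusion (RRF); the sketch below is illustrative of the general technique, not the project's actual fusion code, and all document IDs are made up:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: damping constant; larger k flattens rank differences.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # A document scores higher the closer to the top it
            # appears in each individual ranking.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["p3", "p1", "p7"]  # ranking from the dense retriever
bm25 = ["p3", "p9", "p1"]   # ranking from BM25
fused = reciprocal_rank_fusion([dense, bm25])
# "p3" ranks first: it is near the top of both lists
```

RRF is attractive here because it needs no score normalization: dense cosine similarities and BM25 scores live on different scales, but ranks are directly comparable.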

Results

  • MRR: 0.789
  • NDCG@5: 0.797
  • Recall@5: 0.89

Tags: LLM, Structured Output, FastAPI, Evaluation

Synthetic Data Pipeline

Resume-Job Match Review System

FastAPI service that reviews resume-job pairs for compatibility. Rules-based pre-filtering plus LLM-as-judge scoring with structured outputs.

Key findings

  • Rules-based filtering caught 40% of mismatches without LLM calls
  • Structured outputs (Instructor + Pydantic) achieved 0% parse failures
  • Latency benchmarking identified optimal batch sizes
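The zero-parse-failure pattern comes from declaring the expected response as a Pydantic model and letting Instructor validate the LLM's output against it. A minimal sketch, assuming a schema like the one below (field names are illustrative, not the project's actual model):

```python
from pydantic import BaseModel, Field


class MatchReview(BaseModel):
    # Hypothetical schema for a resume-job compatibility verdict.
    match_score: int = Field(ge=1, le=10, description="Overall fit, 1-10")
    missing_skills: list[str]
    rationale: str


# With Instructor, the class is passed as response_model and the LLM's
# JSON is validated (and retried on failure) before it is returned:
#
#   client = instructor.from_openai(OpenAI())
#   review = client.chat.completions.create(
#       model="gpt-4o-mini",
#       response_model=MatchReview,
#       messages=[...],
#   )

# Validation rejects malformed or out-of-range output before it reaches
# downstream code:
raw = {"match_score": 7, "missing_skills": ["Kubernetes"],
       "rationale": "Strong overlap on backend skills."}
review = MatchReview.model_validate(raw)
```

Because the constraints (`ge=1, le=10`) live in the schema rather than the prompt, a response that violates them raises a `ValidationError` instead of silently flowing through the pipeline.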

Results

  • Parse failures: 0%
  • Pre-filter rate: 40%

Tags: Synthetic Data, LLM, Structured Output

Synthetic Data Generator

DIY Repair Q&A Dataset

Pipeline that generates realistic Q&A pairs for DIY home repair, using the Instructor library for structured outputs, LLM-as-judge validation, and automated quality metrics.

Key findings

  • Structured output constraints eliminated formatting failures
  • Diversity gap exists at dataset level, not individual item level
  • LLM-as-judge enabled automated quality filtering
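The dataset-level diversity gap can be made measurable with a distinct-n metric: the ratio of unique n-grams to total n-grams across the whole dataset. A minimal sketch (not the project's actual metric, and the example strings are invented):

```python
def distinct_n(texts, n=2):
    """Ratio of unique n-grams to total n-grams across a dataset.

    A low value signals dataset-level repetition even when each
    individual item looks fine in isolation.
    """
    total, unique = 0, set()
    for text in texts:
        tokens = text.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0

# Items that each look reasonable alone but overlap heavily as a set:
repetitive = ["how do i fix a leaky faucet", "how do i fix a leaky pipe"]
# Items covering different topics:
varied = ["how do i fix a leaky faucet", "what paint works on bathroom tile"]
```

Scoring each generated item independently would pass both sets; only an aggregate metric like this one exposes the near-duplicates, which is why the gap shows up at the dataset level.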

Results

  • Format failures: 0%
  • Quality score: 4.2/5