🧪⚙️ Benchmarking LLM RAGs: Python vs .NET (No Spoilers Yet)
February 19, 2026
I've been working on a benchmark that compares LLM-powered RAG pipelines in Python and .NET for candidate matching.
Quick note before we start: this is intentionally a no-spoilers version because I'll share the final conclusions in a meeting first.
Why this benchmark
Most comparisons online focus on framework preference or subjective opinions. I wanted a setup where both stacks run under comparable conditions so we can discuss tradeoffs with data, not vibes.
The repos behind the work
- Resume preprocessor: github.com/maurogioberti/llm-candidate-profiler
- RAG benchmark (multi-language): github.com/maurogioberti/llm-candidate-rag-benchmark-multilang
How the preprocessor works
Before benchmarking retrieval quality, resumes are normalized through a preprocessing pipeline so both stacks start from the same structured input.
- Scans PDF resumes and extracts text content
- Builds LLM prompts to produce structured candidate profiles
- Supports provider switching via configuration (local/cloud)
- Generates JSON outputs that feed the benchmark dataset
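As a rough sketch of that pipeline shape (this is not the repo's actual code; the prompt template, profile keys, and the injected `llm_call` provider are illustrative assumptions):

```python
import json

# Hypothetical prompt template; the real repo's prompts differ.
PROFILE_PROMPT = (
    "Extract a structured candidate profile from the resume below.\n"
    "Return JSON with keys: name, skills, years_experience.\n\n"
    "Resume:\n{resume_text}"
)

def build_prompt(resume_text: str) -> str:
    """Wrap the extracted resume text in the profiling prompt."""
    return PROFILE_PROMPT.format(resume_text=resume_text)

def preprocess(resume_text: str, llm_call) -> dict:
    """Run one resume through the pipeline: prompt -> LLM -> JSON profile.

    `llm_call` is injected so the provider (local or cloud) can be
    switched via configuration without touching the pipeline code.
    """
    raw = llm_call(build_prompt(resume_text))
    return json.loads(raw)

# Demo with a stubbed provider standing in for a real LLM client.
if __name__ == "__main__":
    fake_llm = lambda prompt: (
        '{"name": "A. Candidate", "skills": ["Python"], "years_experience": 5}'
    )
    profile = preprocess("A. Candidate. Python developer, 5 years.", fake_llm)
    print(profile)
```

Injecting the provider as a callable is one simple way to get the local/cloud switching described above, since the JSON outputs stay identical regardless of which backend produced them.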
What is being compared
The benchmark compares two implementations of the same product intent:
- Python pipeline (LangChain-based orchestration)
- .NET pipeline (Microsoft.Extensions.AI-based orchestration)
- Shared embeddings strategy for consistency
- Equivalent datasets and prompt intent across both stacks
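One way to keep the embeddings strategy genuinely shared (the keys and model name below are illustrative, not the benchmark's actual config) is a single config file that both the Python and .NET pipelines read:

```python
import json

# Illustrative shared settings; the benchmark's real keys and values may differ.
SHARED_CONFIG = {
    "embedding_model": "example-embed-model",  # hypothetical model name
    "chunk_size": 512,
    "chunk_overlap": 64,
    "top_k": 5,
}

def write_shared_config(path: str) -> None:
    """Persist one config that both stacks load at startup, so any
    retrieval differences reflect orchestration, not settings drift."""
    with open(path, "w") as f:
        json.dump(SHARED_CONFIG, f, indent=2)
```

Because JSON is trivially readable from both LangChain-based Python and Microsoft.Extensions.AI-based .NET, a single file like this removes one whole class of accidental divergence.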
How the evaluation is designed
This is not a single-run screenshot benchmark. It uses repeatable execution and score aggregation to reduce random variance.
- Multiple evaluation runs per prompt
- Consistent temperature settings for deterministic behavior
- Quality evaluation plus performance/load testing
- Parity checks so differences reflect orchestration, not data drift
If we want fair conclusions, we need parity in inputs, embeddings, prompts, and scoring rules.
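The run-aggregation step can be sketched like this (the scoring values and run count are made up for illustration; the benchmark's actual metrics differ):

```python
from statistics import mean, stdev

def aggregate_runs(scores_per_run: list[float]) -> dict:
    """Collapse repeated evaluation runs for one prompt into a stable
    summary, reducing the influence of any single noisy run."""
    return {
        "runs": len(scores_per_run),
        "mean": mean(scores_per_run),
        "stdev": stdev(scores_per_run) if len(scores_per_run) > 1 else 0.0,
    }

# Example: five runs of the same prompt at a fixed temperature.
summary = aggregate_runs([0.82, 0.85, 0.81, 0.84, 0.83])
print(summary)
```

Reporting the standard deviation alongside the mean also makes it visible when a "win" for one stack is smaller than the run-to-run noise.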
Why this matters
Teams choosing between Python and .NET for AI features usually ask the same thing: Which stack gives better quality, reliability, and operational confidence for real business use?
This work is my attempt to answer that with a practical, reproducible benchmark instead of guesswork.
Results soon, after the meeting.
#RAG #LLM #Python #DotNet #Benchmarking #AIEngineering