๐Ÿงชโš–๏ธ Benchmarking LLM RAGs: Python vs .NET

February 19, 2026

I've been working on a benchmark that compares LLM-powered RAG pipelines in Python and .NET for candidate matching.

Quick note before we start: this is intentionally a no-spoilers version because I'll share the final conclusions in a meeting first.

Why this benchmark

Most comparisons online focus on framework preference or subjective opinions. I wanted a setup where both stacks run under comparable conditions so we can discuss tradeoffs with data, not vibes.

How the preprocessor works

Before benchmarking retrieval quality, resumes are normalized through a preprocessing pipeline so both stacks start from the same structured input.

  • Scans PDF resumes and extracts text content
  • Builds LLM prompts to produce structured candidate profiles
  • Supports provider switching via configuration (local/cloud)
  • Generates JSON outputs that feed the benchmark dataset
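The preprocessing stages above can be sketched roughly like this. This is a minimal illustration, not the actual pipeline: the function names, the profile fields (name, skills, years_experience), and the provider endpoints are all hypothetical placeholders.

```python
import json

def extract_text(pdf_bytes: bytes) -> str:
    # Stand-in for a real PDF extractor (e.g. a library like pypdf);
    # here we pretend the bytes are plain UTF-8 text for illustration.
    return pdf_bytes.decode("utf-8")

def build_profile_prompt(resume_text: str) -> str:
    # Asks the LLM to emit a structured candidate profile as JSON.
    return (
        "Extract a candidate profile from the resume below.\n"
        "Respond with JSON containing: name, skills, years_experience.\n\n"
        f"Resume:\n{resume_text}"
    )

def parse_profile(llm_response: str) -> dict:
    # These JSON outputs are what would feed the benchmark dataset.
    return json.loads(llm_response)

# Provider switching via configuration (hypothetical endpoints).
PROVIDERS = {"local": "http://localhost:11434", "cloud": "https://api.example.com"}

def resolve_provider(config: dict) -> str:
    return PROVIDERS[config.get("provider", "local")]
```

The key idea is that every resume, regardless of provider, ends up as the same JSON shape before any retrieval happens.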

What is being compared

The benchmark compares two implementations of the same product intent:

  • Python pipeline (LangChain-based orchestration)
  • .NET pipeline (Microsoft.Extensions.AI-based orchestration)
  • Shared embeddings strategy for consistency
  • Equivalent datasets and prompt intent across both stacks
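One way to enforce a shared embeddings strategy across both stacks is a single serialized config that both pipelines load. A minimal sketch, assuming hypothetical field names and values (the model name and chunking parameters here are illustrative, not the benchmark's actual settings):

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class BenchmarkConfig:
    # Shared across the Python and .NET pipelines so retrieval
    # differences reflect orchestration, not setup drift.
    embedding_model: str = "text-embedding-3-small"  # illustrative
    chunk_size: int = 512
    chunk_overlap: int = 64
    top_k: int = 5

# Serialized (e.g. to JSON) so both stacks read identical settings.
shared_settings = asdict(BenchmarkConfig())
```

Freezing the dataclass makes the parity contract explicit: neither stack can silently drift from the agreed settings at runtime.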

How the evaluation is designed

This is not a single-run screenshot benchmark. It uses repeatable execution and score aggregation to reduce random variance.

  • Multiple evaluation runs per prompt
  • Fixed temperature settings to minimize output variance
  • Quality evaluation plus performance/load testing
  • Parity checks so differences reflect orchestration, not data drift

If we want fair conclusions, we need parity in inputs, embeddings, prompts, and scoring rules.
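The "multiple runs, then aggregate" step can be sketched as a small helper. This is an illustrative shape, not the benchmark's actual scoring code; `score_fn` stands in for whatever quality metric each run produces:

```python
import statistics

def aggregate_scores(score_fn, prompt: str, runs: int = 5) -> dict:
    # Repeat the evaluation to dampen run-to-run randomness,
    # then report mean and spread instead of a single-run number.
    scores = [score_fn(prompt) for _ in range(runs)]
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores) if runs > 1 else 0.0,
        "runs": runs,
    }
```

Reporting the standard deviation alongside the mean is what separates this from a single-run screenshot: a large spread tells you the comparison itself is noisy.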

Why this matters

Teams choosing between Python and .NET for AI features usually ask the same thing: Which stack gives better quality, reliability, and operational confidence for real business use?

This work is my attempt to answer that with a practical, reproducible benchmark instead of guesswork.

Results soon — after the meeting. 😉

#RAG #LLM #Python #DotNet #Benchmarking #AIEngineering