Building intelligent RAG & agentic AI systems.
I help enterprises build AI that actually works in production. I'm Ahad Khan, a Generative AI Engineer at Capgemini, turning RAG prototypes into reliable, grounded pipelines that keep hallucinations in check.

Currently Building
Agentic RAG for Manufacturing
Multi-agent RAG system for manufacturing Q&A that achieves 90%+ retrieval accuracy. A Router → Retriever → Generator pipeline answers queries from equipment manuals and safety SOPs, running fully offline with zero cloud dependency.
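To make the Router → Retriever → Generator flow concrete, here is a minimal sketch. It is not the project's code: the corpora, file names, and routing vocabulary are hypothetical. A keyword rule stands in for an LLM router, term overlap stands in for embedding similarity, and the generator only assembles the grounded prompt that would be sent to a local model.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical document name (equipment manual or SOP)
    text: str


# Hypothetical in-memory corpora; a real system would query an on-prem vector store.
CORPORA: dict[str, list[Chunk]] = {
    "manuals": [
        Chunk("press_manual.pdf", "Lubricate the hydraulic press pump every 500 operating hours."),
        Chunk("lathe_manual.pdf", "Set the lathe spindle speed according to the material chart."),
    ],
    "safety_sops": [
        Chunk("lockout_sop.pdf", "Apply lockout tagout before servicing the hydraulic press."),
        Chunk("ppe_sop.pdf", "Wear safety glasses and gloves when operating the lathe."),
    ],
}

# Hypothetical routing vocabulary standing in for an LLM-based router.
SAFETY_TERMS = {"safety", "lockout", "tagout", "ppe", "hazard", "servicing"}


def tokens(text: str) -> set[str]:
    """Lowercased word set with basic punctuation stripped."""
    return {w.strip(".,?!") for w in text.lower().split()}


def route(query: str) -> str:
    """Router: decide which corpus the query should be answered from."""
    return "safety_sops" if tokens(query) & SAFETY_TERMS else "manuals"


def retrieve(query: str, corpus: str, k: int = 1) -> list[Chunk]:
    """Retriever: rank chunks by term overlap (stand-in for embedding similarity)."""
    q = tokens(query)
    ranked = sorted(CORPORA[corpus], key=lambda c: len(q & tokens(c.text)), reverse=True)
    return ranked[:k]


def generate(query: str, chunks: list[Chunk]) -> str:
    """Generator: build a source-tagged, grounded prompt for a local LLM (not called here)."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in chunks)
    return f"Answer only from the context below.\n{context}\nQuestion: {query}"


def answer(query: str) -> str:
    corpus = route(query)
    return generate(query, retrieve(query, corpus))
```

Splitting routing from retrieval keeps each agent small and testable: a safety question never competes for ranking against maintenance text, which is one plausible way such a pipeline keeps retrieval accuracy high.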
Recent Work
AI Gym Memory System
Conversational workout tracker with sub-second intent extraction via Gemini Flash. Log exercises in natural language and query your history semantically ('What did I train last Tuesday?') using ChromaDB vector retrieval.
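A question like 'What did I train last Tuesday?' mixes a semantic part with a hard date constraint. The sketch below covers only the structured side and rests on assumptions: the intent schema (`{"intent": "query", "weekday": ...}`) is hypothetical, and the LLM call that would produce it is omitted. After parsing, "last Tuesday" is resolved to a concrete date. In a ChromaDB setup, that date could be passed to `collection.query` as a `where` metadata filter, while embeddings handle the fuzzy part of the question.

```python
import json
from dataclasses import dataclass
from datetime import date, timedelta

# Index order matches date.weekday(): Monday == 0 ... Sunday == 6.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


@dataclass
class WorkoutEntry:
    day: date
    exercise: str
    sets: int
    reps: int


def last_weekday(name: str, today: date) -> date:
    """Most recent given weekday strictly before today (same weekday -> one week back)."""
    target = WEEKDAYS.index(name.lower())
    delta = (today.weekday() - target) % 7 or 7
    return today - timedelta(days=delta)


def parse_intent(raw: str) -> dict:
    """Validate the JSON an LLM is prompted to return (hypothetical schema)."""
    intent = json.loads(raw)
    if intent.get("intent") not in {"log", "query"}:
        raise ValueError(f"unknown intent: {intent!r}")
    return intent


def run_query(intent: dict, log: list[WorkoutEntry], today: date) -> list[WorkoutEntry]:
    """Answer a 'query' intent by exact date filter over the workout log."""
    if intent["intent"] != "query":
        raise ValueError("expected a query intent")
    day = last_weekday(intent["weekday"], today)
    return [entry for entry in log if entry.day == day]
```

Resolving relative dates in code rather than in the embedding space is a deliberate choice here. Vector similarity is poor at arithmetic like "last Tuesday", and a deterministic filter avoids returning a semantically similar session from the wrong week.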
Document Intelligence RAG
Enterprise RAG system handling 10K+ document pages with Azure AI Search. Hybrid retrieval (BM25 + semantic) reduced hallucinations by 85% while keeping response times under 2 seconds, and GPT-4 generates answers with source attribution.
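Azure AI Search merges keyword and vector results in hybrid queries using Reciprocal Rank Fusion (RRF). The standalone sketch below illustrates that idea and is not Azure's implementation. It scores documents with Okapi BM25 (whitespace tokenization, common defaults `k1=1.5`, `b=0.75`) and fuses the result with a semantic ranking using RRF and the commonly used constant `k=60`. The semantic ranking is supplied by hand as a hypothetical stand-in for a vector index.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)  # length normalization
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores


def rank(scores: list[float]) -> list[int]:
    """Document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank), ranks start at 1."""
    fused: Counter = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common()]
```

Usage: with `docs = ["pump maintenance schedule for hydraulic press", "safety lockout procedure", "hydraulic fluid specifications"]`, `rank(bm25_scores("hydraulic pump", docs))` puts document 0 first, and `rrf([keyword_ranking, semantic_ranking])` returns one merged order. RRF only looks at ranks, so BM25 scores and cosine similarities never need to be put on a common scale.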
MiA-RAG: Mindscape-Aware RAG
Faithful implementation of Mindscape-Aware RAG (arXiv:2512.17220) that achieves +12% recall over baseline retrievers. It uses MiA-Emb-0.6B with hierarchical summarization and residual score fusion for context-enriched retrieval.
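The exact fusion formula from the MiA-RAG paper is not given here, so the rule below is an assumption made only to illustrate the general idea. Each chunk keeps its own query similarity, and a weighted similarity of its parent summary (one level up in the summary hierarchy) is added on top as a residual term. The weight `alpha` and the function names are hypothetical, and the toy 2-D vectors stand in for real MiA-Emb-0.6B embeddings.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def residual_fused_scores(
    query_vec: Sequence[float],
    chunk_vecs: list[Sequence[float]],
    parent_of: list[int],  # chunk index -> index of its parent summary
    summary_vecs: list[Sequence[float]],
    alpha: float = 0.3,  # hypothetical weight on the summary-level residual
) -> list[float]:
    """Chunk score = sim(query, chunk) + alpha * sim(query, parent summary) (assumed rule)."""
    return [
        cosine(query_vec, vec) + alpha * cosine(query_vec, summary_vecs[parent_of[i]])
        for i, vec in enumerate(chunk_vecs)
    ]
```

In this toy setup, two chunks with identical local similarity (0.8) end up with different scores (1.1 vs 0.8) because only one of them sits under a summary that matches the query. That tie-breaking by surrounding context is the kind of effect summary-aware retrieval aims for.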
Latest Posts
Zero-Cloud Agentic AI: Running Milvus and Local LLMs On-Prem
Sending sensitive internal data to closed APIs wasn't an option. Here is the exact architecture I used to build a fully local, autonomous agentic pipeline using Milvus, Ollama, and open-source embeddings.
How I Set Up an On-Prem Agentic AI Stack with Open-Source Embeddings and Fully Local Inference
A practical guide to building a fully on-prem agentic AI system with open-source embeddings and local LLM inference: no external APIs, no cloud, complete data control.
Qwen 3.5 in Production: Running with vLLM and Deploying Local Inference on Azure VM
A deep dive into deploying Qwen 3.5 with vLLM for high-throughput inference and running cost-efficient local inference on Azure VMs with GPU acceleration.