ahad.

Blog

Deep dives into RAG pipelines, agentic AI, LLM integrations, and lessons from building production AI systems.

Featured

I Run an AI Agent on a VPS. Here's My Actual Setup

A walkthrough of my real OpenClaw deployment: 13 Telegram topics, GPT-5.2 on Azure free tier, heartbeat-driven morning briefings, Playwright browser automation, memsearch semantic recall, and a Second Brain that auto-captures everything I text. Pulled directly from my live droplet.

LLM, Agentic AI, Open Source | Mar 5, 2026 · 17 min read

Zero-Cloud Agentic AI: Running Milvus and Local LLMs On-Prem

Sending sensitive internal data to closed APIs wasn't an option. Here is the exact architecture I used to build a fully local, autonomous agentic pipeline using Milvus, Ollama, and open-source embeddings.

Mar 5 · 7 min read

LLM, Agentic AI, On-Prem

How I Set Up an On-Prem Agentic AI Stack with Open-Source Embeddings and Fully Local Inference

A practical guide to building a fully on-prem agentic AI system using open-source embeddings and local LLM inference: no APIs, no cloud, complete data control.

Mar 3 · 7 min read

LLM, Qwen, vLLM

Qwen 3.5 in Production: Running with vLLM and Deploying Local Inference on Azure VM

A deep dive into deploying Qwen 3.5 with vLLM for high-throughput inference and running cost-efficient local inference on Azure VMs with GPU acceleration.

Mar 1 · 8 min read

RAG, LLM, Information Retrieval

MiA-RAG: Mindscape-Aware Retrieval-Augmented Generation for Long-Context Reasoning

MiA-RAG introduces a mindscape-aware embedder and retriever that inject global semantic context into RAG pipelines, dramatically improving long-document QA accuracy and retrieval recall.

Feb 27 · 7 min read

Gemini, ChromaDB, FastAPI

Natural Language Workout Logging with Gemini Flash

Using Google's Gemini Flash for intent extraction to build a conversational workout tracker with semantic memory retrieval via ChromaDB.

Feb 28 · 2 min read

RAG, Agentic AI, FastAPI

Building an Agentic RAG Pipeline for Manufacturing

How I designed a multi-agent RAG system that answers questions from factory equipment manuals, safety SOPs, and maintenance logs, running fully offline with Ollama and Milvus Lite.

Feb 15 · 2 min read

RAG, Azure AI Search, Information Retrieval

Hybrid Search in Enterprise RAG: Vector + BM25 Scoring

Why pure vector search isn't enough for enterprise documents, and how combining semantic embeddings with BM25 keyword matching dramatically improves retrieval accuracy.

Jan 28 · 2 min read

Infrastructure, LLM, On-Prem

Sovereignty at Scale: Engineering Production-Grade RAG on Bare Metal

Stop paying the 'Internet Tax' and risking data leaks. We moved our RAG pipeline from SaaS to a local H100 cluster, cutting latency by 40% and TCO by 70% at scale.

Oct 24 · 11 min read

LLM, RAG, VectorDB

Moving Beyond Naive RAG: How We Built a 90% Hit-Rate Pipeline for Production

Basic vector search fails in production. Learn how we engineered a multi-stage RAG pipeline with hybrid search, re-ranking, and agentic loops to achieve a 90%+ hit rate.

May 22 · 11 min read