Code & Coffee ☕ by CarGDev

Self Hosted AI

KV Caching, Prefix Sharing, and Memory Layouts: The Data Structures Behind Fast LLM Inference

Large Language Models (LLMs) have reshaped how we build applications, but behind the scenes, their performance depends heavily on the data structures and algorithms that support them. While pre-training and fine-tuning often receive the spotlight, the real engineering challenge emerges in production systems: delivering fast, cost-efficient inference at scale.
Carlos Gutierrez 16 Nov 2025
LLM understanding

Recently, during one of my master's assignments, I got curious: what's the deal with LLMs, how do they work, how many types are there, and what is an LLM, really? It all began when I saw a YouTube video by Andrej Karpathy, one of
Carlos Gutierrez 06 Nov 2025
How I'm Rebuilding Cursor with My Own Homelab AI — Zero API Costs

🚫 Why Not Just Pay for Cursor or Copilot? Because I knew I could do better — for free. Cursor and GitHub Copilot Chat are incredible tools. They inject context, let you chat about your codebase, and feel like having a pair-programmer built into your editor. But here's the thing:
Carlos Gutierrez 24 May 2025
