Code & Coffee ☕ by CarGDev

LLM Inference Optimization

Engineering Lessons From Building an LLM PoC: What I Learned Building SheepOp

Building a full transformer implementation and a RAG optimizer from scratch surfaces insights that high-level frameworks obscure. Over the past few days, I've built two open-source projects: SheepOp, a transformer implemented from scratch, and the LLM RAG Data Structures Optimizer, a production-grade optimization library. The experience revealed several lessons…
Carlos Gutierrez 29 Nov 2025
KV Caching, Prefix Sharing, and Memory Layouts: The Data Structures Behind Fast LLM Inference

Large Language Models (LLMs) have reshaped how we build applications, but behind the scenes their performance depends heavily on the data structures and algorithms that support them. While pre-training and fine-tuning often receive the spotlight, the real engineering challenge emerges in production systems: delivering fast, cost-efficient inference at scale…
Carlos Gutierrez 16 Nov 2025
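
To give a flavor of the technique the article's title refers to, here is a minimal, illustrative sketch of KV caching (not code from the article itself; the class and function names are hypothetical). During autoregressive decoding, each step appends its key/value vectors to a cache so earlier tokens are never re-encoded, and attention at each step runs only over the cached entries.

```python
import math

class KVCache:
    """Per-layer cache holding one key and one value vector per generated token."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

def attend(query, cache):
    """Dot-product attention of a single query over all cached keys/values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in cache.keys]
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]    # softmax over cached positions
    dim = len(cache.values[0])
    return [sum(w * v[i] for w, v in zip(weights, cache.values))
            for i in range(dim)]

# Decoding loop sketch: each new token adds one K/V pair, so per-step
# attention cost grows linearly with context length instead of
# recomputing all keys and values from scratch every step.
cache = KVCache()
cache.append([1.0, 0.0], [0.5, 0.5])
cache.append([0.0, 1.0], [1.0, 0.0])
out = attend([1.0, 1.0], cache)
```

The design point the article hints at is that the cache's memory layout (contiguous vs. paged blocks, sharing of common prefixes) dominates serving cost, since the cache grows with every generated token.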

Code & Coffee ☕ by CarGDev © 2026. Powered by Ghost