How I'm Rebuilding Cursor with My Own Homelab AI — Zero API Costs

🚫 Why Not Just Pay for Cursor or Copilot?

Because I knew I could do better — for free.

Cursor and GitHub Copilot Chat are incredible tools. They inject context, let you chat about your codebase, and feel like having a pair-programmer built into your editor. But here's the thing:

  • They require paid API keys (OpenAI, Claude, etc.)
  • They tie your usage and logs to external systems
  • They don’t respect your compute — they rent you theirs

I wanted the same dev experience, but entirely self-hosted and powered by my own GPU server at home. No token limits, no vendor lock-in.

And I’m almost there.


🧱 My Stack (aka The Homelab Dev Agent Engine)

Component | Role
--- | ---
Ollama | Local LLM runtime
DeepSeek-R1 | Fast, local, open chat model
Node.js API | My custom wrapper server
Avante.nvim | Neovim AI interface plugin
PostgreSQL | Token, prompt, and error tracking
Tailscale VPN | Internal access across machines

I wanted to mimic Cursor — with /slash commands, local file injection, contextual AI completions — using nothing but my own infrastructure.

🧩 How It Works (End to End)

  1. My Node.js API does the heavy lifting (see the sketch after this list):
    • Validates bearer/API keys
    • Accepts selected_files from Avante
    • Reads each file and injects its contents into the prompt (like Copilot Chat)
    • Sends the structured request to Ollama, locally
    • Streams the AI response back to Neovim
    • Logs every prompt, token count, and even simulated OpenAI/Claude pricing into PostgreSQL
  2. The Ollama server, running DeepSeek-R1 on my GPU workstation, handles completions locally.
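
Here's roughly what that wrapper route looks like. This is a simplified, illustrative sketch (not the exact code from the repo), assuming Express, Node 18+ with built-in fetch, and Ollama on its default port:

const express = require("express");
const fs = require("fs/promises");

const app = express();
app.use(express.json());

app.post("/api/chat", async (req, res) => {
  // 1. Validate the bearer key before doing anything else.
  if (req.headers.authorization !== `Bearer ${process.env.API_KEY}`) {
    return res.status(401).json({ error: "invalid API key" });
  }

  // 2. Read each selected file and prepend its contents to the prompt --
  //    the same trick Copilot Chat uses for file-aware answers.
  const { prompt, selected_files = [] } = req.body;
  const fileBlocks = await Promise.all(
    selected_files.map(
      async (path) => `--- ${path} ---\n${await fs.readFile(path, "utf8")}`
    )
  );
  const fullPrompt = [...fileBlocks, prompt].join("\n\n");

  // 3. Forward the structured request to the local Ollama server, streaming on.
  const upstream = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "deepseek-r1:latest",
      messages: [{ role: "user", content: fullPrompt }],
      stream: true,
    }),
  });

  // 4. Relay Ollama's newline-delimited JSON chunks back to Neovim as SSE.
  res.setHeader("Content-Type", "text/event-stream");
  for await (const chunk of upstream.body) {
    for (const line of Buffer.from(chunk).toString("utf8").split("\n")) {
      if (line.trim()) res.write(`data: ${line}\n\n`);
    }
  }
  res.end();
  // 5. (Not shown) log the prompt, token counts, and simulated pricing to PostgreSQL.
});

app.listen(3000);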

Avante.nvim in Neovim is configured to call my API:

require("avante").setup({
  endpoint = "https://api-ai.cargdev.io/api/chat",
  api_key = "local-dev-key",
  model = "deepseek-r1:latest"
})

🪄 Cursor-Like Features I Already Have

Feature | Status | Implementation
--- | --- | ---
File-aware chat | ✅ | Avante + my wrapper inject file contents
Slash commands | ✅ | extensions.avante.make_slash_commands
Streaming responses | ✅ | Via Ollama stream + SSE
Chat UI in Neovim | ✅ | Avante handles buffer formatting
Prompt + error logging | ✅ | PostgreSQL backend
Token cost simulation | ✅ | GPT-4, Claude, Gemini savings logged

🧪 Still Missing Full Cursor Behavior — But Getting Close

I don’t yet have true buffer-modification support (e.g., refactor suggestions applied automatically to your code). Cursor and Copilot Chat can modify code intelligently based on full-project context.

But the pieces are almost in place:

  • My wrapper supports per-file context
  • Prompt injection works for any file selection
  • Avante can be extended or patched to support diff/edit mode

And unlike Cursor, my tool:

  • Runs 100% local
  • Is fully inspectable
  • Costs me $0/month

📊 Tracking Everything

Every request logs:

{
  "model": "deepseek-r1:latest",
  "total_tokens": 9992,
  "prompt": "...",
  "selected_files": ["mcphub.lua"],
  "created_at": "2025-05-24"
}
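
On the PostgreSQL side, persisting that record takes only a few lines with the pg client. Here's a sketch with illustrative table and column names (the repo's actual schema may differ):

const { Pool } = require("pg");
const pool = new Pool(); // connection settings come from the PG* env vars

// Store one request log entry (illustrative schema).
async function logRequest(entry) {
  await pool.query(
    `INSERT INTO request_logs (model, total_tokens, prompt, selected_files, created_at)
     VALUES ($1, $2, $3, $4, now())`,
    [entry.model, entry.total_tokens, entry.prompt, entry.selected_files]
  );
}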

I also record:

  • GPT-4 equivalent cost (e.g. $0.10)
  • Claude and Gemini pricing
  • My actual cost: $0

This makes it easy to say: “This week I saved $7.20 in token fees.”
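
The simulation itself is simple multiplication. Here's a sketch using placeholder per-1K-token rates (not current provider pricing; the real rate table lives in the repo):

const RATES_PER_1K_TOKENS = {
  "gpt-4": 0.01,   // placeholder rate, $ per 1K tokens
  "claude": 0.008, // placeholder
  "gemini": 0.002, // placeholder
};

function simulatedCost(totalTokens, provider) {
  return (totalTokens / 1000) * RATES_PER_1K_TOKENS[provider];
}

// At the placeholder GPT-4 rate, the 9,992-token request above comes out
// to roughly $0.10 -- while my actual cost stays at $0.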


🙌 Want to Use or Contribute?

Check out the repo:

🔗 https://github.com/CarGDev/apiAi

You’ll find:

  • Express.js wrapper code
  • Ollama integration
  • Token tracking logic
  • Prompt/error endpoints
  • Sample setup for Avante.nvim

Feel free to fork it, log an issue, or submit a pull request. There’s plenty of room to grow: edit suggestions, embedded diff views, buffer streaming, agent queues, you name it.


☕ Final Thoughts

This isn’t just about saving money — it’s about owning your tools.

I’m not building a Cursor clone for the hype.
I’m building a private AI dev agent, fully under my control.

If you’re a Neovim user, have a spare GPU, and want to break free from paid APIs, give this a try.

There’s still a lot to add — but the core is solid.

And yeah, I proudly say it:

“I’m running my own Cursor-style AI stack — from my homelab.”

More posts coming soon. Until then, keep breaking and building. 💻🔥