My Wife Wanted Dior. I Spent $600 on Claude Code to Vibe-Code a 2M-Line Database Instead.

My wife and I have been married for ten years, and she wanted a Dior bag for our anniversary.
Instead of buying anything — because I was completely absorbed in an AI experiment — I spent the entire holiday locked in my study, juggling three $200/month Claude Code subscriptions, trying to convince an LLM to cross-compile a 2-million-line C++ distributed database.
You can probably guess what happened next. 😂
In hindsight, this was not optimal resource allocation.
So I learned two lessons that weekend.
First: listen to your wife. “Happy wife, happy life” isn’t just a slogan. It’s a systems-stability principle.
Second: AI feels magical on small, well-scoped tasks. It behaves very differently when you point it at a real distributed infrastructure.
This post is about the second lesson.
Background
I'm the maintainer and a core contributor of Milvus, the most popular open-source vector database with 42K+ stars on GitHub at the time of writing (~2M lines of C++, Go, and Python). The system is fully distributed: proxy nodes, query nodes, data nodes, and index nodes all coordinate through message queues. My area is the storage and indexing layer.
I'd been using Claude Code for a few months and was genuinely impressed. It filled in all the missing features of an entire CLI for $20 in tokens. It 5x'd a query performance hot path in a day, the kind of optimization that would take me a week to even understand the code well enough to touch. It felt like having a competent junior engineer who never slept and didn’t bill by the hour.
I decided to give it a real problem: cross-platform compilation.
The $600 lesson learnt
For years, nobody on the Milvus team would touch cross-platform compilation. The build system is a mix of Go, C++, and Rust held together by accumulated years of Conan and CMake patches nobody wanted to revisit. Getting it to run on Linux is already miserable. Windows and modern macOS were such a nightmare to build that the team treated them as someone else's problem.
I figured I had Claude Code now. I could handle it.
At first, it looked like I was right. Windows compiled, and I submitted the patch, thinking the hard part was over.
Then Linux failed. I fixed Linux and Mac failed. I fixed Mac, and the GPU environment failed. Every fix introduced two new problems on a different platform, and patches started compounding. By the end, I had a 1000-file, ~100k-line patch touching Conan configs, CMake scripts, C++ compatibility shims, and things I didn't recognize.
The maddening part wasn't the bugs; it was the loop. I kept telling Claude, "No hacky fixes," "give me the clean solution." It complied every single time, generating patches that looked cleaner. Then each clean patch broke three other platforms, and the cycle restarted. If you looked at my .claude/settings.local.json, you'd find 147 manually approved permission rules. Each one is a timestamp of me thinking, "This time it'll work."
I spent three months on the Max plan and burned an entire vacation, and all I had was a pile of git reset --hard commands.
I sat there staring at my terminal, wondering if this whole thing was just well-packaged hype. Then I realized the problem wasn't Claude. It was like telling a grad student to get a paper into Nature without specifying the research question. I'd handed it a problem without ever defining what "solved" actually meant.
What actually works
The failure wasn't Claude's intelligence. It was a failure of process. So I started over with a different one.
Constraints before code. Not "make it compile on Windows." The actual constraints, written down explicitly: all platforms pass their unit tests, CI is green everywhere, no #ifdef platform hacks, no shimming around broken dependencies. This is the hard part — it requires knowing what "done" looks like, which requires thinking carefully before you touch anything. With complex infrastructure, most of the work is in thinking.
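One way to keep such constraints from staying aspirational is to turn them into mechanical checks. As a sketch (the file name and patterns here are illustrative, not Milvus's actual CI tooling), the "no #ifdef platform hacks" rule can be enforced against a patch with a few lines of shell:

```shell
set -e
# Illustrative only: scan a patch for newly added platform #ifdef guards.
# The patch content, file name, and macro list are hypothetical examples.
cat > patch.diff <<'EOF'
+#ifdef _WIN32
+  // windows-only shim
+#endif
EOF

# Flag any added line that introduces a platform-specific guard:
if grep -E '^\+.*#ifdef +(_WIN32|__APPLE__|__linux__)' patch.diff >/dev/null; then
  echo "constraint violated: platform #ifdef added"
fi
```

A check like this is cheap to review and impossible for a "looks cleaner" patch to argue with, which is exactly the point of writing the constraints down.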
Review tests, not code. I write test cases (or have Claude generate them), but I review the tests, not the implementation. Does this test check that the Conan recipe resolves correctly on ARM? Does it cover the CMake configuration in Docker? I can evaluate that in minutes. I cannot review 10,000 lines of cross-platform C++ patches with any confidence, in any amount of time.
Bottom-up, one layer at a time. Don't let Claude change 1000 files. Lock down the dependency versions first. Once those constraints are verified, move up to CMake configuration. Once that's stable, platform-specific code. Each layer is small enough to verify completely before moving up.
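The layering above amounts to a gate per layer: each gate must pass before Claude is allowed to touch the next one. As a minimal sketch, with illustrative layer names and a stand-in check function where the real per-layer test suite would run:

```shell
set -e
# Sketch of the bottom-up discipline. Layer names are illustrative;
# check() stands in for the real verification step (unit tests, a CI job).
check() { echo "verified: $1"; }

for layer in conan-deps cmake-config platform-code; do
  check "$layer" || { echo "stop: $layer failed, not moving up"; exit 1; }
done
echo "all layers locked"
```

The ordering is the whole trick: a failure at the dependency layer halts everything above it, so no patch ever papers over a broken foundation.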
I redid the cross-platform build from scratch using this approach. Two days. A few dozen commits. Each one is small, each one with a corresponding test. No #ifdef hacks — the underlying fix was updating Conan recipes and bumping third-party library versions, fixing the actual dependency chain rather than papering over it.
Same task. Completely different outcome.
The test insight is worth its own point: in a distributed system with a mature test suite, the tests are the specification. If those integration tests pass — all 47 that turned red during my first attempt — then the seal/flush contracts are intact, the WAL replay is correct, and query nodes can still mmap-load segments without corruption. I don't need to read the C++ to know that.
I stopped reviewing code. I started reviewing tests. The code became an implementation detail.
Throwing hardware at it
Once the workflow was right, a new bottleneck appeared: I was sitting there waiting.
One Claude Code session chewing through a 2M-line codebase isn't fast. Each task takes 20-30 minutes of wall-clock time. The bottleneck wasn't intelligence. It was throughput.
So I told my wife I needed a server with a GPU for my “AI agents.” That conversation went about as well as you’d expect. I bought it anyway — plus a Mac Mini to go with my MacBook. In the end, I had three machines and six terminals, each running an independent Claude Code session.
What makes this work is git worktree. Each session gets its own task, branch, and working directory, completely isolated from the others. And because the constraints-first approach means each branch has its own acceptance criteria, parallelism is trivially safe: if each branch's tests pass independently, merging is low-risk.
For the cross-platform build, this meant one session resolving Linux ARM Conan dependencies, one fixing macOS CMake configuration, and one handling Windows MSVC compatibility, all running simultaneously without interference. What used to be serial context-switching between platforms became parallel execution across machines.
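In git terms, the setup is just one worktree per session. The snippet below demonstrates the layout on a throwaway repository; the branch names are illustrative, and in practice the main checkout is the existing Milvus tree:

```shell
set -e
# Demonstrate the one-worktree-per-session layout on a scratch repo.
# Branch names are examples; in practice this is the real checkout.
tmp=$(mktemp -d)
cd "$tmp"
git init -q main-repo
cd main-repo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# One isolated working directory + branch per Claude Code session:
git worktree add -q ../wt-linux-arm -b fix/linux-arm-conan
git worktree add -q ../wt-macos     -b fix/macos-cmake
git worktree add -q ../wt-windows   -b fix/windows-msvc

git worktree list    # main checkout plus three session worktrees
```

Because each branch carries its own acceptance criteria, any worktree can be merged or discarded (git worktree remove) without consulting the others.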
The pattern generalizes to daily work. On any given day, one session is refactoring the compaction scheduler, another is optimizing the HNSW search path, and a third is writing integration tests for a streaming insert edge case. I bounce between terminals like a manager checking on direct reports, except these direct reports never need coffee breaks and don't have opinions about sprint planning.
The real lesson is that vibe coding infrastructure is compute-bound, not talent-bound. The limiting factor isn't how smart the AI is; it's how many instances you can run in parallel. Anthropic used 16 parallel Claude instances to build an entire C compiler, which made my six terminals feel modest.
The thing I keep coming back to
AI solves exactly the problem you put in front of it, nothing more. If you frame the problem wrong, you get a perfect solution to the wrong thing.
"Compiles on macOS 15" — solved in an hour. But that's a local optimum. The actual goal was "compiles everywhere without hacks." Those are different problems. The AI had no way to know that. That's entirely on the engineer.
Building for yourself with AI is shockingly easy — you know your machine, your edge cases, and "works for me" is an achievable optimization target. Building infrastructure that works for every user, on every platform, in every environment is fundamentally different. That gap is where engineering still lives. For infrastructure software, that gap is enormous.
The hardest bugs in our production system — the ones that required reverting after merge — weren't caught by any AI code review approach we tried. The bugs were syntactically correct. The problem was in the developer's implicit assumptions, invisible in the diff and absent in the surrounding code. The system behavior only made sense if you held a mental model of how three different components were supposed to coordinate — a model that existed only in one developer's head, never written down.
Those still require a human who understands the system well enough to know what questions to ask.
The three $200 plans and one lost vacation were the best investment I've made. Not because they taught me how to use AI. Because they taught me how not to.
The tools are genuinely good. The missing piece was always the workflow around them. Tests before code, constraints before tests, and enough hardware to run it all in parallel. That's the whole thing.
The problems that remain unsolved
After everything — the failed patch, the reset commands, the six parallel terminals — the hardest problem to crack is still the human one.
For now, I have two headaches, and neither of them has a clean fix.
The first: my wife. She came into the vacation with reasonable expectations. She left watching me git reset --hard my life. The Dior bag budget became a server bill. She is, understandably, not impressed by my cross-platform build achievements. I have yet to find a workflow for "how to apologize for spending your anniversary doing distributed systems work." Claude Code is not helpful here. I tried asking. It suggested flowers and a handwritten note. I told it to be more specific. It generated twelve variations of a heartfelt letter in the voice of a C++ developer. None of them landed.
If anyone has solved the problem of maintaining a relationship while vibe-coding through a vacation, I am very open to workflow suggestions. 😂
The second: how to scale this beyond me. Everything above — the constraints-first approach, the git worktree parallelism, the test-first review discipline — is currently in my head and my local setup. Getting it into a team's daily workflow is harder than any cross-platform build.
This is partly a tooling problem. But mostly it's a people problem. This kind of workflow requires engineers who genuinely enjoy figuring out how things work — people who will actually write the test first, who find satisfaction in a clean, well-defined constraint, and who won't shortcut to "just make it pass." That combination is rarer than it should be.
So yes — we’re hiring.
If you’re the kind of engineer who reads an article about infrastructure vibe coding and immediately starts forming strong opinions about what I got wrong, you’re exactly the person I want to talk to.
Milvus is a genuinely hard distributed systems problem: storage engines, indexing pipelines, compaction, replication, failure recovery — all under real-world production load. We’re not building demos. We’re building infrastructure people depend on.
If solving problems at that level sounds interesting, come build with us.
And I’ll handle the anniversary recovery plan.
You can check out our open roles or reach out to me directly on LinkedIn.
And if you disagree with anything in this post, even better — I’m happy to discuss it with you.
Join our Slack community or book a Milvus Office Hours session — we’re always happy to talk distributed systems, AI infrastructure, or where you think we’re wrong.