claude-context MCP: Semantic Code Search Over Your Entire Codebase
Tutorials · April 28, 2026 · 9 min read


claude-context by Zilliz is an MCP server that adds semantic vector search to Claude Code, cutting token costs roughly 40% on large codebases. Full setup guide below.

claude-context is an open-source MCP server by Zilliz that indexes your codebase with vector embeddings and gives Claude Code semantic retrieval over your whole project. Instead of Claude reading entire files to find relevant code, it queries by meaning -- reducing token costs by roughly 40% on large projects. Setup takes under 10 minutes with a free Zilliz Cloud account.

I have been watching Claude Code burn through tokens reading 500-line files to find one function. claude-context hit #1 on GitHub trending on April 23, 2026 -- 871 new stars in a single day, 7,476 total -- because this problem is universal for anyone running Claude Code on a real project. Here is a complete setup walkthrough and an honest take on what you actually get out of it.

What problem does claude-context actually fix in Claude Code?

Claude Code reads entire files by default when searching for context. On large codebases, 60-80% of your token budget gets consumed just navigating to the right location before Claude writes a single line of code. claude-context adds an MCP layer that intercepts those lookups and answers them with semantic vector search, returning only the relevant snippets instead of full files. This is the single most impactful optimization available for large-codebase Claude Code workflows in 2026.

The token reduction numbers are striking. Milvus published benchmarks comparing full-file reads against semantic retrieval: a test code navigation query dropped from 84,193 tokens to 2,699 -- a 96.8% reduction for that specific query. Real-world full-session measurements are more conservative at roughly 40%, but on a $100+/month Claude Code bill, that is meaningful savings. The 40% figure comes from controlled evaluation documented in the GitHub README and corroborated independently by byteiota.

The reason Claude reads entire files is not a bug -- it is the safest strategy when there is no smarter retrieval layer available. The full file guarantees the answer is in there somewhere. claude-context gives it a better option: a ranked list of semantically relevant snippets that covers the same question at a fraction of the token cost.


How claude-context works: hybrid search and AST-aware chunking

claude-context indexes your codebase using hybrid search -- BM25 sparse retrieval for keyword precision combined with dense vector embeddings for semantic understanding. Both indexes live in Milvus (open-source, self-hosted) or Zilliz Cloud (the fully managed version, with a free serverless tier). When Claude Code needs context, it queries the MCP server and gets back ranked, relevant code snippets instead of entire files. The hybrid approach handles both exact identifier lookups and fuzzy conceptual queries well.
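The fusion step can be sketched with reciprocal rank fusion, a standard way to merge ranked lists from sparse and dense retrieval. This is illustrative only -- the exact fusion strategy claude-context applies is an internal detail of Milvus hybrid search:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: merge ranked result lists (e.g. one from BM25,
    one from dense vector search) into a single ranking. Documents ranking
    high in either list float to the top; k damps any single list's influence."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hits from each index for a "rate limiting" query
bm25_hits = ["auth.py:login", "util.py:hash", "throttle.py:applyThrottle"]
dense_hits = ["throttle.py:applyThrottle", "session.py:expire"]
print(rrf_fuse([bm25_hits, dense_hits]))
```

A snippet that only one index finds (an exact identifier match, or a purely conceptual match) still makes the fused list, which is why hybrid retrieval covers both query styles.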

The indexer uses AST-aware chunking rather than splitting code at arbitrary line boundaries. It parses your project into real semantic units -- functions, classes, interface definitions, method bodies -- before generating embeddings. This matters because chunks that correspond to actual code structures produce better embeddings. A chunk that contains a complete function definition gives the model the right unit to reason about. A chunk that starts in the middle of a conditional and ends in the middle of a loop does not.
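For Python source, the idea can be sketched with the standard library ast module. claude-context's actual chunker supports many languages; this just illustrates why structural boundaries beat arbitrary line splits:

```python
import ast

def chunk_python_source(source: str) -> list[str]:
    """Split Python source into chunks aligned to top-level function and class
    definitions, so each embedding covers one complete semantic unit."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno / end_lineno are 1-based and inclusive (Python 3.8+)
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks
```

Each chunk then goes to the embedding model as one unit, instead of a fixed-size window that might start mid-function and end mid-loop.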

Incremental indexing handles updates automatically. After the initial index run, only changed files get re-indexed on subsequent sessions. The indexer detects changes via file hashes and skips everything that has not moved. On a 50K-line project, initial indexing takes roughly 3-8 minutes depending on OpenAI embedding API latency. Subsequent incremental runs are near-instant for small changesets -- fast enough to run as a pre-session hook or after a git pull.
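The change detection amounts to comparing content hashes against a manifest from the previous run. A sketch of that idea -- the real indexer's manifest location, format, and ignore rules are internal to claude-context:

```python
import hashlib
import json
import os

def changed_files(root: str, manifest_path: str) -> list[str]:
    """Return paths under root that are new or modified since the last run,
    and refresh the manifest. Keep the manifest outside the indexed tree."""
    try:
        with open(manifest_path) as f:
            old = json.load(f)
    except FileNotFoundError:
        old = {}  # first run: everything counts as changed
    new, changed = {}, []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            new[path] = digest
            if old.get(path) != digest:
                changed.append(path)
    with open(manifest_path, "w") as f:
        json.dump(new, f)
    return changed
```

Only the returned paths need re-chunking and re-embedding, which is why incremental runs after a small changeset are near-instant.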

The MCP server is compatible with Claude Code, Cursor, VS Code with MCP extensions, and Windsurf -- any client that speaks Model Context Protocol. It surfaces a set of MCP tools that Claude can call to query the index, and Claude Code v2.x picks them up automatically once the server is registered.


How to set up claude-context in Claude Code (step by step)

Full setup requires Node.js 20 or 22 (Node 24 is explicitly not supported as of April 2026 -- check with node --version before starting), an OpenAI API key for generating embeddings, and a Zilliz Cloud account for the vector database. The free Zilliz Cloud serverless tier is sufficient for most projects.

Step 1: Create a Zilliz Cloud cluster

Go to cloud.zilliz.com and create a free serverless cluster. No credit card required for the free tier. From the cluster dashboard, copy two values: the Public Endpoint URL (this becomes your MILVUS_ADDRESS) and your API Key (this becomes your MILVUS_TOKEN). Keep them handy for the next step.

Step 2: Add the MCP server to Claude Code

Run this command from your terminal:

claude mcp add claude-context \
  -e OPENAI_API_KEY=sk-your-openai-key \
  -e MILVUS_ADDRESS=https://your-cluster.zillizcloud.com \
  -e MILVUS_TOKEN=your-zilliz-api-key \
  -- npx @zilliz/claude-context-mcp@latest

This registers the MCP server in your Claude Code configuration (stored in ~/.claude.json for the default scope). The npx invocation pulls the latest published version of the server each time it starts -- no global install required, no version pinning needed unless you hit a regression.
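For reference, the registered entry follows the common MCP client configuration shape, roughly like this (field names per the MCP convention; treat the exact file layout as client-specific rather than guaranteed):

```json
{
  "mcpServers": {
    "claude-context": {
      "command": "npx",
      "args": ["@zilliz/claude-context-mcp@latest"],
      "env": {
        "OPENAI_API_KEY": "sk-your-openai-key",
        "MILVUS_ADDRESS": "https://your-cluster.zillizcloud.com",
        "MILVUS_TOKEN": "your-zilliz-api-key"
      }
    }
  }
}
```

The same command, args, and env values are what you would hand-copy into Cursor's or VS Code's MCP configuration files, which is why the server works across clients.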

Step 3: Index your codebase

Navigate to your project root and run the indexer:

cd /path/to/your/project
npx @zilliz/claude-context-mcp@latest index .

The indexer walks your project tree, parses code files with the AST chunker, generates embeddings via OpenAI, and uploads them to your Zilliz Cloud cluster. For a 50K-line project, expect 3-8 minutes on the initial run. The process is resumable -- if it gets interrupted, re-run the same command and it skips already-indexed files.

Step 4: Verify it is working

Start a new Claude Code session in your project directory. Ask something that requires navigating the codebase: "where is the authentication middleware defined?" or "find the function that handles session expiry." If claude-context is active, Claude Code will make MCP tool calls to the claude-context server rather than issuing file reads across your entire src tree. The difference in token consumption is visible in the session summary.

For re-indexing after code changes, run the indexer command again. You can automate this as a post-commit hook by adding it to .git/hooks/post-commit:

#!/bin/sh
npx @zilliz/claude-context-mcp@latest index . &

Make the hook executable with chmod +x .git/hooks/post-commit, or Git will silently skip it. The trailing & backgrounds the indexer so it does not block your commit.

What claude-context does well and where it falls short

claude-context earns its token savings most clearly on codebases with high surface area -- monorepos with multiple services, projects with complex inheritance chains, or repos where internal naming conventions are inconsistent and grepping for exact text does not reliably surface the right code. Semantic search handles "find code that handles rate limiting" even when the function is named applyThrottle and the word "rate" never appears. That is where grep fails and vector search wins.

For small projects under roughly 5,000 lines of code, the overhead of running the vector database and the indexer likely exceeds the token savings. If your entire codebase fits comfortably in a single Claude Code context load, the benefit is minimal. The same applies to very flat, well-named projects where Claude's default file-reading strategy is already efficient.

The dependency on OpenAI for embeddings is a genuine cost to consider. You need an active OpenAI API key even if you are not using OpenAI for anything else. Embedding generation for initial indexing costs roughly $0.01-0.04 per 50K lines of code using text-embedding-3-small, and incremental updates cost fractions of a cent. Not a budget concern, but it is an external dependency and a separate API credential to manage.
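The arithmetic behind that range is simple enough to sanity-check. A back-of-envelope estimate, assuming roughly 10 tokens per line of code and text-embedding-3-small's list price of $0.02 per million tokens (both assumptions mine, not from the claude-context docs):

```python
def embedding_cost_usd(lines_of_code: int,
                       tokens_per_line: float = 10.0,
                       price_per_million_tokens: float = 0.02) -> float:
    """Rough initial-indexing embedding cost. Assumes ~10 tokens per line and
    text-embedding-3-small pricing; real token counts vary with code density."""
    total_tokens = lines_of_code * tokens_per_line
    return total_tokens * price_per_million_tokens / 1_000_000

print(embedding_cost_usd(50_000))  # about a cent for a 50K-line project
```

Denser code or a pricier embedding model shifts this by small multiples, not orders of magnitude, so the external dependency is more of an operational concern than a budget one.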

Node 24 compatibility is broken as of the April 2026 release. If you are on Node 24, downgrade to Node 22 or use a version manager like nvm to run Node 22 for the indexer. The issue is open on the GitHub repo and unresolved at the time of writing.

There is also a community fork by Daniel Bowne (github.com/danielbowne/claude-context) that swaps Zilliz Cloud for LanceDB running locally, eliminating the OpenAI API key dependency and keeping everything on your machine. It is less actively maintained but viable if you want a fully local setup with no external API calls beyond Claude itself.

FAQ

Does claude-context work with tools other than Claude Code?

Yes. claude-context is an MCP server and works with any MCP-compatible client. It has been tested with Claude Code, Cursor, Windsurf, and VS Code with MCP extensions. The configuration format differs by tool -- Cursor uses ~/.cursor/mcp.json, VS Code uses its own MCP configuration file -- but the server command and required environment variables are identical across all of them.

Do I need a paid Zilliz Cloud account to use this?

No. Zilliz Cloud has a free serverless tier with no time expiry that covers most developer projects. The free tier runs on shared infrastructure and has sufficient capacity to index and query typical codebases without hitting limits. Self-hosted Milvus is also fully supported if you want everything local -- a standard docker-compose setup takes about 5 minutes and runs entirely on your machine, with no data leaving your environment beyond the OpenAI embedding API calls.

How does claude-context compare to just using grep or ripgrep in Claude Code?

Grep finds exact text matches and is fast for known identifiers. claude-context finds semantically related code -- it can surface a function named applyThrottle in response to a query about rate limiting even if those words never appear in the code. For precise symbol lookups where you know the name, grep is often faster. For conceptual queries across a large codebase -- "find all the places we handle authentication errors" -- semantic search consistently surfaces more relevant results with fewer false positives.

What happens to the index when I add or change files?

Run the indexer again: npx @zilliz/claude-context-mcp@latest index . The incremental update detects changed and new files via content hashes and only re-embeds what changed. For a typical development session with a few files modified, this takes under 30 seconds. You can automate it as a post-commit hook or pre-session script so the index is always current when you start a Claude Code session.
