Docusaurus Vecto Search
Welcome to the Docusaurus Vecto Search documentation! This plugin provides Vecto-powered search for your Docusaurus website, with support for BM25 keyword search, Vecto.ai vector search, and hybrid mode that combines both using Reciprocal Rank Fusion.

Setup
Ensure that you have a Docusaurus v3 project ready. You can also generate a fresh one with either package manager:
- npm: npx create-docusaurus@latest my-website classic
- Yarn: yarn create docusaurus my-website classic
Vecto is optional — the plugin works in BM25-only mode without an account. If you want vector or hybrid search, request a Vecto token here.
1) Install Docusaurus Vecto Search Plugin
Navigate to the root of your Docusaurus project, then install the plugin:
- npm: npm install @xpressai/docusaurus-vecto-search
- Yarn: yarn add @xpressai/docusaurus-vecto-search
2) Update Docusaurus Configuration
In your docusaurus.config.js file, add the plugin to themes and configure it via themeConfig:
// docusaurus.config.js
module.exports = {
  themes: ['@xpressai/docusaurus-vecto-search'],
  themeConfig: {
    vectorSearch: {
      mode: 'hybrid', // "bm25" | "vector" | "hybrid"
      vecto: {
        publicToken: process.env.VECTO_PUBLIC_TOKEN ?? '',
        vectorSpaceId: Number(process.env.VECTO_SPACE_ID ?? '0'),
      },
    },
  },
};
For BM25-only mode (no Vecto account needed), simply use:
themeConfig: {
  vectorSearch: {
    mode: 'bm25',
  },
},
For the full list of configs, refer to the configuration section.
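If the same config file has to build both with and without Vecto credentials, one option is to choose the mode from the environment. The sketch below is only an assumption about how you might wire this up in your own config, not something the plugin does for you. The helper name buildVectorSearchConfig is hypothetical, and treating a space ID of 0 as "not set" is a choice made for this example.

```javascript
// Hypothetical helper: choose a search mode based on which Vecto settings are present.
function buildVectorSearchConfig(env) {
  const publicToken = env.VECTO_PUBLIC_TOKEN ?? '';
  const vectorSpaceId = Number(env.VECTO_SPACE_ID ?? '0');

  // Assumption: a missing token, or a space ID that is not a positive integer,
  // means Vecto is not configured, so fall back to keyword-only search.
  if (!publicToken || !Number.isInteger(vectorSpaceId) || vectorSpaceId <= 0) {
    return { mode: 'bm25' };
  }

  return {
    mode: 'hybrid',
    vecto: { publicToken, vectorSpaceId },
  };
}

// In docusaurus.config.js:
//   themeConfig: { vectorSearch: buildVectorSearchConfig(process.env) }
```

This keeps local builds working for contributors who have no Vecto account, while CI builds with the variables set get hybrid search.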
3) Add Vecto User Token To Environment Variables
If you're using vector or hybrid mode, you'll need to set the VECTO_USER_TOKEN environment variable for the plugin to ingest content into Vecto during builds. This token is private and is not exposed in the client bundle. (Skip this step if you're using bm25 mode.)
a. For CI/CD (e.g., GitHub Actions)
If you are deploying your Docusaurus site using a CI/CD service like GitHub Actions, set VECTO_USER_TOKEN as an environment variable in your workflow configuration. You can use repository secrets to securely store the token.
- name: Build
  env:
    VECTO_USER_TOKEN: ${{ secrets.VECTO_USER_TOKEN }}
  run: yarn build
b. For Local Development
For local development, you can export the VECTO_USER_TOKEN from your terminal:
export VECTO_USER_TOKEN=your_token_value_here
Alternatively, you can create a .env file in the root of your Docusaurus project and add the token there:
VECTO_USER_TOKEN=your_token_value_here
Using a .env file ensures that the token remains set between terminal sessions.
4) Build!
Finally, build your Docusaurus website with the new search configuration:
- npm: npm run build
- Yarn: yarn build
That's it! Your Docusaurus website should now be set up with the docusaurus-vecto-search functionality.
Preview
We have also implemented the search in this documentation site and at Xircuits.io!
Configuration Options
All configuration lives in themeConfig.vectorSearch. Every option has sensible defaults — you only need to set what you want to change.
| Option | Type | Default | Description |
|---|---|---|---|
| mode | "bm25" \| "vector" \| "hybrid" | "hybrid" | Search mode |
| vecto.publicToken | string | "" | The public token for Vecto search (read-only, safe to expose) |
| vecto.vectorSpaceId | number | null | The ID of the vector space |
| vecto.clearOnBuild | boolean | true | Clear the vector space before re-indexing |
| vecto.batchSize | number | 10 | Documents per ingest batch |
| maxResults | number | 10 | Max results returned per search |
| bm25.k1 | number | 1.5 | BM25 term frequency saturation |
| bm25.b | number | 0.75 | BM25 document length normalization |
| rrf.k | number | 60 | RRF fusion constant (for hybrid mode) |
| hotkey | string | "mod+k" | Keyboard shortcut to focus search |
| placeholder | string | "Search docs..." | Input placeholder text |
| content.chunkSize | number | 500 | Max words per chunk before the word-window splitter slices a long section |
| content.chunkOverlap | number | 50 | Words shared between consecutive word-window slices |
| content.splitOnHeadings | [number, number] | [2, 4] | Inclusive range of heading levels that start a new chunk (see below) |
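To give a feel for what bm25.k1 and bm25.b control, here is a minimal sketch of the textbook Okapi BM25 formula. It is not taken from the plugin source, and the plugin's tokenization and exact IDF variant may differ. The function name and the pre-tokenized input format are assumptions made for illustration.

```javascript
// Textbook Okapi BM25 over pre-tokenized documents (arrays of lowercase terms).
// Illustrative only; the plugin's internal scoring may differ in detail.
function bm25Scores(queryTerms, docs, { k1 = 1.5, b = 0.75 } = {}) {
  const N = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / N;

  return docs.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      // Rarer terms (low document frequency) get a higher IDF weight.
      const df = docs.filter((d) => d.includes(term)).length;
      const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
      // b scales the document length penalty; k1 limits how much repeated terms add.
      const lengthNorm = 1 - b + b * (doc.length / avgLen);
      score += (idf * (tf * (k1 + 1))) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}
```

In this formulation, a higher k1 lets repeated occurrences of a term keep raising the score for longer, and b = 0 switches length normalization off entirely.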
Content chunking
Each source markdown page is converted into one or more chunks before being indexed for BM25 and ingested into Vecto. A chunk is the smallest unit that can appear in a search result: its heading, title, and text all come from one chunk. A chunk's text field starts with a breadcrumb (the chain of ancestor headings from the page title down to the chunk's own heading, rendered as markdown) followed by the section body with its original markdown structure intact. The breadcrumb gives every chunk its full hierarchical context, so the ranker and any LLM reading the chunk as retrieval context can tell a leaf heading like "Overview" apart from an identically named leaf elsewhere on the page. MDX-specific noise (import/export lines, JSX/HTML tags, and JSX expression braces) is stripped; headings, emphasis, lists, blockquotes, and code blocks are preserved.
Chunking runs in two passes:
- Heading split: the page is broken at every heading whose level falls inside content.splitOnHeadings. [min, max] is inclusive on both ends, where 1 is # (H1), 2 is ## (H2), and so on up to 6. The default [2, 4] splits on ##, ###, and ####. Headings outside the range are not boundaries; their full heading line and body flow into the enclosing chunk.
- Word-window split: any section longer than content.chunkSize words is further sliced into overlapping windows of chunkSize words with chunkOverlap words of overlap. Sections shorter than chunkSize become a single chunk.
For a side-by-side walkthrough of how a real page turns into chunks under different splitOnHeadings values, see Chunking example.
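The heading-split pass and the breadcrumb can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the plugin's code: the function name splitOnHeadings and the returned field names are hypothetical, the breadcrumb is kept as a plain array rather than rendered markdown, fenced code blocks are not handled specially, and chunks with an empty body are dropped.

```javascript
// Sketch of the heading-split pass. Assumes plain markdown with ATX headings (#, ##, ...).
// Fenced code blocks are ignored for brevity, so a "# comment" inside code would be misread.
function splitOnHeadings(pageTitle, markdown, [min, max] = [2, 4]) {
  const chunks = [];
  const stack = []; // in-range ancestor headings as { level, text }
  let current = { heading: pageTitle, breadcrumb: [pageTitle], lines: [] };

  for (const line of markdown.split('\n')) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    const level = match ? match[1].length : 0;
    if (match && level >= min && level <= max) {
      chunks.push(current);
      // Drop headings at the same or a deeper level, then descend into this one.
      while (stack.length && stack[stack.length - 1].level >= level) stack.pop();
      stack.push({ level, text: match[2].trim() });
      current = {
        heading: match[2].trim(),
        breadcrumb: [pageTitle, ...stack.map((h) => h.text)],
        lines: [],
      };
    } else {
      // Body text and out-of-range headings stay in the enclosing chunk.
      current.lines.push(line);
    }
  }
  chunks.push(current);

  // Choice made for this sketch: skip chunks that ended up with no body text.
  return chunks
    .map((c) => ({ heading: c.heading, breadcrumb: c.breadcrumb, body: c.lines.join('\n').trim() }))
    .filter((c) => c.body.length > 0);
}
```

With a range of [2, 3], a #### heading is not a boundary, so its heading line and body stay inside the enclosing ### chunk, which matches the behavior described above for out-of-range headings.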
Picking a range
- Wider range (e.g. [2, 6]) → finer chunks, more specific heading metadata per result, better pinpointing of short sub-points. Tradeoff: some chunks can be very short and lose surrounding context.
- Narrower range (e.g. [2, 2]) → coarser chunks that keep related subsections together. Better for "what does this whole feature do" queries, worse for locating a specific subsection.
- [1, 6] rarely helps in Docusaurus because the page title comes from frontmatter, not an inline #, so there's no H1 in the body to split on.
- Regardless of range, chunkSize / chunkOverlap still slice any section that exceeds the word limit, so very long sections never become unboundedly large.
vectorSearch: {
  content: {
    chunkSize: 500,
    chunkOverlap: 50,
    splitOnHeadings: [2, 3], // split on ## and ###, ignore #### and deeper
  },
}
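The word-window pass that chunkSize and chunkOverlap control can be sketched as below. This is an illustrative approximation, not the plugin's implementation: the function name wordWindows is hypothetical, words are split on whitespace (so line breaks inside a window are not preserved), and rejecting an overlap that is not smaller than the chunk size is a guard added for this example.

```javascript
// Sketch of the word-window pass: slice a long section into overlapping windows.
function wordWindows(text, chunkSize = 500, chunkOverlap = 50) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const words = text.split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  // Sections that fit within chunkSize stay as a single chunk.
  if (words.length <= chunkSize) return [words.join(' ')];

  const step = chunkSize - chunkOverlap; // how far each window advances
  const windows = [];
  for (let start = 0; start < words.length; start += step) {
    windows.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return windows;
}
```

Under this sketch and the default settings, a 1,200-word section yields three windows starting at words 0, 450, and 900, with each consecutive pair sharing 50 words.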
Weighted Score Fusion (alternative to RRF)
You can use weighted score normalization instead of the default Reciprocal Rank Fusion for hybrid mode:
vectorSearch: {
  mode: 'hybrid',
  weights: { vector: 0.7, bm25: 0.3 },
}
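To make the difference between the two fusion strategies concrete, here is a minimal sketch of each. Neither function is taken from the plugin: rrfFuse follows the standard Reciprocal Rank Fusion formula, and weightedFuse assumes min-max normalization, which may not be the normalization the plugin actually uses. All function and parameter names are hypothetical.

```javascript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank), with rank starting at 1.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Weighted fusion: min-max normalize each score set to [0, 1], then mix by weight.
// Min-max is an assumption here; the plugin's normalization may differ.
function weightedFuse(vectorScores, keywordScores, weights = { vector: 0.7, bm25: 0.3 }) {
  const normalize = (scores) => {
    const values = Object.values(scores);
    const min = Math.min(...values);
    const max = Math.max(...values);
    return Object.fromEntries(
      Object.entries(scores).map(([id, s]) => [id, max === min ? 1 : (s - min) / (max - min)])
    );
  };
  const v = normalize(vectorScores);
  const b = normalize(keywordScores);
  const ids = new Set([...Object.keys(v), ...Object.keys(b)]);
  // A document missing from one result set contributes 0 for that side.
  return [...ids]
    .map((id) => ({ id, score: weights.vector * (v[id] ?? 0) + weights.bm25 * (b[id] ?? 0) }))
    .sort((x, y) => y.score - x.score);
}
```

RRF looks only at rank positions, so it is unaffected by the very different score scales of BM25 and vector similarity. Weighted fusion uses the scores themselves, which is why they need to be normalized first, and in exchange it lets you bias results toward one retriever.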
Local Plugin Development
If you would like to modify the current Vecto Search plugin, here are the steps:
- Clone and install the repository:
  git clone https://github.com/XpressAI/docusaurus-vecto-search
  cd docusaurus-vecto-search
  yarn install
- Build the plugin:
  yarn build
- Create a symbolic link for the project:
  yarn link
- In a different directory, create a new Docusaurus v3 website or use an existing one:
  yarn create docusaurus my-website
- Move into the Docusaurus project directory and link the plugin:
  cd my-website
  yarn install
  yarn link @xpressai/docusaurus-vecto-search
- Build the Docusaurus project:
  yarn build
Special Thanks
Originally forked from Docusaurus Search Local. The current version is a full rewrite adding BM25, Vecto vector search, and hybrid fusion modes.