Core Concepts

How Vectra stores, indexes, and retrieves data.

Overview

Vectra keeps a simple, portable model: indexes live as folders on disk, but are fully loaded into memory at query time. You choose whether to work at the item level (you supply vectors + metadata) or the document level (Vectra chunks, embeds, and retrieves for you).

Index types

LocalIndex

  • You bring: vectors and metadata.
  • Configure which metadata fields to “index” (kept inline in index.json) vs. store per-item in external JSON.
  • Query by vector with optional metadata filtering; results return items sorted by cosine similarity.
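
The ranking step above can be sketched generically. This is an illustrative implementation of cosine-similarity ranking, not Vectra's internal code; the `Item` shape is an assumption for the example:

```typescript
interface Item {
  id: string;
  vector: number[];
  metadata: Record<string, unknown>;
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every item against the query vector and return the top K, highest first.
function rank(items: Item[], query: number[], topK: number) {
  return items
    .map((item) => ({ item, score: cosineSimilarity(query, item.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```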

LocalDocumentIndex

  • You bring: raw text (strings, files, or URLs).
  • Vectra splits text into chunks, generates embeddings via your configured provider, stores chunk metadata (documentId, startPos, endPos), and persists the document body to disk.
  • Query by text; results are grouped per document with methods to render scored spans for prompts.

Both are folder-backed and portable — any language can read/write the on-disk format.

For a deep dive on the document workflow — chunking, retrieval, hybrid search, and FolderWatcher — see the Document Indexing guide.

Metadata filtering

Vectra uses a Pinecone-compatible subset of MongoDB-style query operators. Filters are evaluated before similarity ranking.

Supported operators

Category     Operators
Logical      $and, $or
Comparison   $eq, $ne, $gt, $gte, $lt, $lte
Set/string   $in, $nin
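
One way to picture this operator subset is a small evaluator. The sketch below is illustrative, not Vectra's implementation; it treats the shorthand `{ field: value }` as `$eq`, which is how Pinecone-style filters are usually read:

```typescript
type Filter = Record<string, any>;

// Evaluate a Pinecone-style filter against one item's indexed metadata.
function matchesFilter(metadata: Record<string, any>, filter: Filter): boolean {
  return Object.entries(filter).every(([key, condition]) => {
    if (key === '$and') return condition.every((f: Filter) => matchesFilter(metadata, f));
    if (key === '$or') return condition.some((f: Filter) => matchesFilter(metadata, f));
    const value = metadata[key];
    // Shorthand { field: value } is treated as $eq.
    if (typeof condition !== 'object' || condition === null) return value === condition;
    return Object.entries(condition).every(([op, operand]: [string, any]) => {
      switch (op) {
        case '$eq':  return value === operand;
        case '$ne':  return value !== operand;
        case '$gt':  return value > operand;
        case '$gte': return value >= operand;
        case '$lt':  return value < operand;
        case '$lte': return value <= operand;
        case '$in':  return (operand as any[]).includes(value);
        case '$nin': return !(operand as any[]).includes(value);
        default:     return false;
      }
    });
  });
}
```

Because filters run before similarity ranking, an evaluator like this acts as a pre-filter: only items that match are scored at all.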

Indexed vs. non-indexed fields

             Indexed fields                                   Non-indexed fields
Location     Inline in index.json                             Per-item JSON file on disk
Filterable   Yes                                              No
Trade-off    Speeds filtering but increases index.json size   Keeps index.json small

Configure indexed fields at index creation:

await index.createIndex({
  version: 1,
  metadata_config: { indexed: ['category', 'language'] },
});

Only fields listed in metadata_config.indexed can be used in query filters. All other metadata is stored per-item on disk and is not available for filtering.

On-disk layout

Vectra’s on-disk format is language-agnostic — any tool that reads JSON can work with it.

Basic index (LocalIndex)

my-index/
├── index.json          # vectors, norms, indexed metadata
├── <itemId-1>.json     # non-indexed metadata for item 1
├── <itemId-2>.json     # non-indexed metadata for item 2
└── ...

index.json contains:

  • version — format version
  • metadata_config — which fields are indexed
  • items[] — each with id, vector, norm, metadata (indexed fields only), and optional metadataFile

Document index (LocalDocumentIndex)

my-doc-index/
├── index.json          # chunk vectors + chunk metadata
├── catalog.json        # uri ↔ documentId mapping
├── <docId>.txt         # document body
├── <docId>.json        # optional document-level metadata
└── ...

catalog.json maps URIs to document IDs and tracks counts. It’s portable and easy to inspect or version.

Protocol Buffer format

Indexes can optionally use Protocol Buffer serialization (40-50% smaller files). The layout is identical but with .pb extensions:

my-index/
├── index.pb            # binary proto format
├── <itemId-1>.pb
├── catalog.pb          # (document indexes)
├── <docId>.txt         # document bodies unchanged
└── ...

See Storage — Formats for setup and migration details.

Storage backends

Vectra separates index logic from file I/O through the FileStorage interface. This allows the same index code to run against the local filesystem, IndexedDB (browsers), or in-memory storage.

Backend              Environment         Persistence
LocalFileStorage     Node.js             Disk
IndexedDBStorage     Browser, Electron   IndexedDB
VirtualFileStorage   Any                 In-memory

You can also implement FileStorage to store data anywhere (S3, SQLite, etc.). See the Storage guide for the full interface, browser setup, and custom implementation examples.
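
A Map-backed store in the spirit of VirtualFileStorage can make the idea concrete. The method names below are illustrative assumptions, not Vectra's actual FileStorage interface; consult the Storage guide for the real contract:

```typescript
// A minimal in-memory backend sketch. Method names here are assumptions
// for illustration — the real FileStorage interface is in the Storage guide.
class MemoryStorage {
  private files = new Map<string, string>();

  async readFile(filePath: string): Promise<string> {
    const data = this.files.get(filePath);
    if (data === undefined) throw new Error(`not found: ${filePath}`);
    return data;
  }

  async writeFile(filePath: string, data: string): Promise<void> {
    this.files.set(filePath, data);
  }

  async deleteFile(filePath: string): Promise<void> {
    this.files.delete(filePath);
  }

  async exists(filePath: string): Promise<boolean> {
    return this.files.has(filePath);
  }
}
```

The same pattern extends to any backend that can read and write whole files by path, which is why S3- or SQLite-backed implementations are possible.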

File-backed vs. in-memory usage

Persistent usage

Choose a stable folder and reuse it across runs:

const index = new LocalIndex('./my-index');

Create the index once, then upsert/insert items or documents incrementally.

Ephemeral usage

Use a temporary directory per run and rebuild from source. Useful for CI, notebooks, or demos:

import os from 'node:os';
import path from 'node:path';
import { LocalIndex } from 'vectra';

const folderPath = path.join(os.tmpdir(), 'vectra-ephemeral');
const index = new LocalIndex(folderPath);
await index.createIndex({
  version: 1,
  deleteIfExists: true, // reset each run
  metadata_config: { indexed: ['category'] },
});

Pass deleteIfExists: true to createIndex to reset the index on each run.