Offline RAG: Building a High-Performance SQLite Vector Index Inside a Web Worker

In modern web applications, integrating AI often translates to sending network requests to heavy cloud APIs. However, when building a real-time, low-latency companion like Interview Co-Pilot, relying solely on cloud connections introduces critical failure points: high network latency, potential dropouts, and heavy server bills. The solution? Offline-First RAG (Retrieval-Augmented Generation) operating entirely client-side.

But running vector embeddings and similarity queries on a mobile browser is a CPU nightmare. If done on the main thread, the UI freezes, audio capture stutters, and the user experience degrades instantly. In this post, we’ll explore how to design a high-performance, background-threaded offline RAG system using Web Workers, SQLite WASM, and efficient message-passing protocols.

The Architecture: Keeping the Main Thread at 60 FPS

To keep voice capture and UI animations smooth, we must follow the golden rule of browser engineering: never block the main thread. Our system splits responsibilities across distinct execution contexts:

Main Thread (React/Next.js): Orchestrates touch gestures, captures audio streams, and renders transcription feedback.
Database Web Worker (db.worker.ts): Runs an instance of SQLite in-memory/indexedDB, parses uploaded custom JSON question banks, and builds a queryable matrix.
Embedding Web Worker (embedding.worker.ts): Vectorizes input questions and computes similarity scores using lightweight, client-side vector search.

Step 1: Setting Up the Background DB Web Worker

By moving SQLite operations to a separate Web Worker, we prevent long-running database insertions and queries from stutters in audio stream capturing. Here is a simplified implementation of a database worker handling the compilation and loading of questions:

// db.worker.ts
import initSqlJs from 'sql.js';

let db: any = null;

self.onmessage = async (event) => {
  const { type, payload } = event.data;

  if (type === 'INIT_DB') {
    const SQL = await initSqlJs({ locateFile: file => `/sql-wasm.wasm` });
    db = new SQL.Database();
    
    // Create highly optimized indexes for fast lookups
    db.run(`
      CREATE TABLE IF NOT EXISTS qa_library (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        question TEXT NOT NULL,
        answer TEXT NOT NULL,
        embedding_json TEXT
      );
      CREATE INDEX IF NOT EXISTS idx_question ON qa_library(question);
    `);
    
    self.postMessage({ type: 'DB_READY' });
  }

  if (type === 'LOAD_LIBRARY') {
    const { items } = payload;
    db.run("BEGIN TRANSACTION;");
    const stmt = db.prepare("INSERT INTO qa_library (question, answer, embedding_json) VALUES (?, ?, ?);");
    
    for (const item of items) {
      stmt.run([item.question, item.answer, JSON.stringify(item.embedding)]);
    }
    
    stmt.free();
    db.run("COMMIT;");
    self.postMessage({ type: 'LIBRARY_LOADED', count: items.length });
  }
};

Step 2: Vector Search & Mathematical Distance

Once our data is indexed, the next step is calculating the similarity score. In the offline worker, we calculate the cosine similarity between the query embedding and the pre-computed question embeddings stored in SQLite:

// Vector similarity calculation
function cosineSimilarity(vecA: number[], vecB: number[]) {
  let dotProduct = 0.0;
  let normA = 0.0;
  let normB = 0.0;
  for (let i = 0; i < vecA.length; i++) {
    dotProduct += vecA[i] * vecB[i];
    normA += vecA[i] * vecA[i];
    normB += vecB[i] * vecB[i];
  }
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}

Step 3: Merging Matrices Client-Side

In a typical corporate application, database tables are locked. In an offline-first PWA, candidates must be able to load multiple, concurrent JSON files (e.g., separate React, system design, and database question pools) simultaneously. Our UI parses and merges these libraries concurrently, avoiding database locking issues:

// libraryManager.ts
export async function mergeAndUploadLibraries(files: FileList) {
  const parsePromises = Array.from(files).map(file => {
    return new Promise((resolve) => {
      const reader = new FileReader();
      reader.onload = (e) => resolve(JSON.parse(e.target?.result as string));
      reader.readAsText(file);
    });
  });

  const parsedLibraries = await Promise.all(parsePromises);
  const mergedItems = parsedLibraries.flat(); // Client-side matrix merge

  // Send merged items to SQLite Web Worker
  dbWorker.postMessage({ type: 'LOAD_LIBRARY', payload: { items: mergedItems } });
}

Key Performance Takeaways

WASM Pre-caching: Serve the sql-wasm.wasm binary using custom PWA service workers (like Serwist) to ensure instantaneous offline startup.
Background Vector Math: Offloading high-dimensional similarity arrays to web workers reduces main thread blockage to absolutely zero milliseconds.
SQLite Transactions: Always wrap multiple JSON card insertion tasks inside a BEGIN TRANSACTION and COMMIT block to speed up client-side processing by up to 50x.

By shifting database structures and vector processing to background threads, we pave the way for high-fidelity, real-time AI voice companions running completely local, serverless, and offline.