Documentation › Self-Hosted › Architecture Overview

Architecture Overview

QuantumVerifi is designed as a set of stateless services backed by managed datastores. This architecture supports horizontal scaling, zero-downtime deployments, and multi-region operation.

Component Diagram

            ┌──────────────────────┐
            │    Load Balancer     │
            │   (Ingress / ALB)    │
            └──────┬───────┬───────┘
                   │       │
      ┌────────────┘       └────────────┐
      ▼                                 ▼
┌─────────────────┐           ┌─────────────────┐
│  Web Frontend   │           │   API Server    │
│   (Next.js)     │           │   (Go / Gin)    │
│   Port 3000     │           │   Port 8080     │
└─────────────────┘           └────────┬────────┘
                  ┌────────────────────┼────────────────┐
                  ▼                    ▼                ▼
          ┌──────────────┐     ┌──────────────┐  ┌──────────────┐
          │    Worker    │     │  PostgreSQL  │  │    Redis     │
          │  (Temporal)  │     │              │  │              │
          └──────┬───────┘     └──────────────┘  └──────────────┘
     ┌───────────┼────────────┐
     ▼           ▼            ▼
┌──────────────┐ ┌──────────┐ ┌──────────────┐
│ LLM Gateway  │ │ Sandbox  │ │   Object     │
│  (LiteLLM)   │ │   Pods   │ │   Storage    │
└──────────────┘ └──────────┘ └──────────────┘

Service Details

API Server

The API server handles all HTTP requests — analysis creation, result retrieval, billing, authentication, and SSE progress streaming.

  • Stateless — all state is in PostgreSQL and Redis
  • Horizontally scalable — run 2+ replicas behind a load balancer
  • Authentication — JWT-based with configurable identity provider
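
Because the server is stateless, every request must carry its own proof of identity. The verification step can be sketched as a minimal HS256 JWT check (illustrative only — the actual claims, algorithm, and key handling depend on the configured identity provider):

```python
import base64
import hashlib
import hmac
import json
import time

def b64url_decode(data: str) -> bytes:
    # JWT segments are base64url without padding; restore padding before decoding.
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

def verify_jwt(token: str, secret: bytes) -> dict:
    """Verify an HS256 JWT and return its claims; raise ValueError on failure."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("exp", float("inf")) < time.time():
        raise ValueError("token expired")
    return claims
```

Any API replica can run this check against the shared signing key, which is what makes the replicas interchangeable behind the load balancer.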

Worker

Workers execute analysis workflows using durable orchestration. Each analysis is a multi-step workflow that survives infrastructure restarts.

  • Auto-scaled — scale based on queue depth
  • Heartbeat monitoring — long-running analyses send periodic heartbeats
  • Retry logic — failed steps are retried with exponential backoff
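
The retry behavior above can be sketched as follows (the base delay, cap, and attempt count are illustrative parameters, not QuantumVerifi's shipped defaults):

```python
import time

def retry(step, max_attempts=5, base_delay=1.0, cap=30.0, sleep=time.sleep):
    """Run `step`, retrying failures with exponential backoff: 1s, 2s, 4s, ... capped."""
    for attempt in range(max_attempts):
        try:
            return step()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the workflow engine
            sleep(min(cap, base_delay * 2 ** attempt))
```

Injecting `sleep` keeps the sketch testable; in a durable workflow engine the backoff timer itself also survives restarts.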

Web Frontend

The Next.js frontend provides the dashboard, analysis viewer, settings, and billing UI.

  • Server-side rendering — fast initial page loads
  • Real-time updates — SSE streaming for analysis progress
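
SSE is a plain-text protocol: each event is a block of `field: value` lines terminated by a blank line. A minimal parser for a progress stream looks like this (the event names in the test are hypothetical, not QuantumVerifi's actual event schema):

```python
def parse_sse(stream: str):
    """Parse a raw SSE stream into a list of (event, data) pairs."""
    events = []
    event, data = "message", []  # per the SSE spec, the default event type is "message"
    for line in stream.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":  # a blank line terminates the current event
            if data:
                events.append((event, "\n".join(data)))
            event, data = "message", []
    return events
```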

LLM Gateway

Routes LLM requests to configured providers with automatic failover:

  • Primary — Azure OpenAI (GPT-4.1 family)
  • Secondary — Anthropic (Claude Sonnet / Haiku)
  • Tertiary — Ollama (self-hosted, for air-gapped deployments)

Rate limiting, response caching, and cost tracking are handled at this layer.
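
The failover chain can be sketched as an ordered-provider loop (the provider interface here is illustrative; LiteLLM's real routing is configuration-driven):

```python
def complete_with_failover(prompt, providers):
    """Try providers in priority order; return (name, response) from the first success.

    `providers` is a list of (name, call) pairs, where `call(prompt)`
    returns a completion or raises on failure.
    """
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # provider unreachable, rate-limited, etc.
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {list(errors)}")
```

The same loop shape is where per-provider rate limits and cache lookups would slot in, before each `call`.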

Sandbox Execution

Tests run in ephemeral containers with per-language runtimes:

| Runtime     | Languages              | Pre-installed tools      |
| ----------- | ---------------------- | ------------------------ |
| Node.js 20  | JavaScript, TypeScript | Jest, Vitest, Playwright |
| Python 3.12 | Python                 | Pytest, coverage         |
| Go 1.25     | Go                     | go test                  |
| Java 21     | Java, Kotlin           | Maven, JUnit             |
| Rust 1.83   | Rust                   | cargo test               |
| Universal   | Multi-language         | Node + Python + Go       |

Sandboxes are created on demand, run for the duration of the test, and are destroyed immediately after — no persistent state, no shared resources.
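
That lifecycle — create, run, destroy, nothing persists — can be sketched with a throwaway working directory and a subprocess. This is a local approximation only; real sandboxes are isolated container pods, not host processes:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_in_sandbox(files: dict, command: list) -> subprocess.CompletedProcess:
    """Materialize `files` in a temporary directory, run `command` there, destroy it."""
    with tempfile.TemporaryDirectory() as workdir:
        for name, content in files.items():
            Path(workdir, name).write_text(content)
        # Capture output; the directory and all state are deleted on exit.
        return subprocess.run(command, cwd=workdir,
                              capture_output=True, text=True, timeout=60)
```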

Data Flow

  1. User submits analysis via API or dashboard
  2. API server creates a workflow and enqueues it
  3. Worker picks up the workflow, clones the repo
  4. Worker calls LLM Gateway for test generation
  5. Generated tests are sent to a sandbox for execution
  6. Results are stored in PostgreSQL + Object Storage
  7. Evidence chain events are hashed and linked
  8. SSE events stream progress back to the frontend
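
Step 7's hash-linking can be sketched as a chain in which every event's hash covers both its payload and the previous event's hash, so altering any earlier event invalidates everything after it (the field names are illustrative, not the actual evidence-chain schema):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first event's predecessor

def append_event(chain: list, payload: dict) -> dict:
    """Append an event whose hash covers its payload and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    event = {"payload": payload, "prev": prev_hash,
             "hash": hashlib.sha256(body.encode()).hexdigest()}
    chain.append(event)
    return event

def verify_chain(chain: list) -> bool:
    """Recompute every link; return False if any event was altered."""
    prev_hash = GENESIS
    for event in chain:
        body = json.dumps({"payload": event["payload"], "prev": prev_hash},
                          sort_keys=True)
        if (event["prev"] != prev_hash
                or event["hash"] != hashlib.sha256(body.encode()).hexdigest()):
            return False
        prev_hash = event["hash"]
    return True
```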

Scaling Guidelines

| Component      | Scaling trigger  | Recommended                        |
| -------------- | ---------------- | ---------------------------------- |
| API Server     | Request rate     | 2-4 replicas                       |
| Worker         | Queue depth      | 2-8 replicas (auto-scaled)         |
| Web Frontend   | Traffic          | 2-3 replicas                       |
| PostgreSQL     | Connection count | Managed service with read replicas |
| Redis          | Memory usage     | Managed service                    |
| Object Storage | Storage volume   | Managed S3-compatible service      |
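
Queue-depth autoscaling for workers reduces to a target-tracking calculation clamped to the recommended replica range (the target of 10 workflows per worker is an assumed tuning knob, not a shipped default):

```python
import math

def desired_workers(queue_depth: int, per_worker_target: int = 10,
                    min_replicas: int = 2, max_replicas: int = 8) -> int:
    """Scale worker replicas to queue depth, clamped to the recommended 2-8 range."""
    wanted = math.ceil(queue_depth / per_worker_target)
    return max(min_replicas, min(max_replicas, wanted))
```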

Network Requirements

| Source      | Destination    | Port    | Purpose          |
| ----------- | -------------- | ------- | ---------------- |
| Web         | API Server     | 8080    | API requests     |
| API Server  | PostgreSQL     | 5432    | Data storage     |
| API Server  | Redis          | 6379    | Queues and cache |
| Worker      | LLM Gateway    | 4000    | AI generation    |
| Worker      | Sandbox        | Dynamic | Test execution   |
| LLM Gateway | LLM Provider   | 443     | LLM API calls    |
| Sandbox     | Object Storage | 9000    | Artifact upload  |
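
Each source→destination pair above can be smoke-tested with a plain TCP connect before go-live. A sketch (in Kubernetes you would typically run this from inside the source pod, not from the operator's machine):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `port_reachable("postgres", 5432)` from an API server replica confirms the data-storage path from the table.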