# System Architecture

cbintel is a modular intelligence-gathering and knowledge-synthesis platform. This document gives an overview of the system architecture and how its components relate.

## High-Level Architecture

```mermaid
graph TB
    subgraph "User Interface"
        CHAT[Research Agent<br/>Natural Language]
        API[REST API<br/>intel.nominate.ai]
        CLI[CLI Tools<br/>cbintel-*]
    end

    subgraph "Orchestration Layer"
        INTENT[Intent Classifier]
        GRAPH[Graph Executor]
        QUEUE[Redis Job Queue]
        WORKSPACE[Workspace Manager]
    end

    subgraph "Worker Pool"
        W_CRAWL[CrawlWorker]
        W_GRAPH[GraphWorker]
        W_BROWSER[BrowserWorker]
        W_LAZARUS[LazarusWorker]
        W_VECTL[VectlWorker]
        W_TRANSCRIPT[TranscriptWorker]
        W_SCREENSHOT[ScreenshotWorker]
    end

    subgraph "Primitives Layer"
        AI[AI Clients<br/>Anthropic/Ollama/CBAI]
        HTTP[HTTP Client<br/>Fetch/Proxy Support]
        BROWSER[Browser Automation<br/>Ferret/Playwright]
        ARCHIVE[Archive Retrieval<br/>Lazarus/CDX]
        VECTOR[Vector Storage<br/>Vectl/Embeddings]
    end

    subgraph "Network Layer"
        GEOROUTER[GeoRouter<br/>Geographic Routing]
        VPN[VPN Banks<br/>16 OpenWRT Workers]
        TOR[Tor Gateway<br/>tor.nominate.ai]
        HAPROXY[HAProxy<br/>Load Balancing]
    end

    subgraph "External Services"
        ANTHROPIC[Anthropic Claude API]
        OLLAMA[Ollama Local LLM]
        WAYBACK[Internet Archive]
        PROTON[ProtonVPN<br/>12,900+ profiles]
    end

    CHAT --> INTENT
    INTENT --> GRAPH
    API --> QUEUE
    CLI --> QUEUE

    QUEUE --> W_CRAWL & W_GRAPH & W_BROWSER & W_LAZARUS & W_VECTL & W_TRANSCRIPT & W_SCREENSHOT

    W_CRAWL --> AI & HTTP
    W_GRAPH --> AI & HTTP & VECTOR & BROWSER
    W_BROWSER --> BROWSER
    W_LAZARUS --> ARCHIVE
    W_VECTL --> VECTOR

    HTTP --> GEOROUTER
    GEOROUTER --> VPN & TOR
    VPN --> HAPROXY --> PROTON

    AI --> ANTHROPIC & OLLAMA
    ARCHIVE --> WAYBACK

    GRAPH --> WORKSPACE
```

## Layer Overview

### 1. User Interface Layer

Three entry points into the system:

| Interface | Description | Use Case |
|-----------|-------------|----------|
| Research Agent | Natural language queries | Interactive research sessions |
| REST API | Programmatic job submission | Automated pipelines |
| CLI Tools | Command-line utilities | Development and testing |

### 2. Orchestration Layer

Coordinates work execution and resource management:

| Component | Responsibility |
|-----------|----------------|
| Intent Classifier | Converts natural language to graph operations |
| Graph Executor | Runs DAG-based research pipelines |
| Job Queue | Redis-backed async task distribution |
| Workspace Manager | Organizes artifacts and runs |
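As a concrete illustration, enqueuing a job through the Redis-backed queue might look like the sketch below. The queue key (`cbintel:jobs`) and the job envelope fields are assumptions made for illustration, not the actual internal schema.

```python
import json
import uuid

def enqueue_job(redis_conn, job_type, payload, queue="cbintel:jobs"):
    """Serialize a job envelope and push it onto a shared Redis list."""
    job = {
        "id": str(uuid.uuid4()),
        "type": job_type,      # e.g. "crawl", "graph", "lazarus"
        "payload": payload,
        "status": "queued",
    }
    redis_conn.lpush(queue, json.dumps(job))
    return job["id"]
```

With a real `redis.Redis` connection, any number of workers can then consume the list concurrently.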

### 3. Worker Pool

Seven specialized workers process different job types:

| Worker | Job Type | Purpose |
|--------|----------|---------|
| CrawlWorker | `crawl` | AI-powered web crawling |
| GraphWorker | `graph` | Research graph execution |
| BrowserWorker | `browser` | Ferret automation |
| LazarusWorker | `lazarus` | Historical archive retrieval |
| VectlWorker | `vectl` | Vector embedding/search |
| TranscriptWorker | `transcript` | YouTube processing |
| ScreenshotWorker | `screenshot` | Browser capture |
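The claim side of the pool can be sketched as a blocking pop plus a dispatch on the job's `type` field. The handler map below is a stand-in for the seven workers above, and the job envelope fields are illustrative assumptions.

```python
import json

# Hypothetical handlers standing in for the real worker implementations.
HANDLERS = {
    "crawl": lambda payload: {"pages": [payload["url"]]},
    "vectl": lambda payload: {"vectors": len(payload.get("chunks", []))},
}

def claim_and_run(redis_conn, queue="cbintel:jobs", timeout=5):
    """Block until a job is available, then dispatch it by job type."""
    item = redis_conn.brpop(queue, timeout=timeout)
    if item is None:
        return None  # queue was empty; caller may retry
    _, raw = item
    job = json.loads(raw)
    handler = HANDLERS.get(job["type"])
    if handler is None:
        raise ValueError(f"no worker registered for job type {job['type']!r}")
    return {"id": job["id"], "result": handler(job["payload"])}
```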

### 4. Primitives Layer

Core operations available to all workers:

```mermaid
graph LR
    subgraph "AI Clients"
        ANTH[AnthropicClient]
        OLLAMA[OllamaClient]
        CBAI[CBAIClient<br/>Unified]
    end

    subgraph "Network"
        HTTP[HTTPClient]
        SEARCH[SearchClient]
        URL[URLCleaner]
    end

    subgraph "Browser"
        FERRET[Ferret/SWRM]
        PLAYWRIGHT[Playwright]
    end

    subgraph "Storage"
        VECTL[VectorStore]
        EMBED[EmbeddingService]
        KNOWLEDGE[EntityStore]
    end

    subgraph "Transform"
        HTML[HTMLProcessor]
        CHUNK[ChunkingService]
        MARKDOWN[MarkdownConverter]
    end
```

### 5. Network Layer

Geographic routing and anonymization:

```mermaid
graph TB
    subgraph "Routing Decision"
        GEO[GeoRouter]
    end

    subgraph "VPN Infrastructure"
        CLUSTER[Cluster API :9002]
        MASTER[Master Router<br/>17.0.0.1]
        HAPROXY[HAProxy<br/>Ports 8890-8999]

        subgraph "Workers"
            W1[Worker 1<br/>17.0.0.10]
            W2[Worker 2<br/>17.0.0.11]
            WDOTS[...]
            W16[Worker 16<br/>17.0.0.25]
        end
    end

    subgraph "Tor Network"
        TOR_API[Tor Gateway API<br/>tor.nominate.ai]
        TOR_POOL[Tor Worker Pool]
    end

    GEO -->|VPN Route| CLUSTER
    GEO -->|Tor Route| TOR_API

    CLUSTER --> MASTER --> HAPROXY
    HAPROXY --> W1 & W2 & W16

    TOR_API --> TOR_POOL
```
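The routing decision ultimately reduces to choosing a proxy endpoint. A minimal sketch, assuming a country-to-HAProxy-port map and a SOCKS port on the Tor gateway (both hypothetical values, not the real cluster configuration):

```python
def select_route(target_country=None, anonymity="standard"):
    """Choose an egress proxy: Tor for high anonymity, otherwise a
    country-pinned VPN port behind HAProxy."""
    vpn_ports = {"us": 8890, "de": 8891, "jp": 8892}  # hypothetical country -> port map
    if anonymity == "high":
        return {"kind": "tor", "proxy": "socks5://tor.nominate.ai:9050"}  # assumed SOCKS port
    port = vpn_ports.get(target_country, 8890)  # fall back to a default worker
    return {"kind": "vpn", "proxy": f"http://17.0.0.1:{port}"}
```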

## Data Flow

### Research Query Flow

```mermaid
sequenceDiagram
    participant User
    participant API as Jobs API
    participant Queue as Redis Queue
    participant Worker as GraphWorker
    participant Graph as GraphExecutor
    participant Ops as Operations
    participant Net as Network Layer
    participant Storage as Workspace

    User->>API: POST /api/v1/jobs/graph
    API->>Queue: Enqueue job
    API-->>User: job_id

    Queue->>Worker: Claim job
    Worker->>Graph: Execute graph

    loop For each stage
        Graph->>Ops: Run operations
        Ops->>Net: Fetch (via VPN/Tor)
        Net-->>Ops: Content
        Ops->>Ops: Transform/Extract
        Ops-->>Graph: Stage results
    end

    Graph->>Storage: Save artifacts
    Worker->>Queue: Complete job

    User->>API: GET /api/v1/jobs/{id}
    API-->>User: Job result + artifacts
```
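From a client's perspective, this flow is submit-then-poll. A minimal sketch against the endpoints shown in the sequence diagram; the request payload shape and the status values are assumptions:

```python
import time

def run_graph_job(session, graph_def, base="https://intel.nominate.ai/api/v1",
                  poll_interval=2.0):
    """Submit a graph job, then poll its status until it finishes."""
    resp = session.post(f"{base}/jobs/graph", json={"graph": graph_def})
    job_id = resp.json()["job_id"]
    while True:
        status = session.get(f"{base}/jobs/{job_id}").json()
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
```

`session` can be any object with `requests`-style `post`/`get` methods, which keeps the sketch testable without network access.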

### Content Processing Pipeline

```mermaid
flowchart LR
    subgraph "Discover"
        SEARCH["search<br/>query -> url[]"]
        ARCHIVE["archive_discover<br/>domain -> url[]"]
    end

    subgraph "Acquire"
        FETCH["fetch_batch<br/>url[] -> html[]"]
        TOR["tor_fetch<br/>url -> html"]
        SCREENSHOT["screenshot<br/>url -> image"]
    end

    subgraph "Transform"
        TEXT["to_text<br/>html -> text"]
        MARKDOWN["to_markdown<br/>html -> markdown"]
        CHUNK["chunk<br/>text -> chunk[]"]
    end

    subgraph "Extract"
        ENTITIES["entities<br/>text -> entity[]"]
        EMBED["embed_batch<br/>chunk[] -> vector[]"]
        OCR["ocr<br/>image -> text"]
    end

    subgraph "Filter"
        SEMANTIC["semantic_filter<br/>chunk[] -> chunk[]"]
        QUALITY["quality_filter<br/>chunk[] -> chunk[]"]
    end

    subgraph "Synthesize"
        INTEGRATE["integrate<br/>chunk[] -> text"]
        REPORT["to_report<br/>content -> markdown"]
    end

    SEARCH --> FETCH
    ARCHIVE --> FETCH
    FETCH --> TEXT --> CHUNK
    FETCH --> MARKDOWN
    TOR --> TEXT
    SCREENSHOT --> OCR --> CHUNK
    CHUNK --> EMBED
    CHUNK --> SEMANTIC --> INTEGRATE
    TEXT --> ENTITIES
    INTEGRATE --> REPORT
```
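The `chunk` step in the Transform stage can be illustrated with a simple overlapping-window splitter. The sizes and the character-based strategy are assumptions; the real ChunkingService may split on tokens or sentence boundaries instead.

```python
def chunk(text, size=800, overlap=100):
    """Split text into overlapping fixed-size character windows."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    # The max(..., 1) guard ensures short texts yield exactly one chunk.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of some duplicated embedding work downstream.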

## Component Interactions

### Jobs API and Workers

```mermaid
graph TB
    subgraph "API Layer"
        JOBS[Jobs API :9003]
        CLUSTER[Cluster API :9002]
    end

    subgraph "Queue Layer"
        REDIS[(Redis)]
    end

    subgraph "Worker Layer"
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker 3]
    end

    subgraph "Storage Layer"
        FILES[files.nominate.ai]
        LOCAL[(Local Storage)]
    end

    JOBS -->|Enqueue| REDIS
    REDIS -->|Dequeue| W1 & W2 & W3
    W1 & W2 & W3 -->|Results| FILES
    W1 & W2 & W3 -->|Progress| JOBS
    W1 & W2 & W3 -->|VPN Control| CLUSTER
```

### Graph Execution

```mermaid
graph TB
    subgraph "Input"
        YAML[GraphDef YAML]
        PARAMS[Runtime Parameters]
    end

    subgraph "Execution"
        PARSE[Parse YAML]
        VALIDATE[Type Validation]
        SCHEDULE[Stage Scheduler]

        subgraph "Stage Execution"
            SEQ[Sequential Mode]
            PAR[Parallel Mode]
            FOREACH[ForEach Mode]
        end

        OPS[Operation Registry]
    end

    subgraph "Output"
        STATE[Execution State]
        ARTIFACTS[Workspace Artifacts]
        RESULT[GraphResult]
    end

    YAML --> PARSE --> VALIDATE
    PARAMS --> VALIDATE
    VALIDATE --> SCHEDULE
    SCHEDULE --> SEQ & PAR & FOREACH
    SEQ & PAR & FOREACH --> OPS
    OPS --> STATE --> ARTIFACTS --> RESULT
```
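The execution model above (resolve each stage's inputs from accumulated state, run it in its mode, record the result) can be sketched in a few lines. The stage field names and the `foreach` fan-out are modeled on the diagram, not the real GraphDef schema.

```python
def execute_graph(stages, params, ops):
    """Run stages in order, threading results through a shared state dict.

    Each stage is {"name", "op", "inputs", "mode"}; "foreach" maps the
    operation over a list input instead of calling it once.
    """
    state = dict(params)  # runtime parameters seed the execution state
    for stage in stages:
        op = ops[stage["op"]]  # look up the operation in the registry
        args = [state[k] for k in stage.get("inputs", [])]
        if stage.get("mode") == "foreach":
            state[stage["name"]] = [op(item) for item in args[0]]
        else:
            state[stage["name"]] = op(*args)
    return state
```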

## Deployment Architecture

```mermaid
graph TB
    subgraph "Public Internet"
        CLIENT[Clients]
    end

    subgraph "Nginx Reverse Proxy"
        NGINX[nginx]
    end

    subgraph "Application Servers"
        JOBS[cbjobs.service<br/>Port 9003]
        CLUSTER[cbcluster.service<br/>Port 32203]
    end

    subgraph "Services"
        REDIS[(Redis)]
        OLLAMA[Ollama :11434]
    end

    subgraph "OpenWRT Network"
        MASTER[Master :17.0.0.1]
        WORKERS[Workers 17.0.0.10-25]
    end

    subgraph "External APIs"
        ANTHROPIC[api.anthropic.com]
        TOR[tor.nominate.ai]
        FILES[files.nominate.ai]
    end

    CLIENT --> NGINX
    NGINX -->|intel.nominate.ai| JOBS
    NGINX -->|network.nominate.ai| CLUSTER

    JOBS --> REDIS
    JOBS --> OLLAMA
    JOBS --> ANTHROPIC
    JOBS --> TOR
    JOBS --> FILES
    JOBS --> CLUSTER

    CLUSTER --> MASTER --> WORKERS
```

## Security Boundaries

| Boundary | Protection |
|----------|------------|
| Public API | HTTPS, API keys (planned) |
| Internal services | Network isolation |
| VPN profiles | File permissions, encrypted storage |
| OpenWRT devices | SSH keys, WireGuard tunnels |
| Credentials | Environment variables, not in code |
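The credentials row can be made concrete with a fail-fast environment lookup, a common pattern for keeping secrets out of source code (the variable name below is hypothetical):

```python
import os

def require_env(name):
    """Fetch a credential from the environment, failing fast if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the service")
    return value

# e.g. api_key = require_env("ANTHROPIC_API_KEY")
```

Failing at startup, rather than when the key is first used mid-job, keeps misconfiguration visible.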

## Performance Considerations

| Component | Scaling Strategy |
|-----------|------------------|
| Job Queue | Redis cluster, multiple consumers |
| Workers | Horizontal scaling, worker pools |
| VPN Cluster | 16 workers, HAProxy load balancing |
| Tor Gateway | Worker pool, circuit rotation |
| Vector Store | K-means clustering, NumPy/C++ backend |
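The Vector Store row implies cluster-pruned search: assign a query vector to its nearest k-means centroid, then compare against only that cluster's vectors instead of the full store. The centroid-assignment step, sketched in pure Python for illustration:

```python
import math

def nearest_centroid(vector, centroids):
    """Return the index of the closest centroid by Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centroids)), key=lambda i: dist(vector, centroids[i]))
```

With `k` clusters of roughly equal size, this turns an exhaustive scan into a scan of about `1/k` of the store, which is the usual payoff of this scheme.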