# System Architecture

cbintel is a modular intelligence-gathering and knowledge-synthesis platform. This document gives an overview of the system architecture and how its components relate.
## High-Level Architecture
```mermaid
graph TB
    subgraph "User Interface"
        CHAT[Research Agent<br/>Natural Language]
        API[REST API<br/>intel.nominate.ai]
        CLI[CLI Tools<br/>cbintel-*]
    end
    subgraph "Orchestration Layer"
        INTENT[Intent Classifier]
        GRAPH[Graph Executor]
        QUEUE[Redis Job Queue]
        WORKSPACE[Workspace Manager]
    end
    subgraph "Worker Pool"
        W_CRAWL[CrawlWorker]
        W_GRAPH[GraphWorker]
        W_BROWSER[BrowserWorker]
        W_LAZARUS[LazarusWorker]
        W_VECTL[VectlWorker]
        W_TRANSCRIPT[TranscriptWorker]
        W_SCREENSHOT[ScreenshotWorker]
    end
    subgraph "Primitives Layer"
        AI[AI Clients<br/>Anthropic/Ollama/CBAI]
        HTTP[HTTP Client<br/>Fetch/Proxy Support]
        BROWSER[Browser Automation<br/>Ferret/Playwright]
        ARCHIVE[Archive Retrieval<br/>Lazarus/CDX]
        VECTOR[Vector Storage<br/>Vectl/Embeddings]
    end
    subgraph "Network Layer"
        GEOROUTER[GeoRouter<br/>Geographic Routing]
        VPN[VPN Banks<br/>16 OpenWRT Workers]
        TOR[Tor Gateway<br/>tor.nominate.ai]
        HAPROXY[HAProxy<br/>Load Balancing]
    end
    subgraph "External Services"
        ANTHROPIC[Anthropic Claude API]
        OLLAMA[Ollama Local LLM]
        WAYBACK[Internet Archive]
        PROTON[ProtonVPN<br/>12,900+ profiles]
    end
    CHAT --> INTENT
    INTENT --> GRAPH
    API --> QUEUE
    CLI --> QUEUE
    QUEUE --> W_CRAWL & W_GRAPH & W_BROWSER & W_LAZARUS & W_VECTL & W_TRANSCRIPT & W_SCREENSHOT
    W_CRAWL --> AI & HTTP
    W_GRAPH --> AI & HTTP & VECTOR & BROWSER
    W_BROWSER --> BROWSER
    W_LAZARUS --> ARCHIVE
    W_VECTL --> VECTOR
    HTTP --> GEOROUTER
    GEOROUTER --> VPN & TOR
    VPN --> HAPROXY --> PROTON
    AI --> ANTHROPIC & OLLAMA
    ARCHIVE --> WAYBACK
    GRAPH --> WORKSPACE
```
## Layer Overview

### 1. User Interface Layer
Three entry points into the system:
| Interface | Description | Use Case |
|---|---|---|
| Research Agent | Natural language queries | Interactive research sessions |
| REST API | Programmatic job submission | Automated pipelines |
| CLI Tools | Command-line utilities | Development and testing |
### 2. Orchestration Layer
Coordinates work execution and resource management:
| Component | Responsibility |
|---|---|
| Intent Classifier | Converts natural language to graph operations |
| Graph Executor | Runs DAG-based research pipelines |
| Job Queue | Redis-backed async task distribution |
| Workspace Manager | Organizes artifacts and runs |
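The queue's enqueue/claim/complete cycle can be sketched with an in-memory stand-in for Redis. This is an illustration of the semantics only; the field names (`id`, `type`, `payload`, `status`) and the class itself are assumptions, not the actual schema.

```python
import json
import uuid
from collections import deque

class InMemoryJobQueue:
    """Simplified model of the Redis-backed job queue. The real system uses
    Redis data structures; this deque stand-in shows the claim cycle only."""

    def __init__(self):
        self.pending = deque()
        self.results = {}

    def enqueue(self, job_type, payload):
        job_id = str(uuid.uuid4())
        self.pending.append(json.dumps({"id": job_id, "type": job_type, "payload": payload}))
        return job_id

    def claim(self):
        # Workers block on the queue in practice; here we pop the oldest job.
        return json.loads(self.pending.popleft()) if self.pending else None

    def complete(self, job_id, result):
        self.results[job_id] = {"status": "complete", "result": result}

queue = InMemoryJobQueue()
jid = queue.enqueue("crawl", {"url": "https://example.com"})
job = queue.claim()
queue.complete(job["id"], {"pages": 3})
```

Serializing jobs as JSON strings mirrors how queue entries typically travel through Redis, so the consumer side deserializes before dispatching.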
### 3. Worker Pool
Seven specialized workers process different job types:
| Worker | Job Type | Purpose |
|---|---|---|
| CrawlWorker | `crawl` | AI-powered web crawling |
| GraphWorker | `graph` | Research graph execution |
| BrowserWorker | `browser` | Ferret automation |
| LazarusWorker | `lazarus` | Historical archive retrieval |
| VectlWorker | `vectl` | Vector embedding/search |
| TranscriptWorker | `transcript` | YouTube processing |
| ScreenshotWorker | `screenshot` | Browser capture |
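Dispatching a dequeued job to the right worker reduces to a registry keyed by job type. The handler functions below are placeholders invented for illustration, not the actual worker API:

```python
# Hypothetical sketch: route a dequeued job to the worker handling its type.
def handle_crawl(payload):
    return {"worker": "CrawlWorker", "crawled": payload["url"]}

def handle_transcript(payload):
    return {"worker": "TranscriptWorker", "video": payload["video_id"]}

HANDLERS = {
    "crawl": handle_crawl,
    "transcript": handle_transcript,
    # one entry per job type: graph, browser, lazarus, vectl, screenshot
}

def dispatch(job):
    handler = HANDLERS.get(job["type"])
    if handler is None:
        raise ValueError(f"unknown job type: {job['type']}")
    return handler(job["payload"])

result = dispatch({"type": "crawl", "payload": {"url": "https://example.com"}})
```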
### 4. Primitives Layer
Core operations available to all workers:
```mermaid
graph LR
    subgraph "AI Clients"
        ANTH[AnthropicClient]
        OLLAMA[OllamaClient]
        CBAI[CBAIClient<br/>Unified]
    end
    subgraph "Network"
        HTTP[HTTPClient]
        SEARCH[SearchClient]
        URL[URLCleaner]
    end
    subgraph "Browser"
        FERRET[Ferret/SWRM]
        PLAYWRIGHT[Playwright]
    end
    subgraph "Storage"
        VECTL[VectorStore]
        EMBED[EmbeddingService]
        KNOWLEDGE[EntityStore]
    end
    subgraph "Transform"
        HTML[HTMLProcessor]
        CHUNK[ChunkingService]
        MARKDOWN[MarkdownConverter]
    end
```
### 5. Network Layer
Geographic routing and anonymization:
```mermaid
graph TB
    subgraph "Routing Decision"
        GEO[GeoRouter]
    end
    subgraph "VPN Infrastructure"
        CLUSTER[Cluster API :9002]
        MASTER[Master Router<br/>17.0.0.1]
        HAPROXY[HAProxy<br/>Ports 8890-8999]
        subgraph "Workers"
            W1[Worker 1<br/>17.0.0.10]
            W2[Worker 2<br/>17.0.0.11]
            WDOTS[...]
            W16[Worker 16<br/>17.0.0.25]
        end
    end
    subgraph "Tor Network"
        TOR_API[Tor Gateway API<br/>tor.nominate.ai]
        TOR_POOL[Tor Worker Pool]
    end
    GEO -->|VPN Route| CLUSTER
    GEO -->|Tor Route| TOR_API
    CLUSTER --> MASTER --> HAPROXY
    HAPROXY --> W1 & W2 & W16
    TOR_API --> TOR_POOL
```
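The routing decision reduces to: pick a VPN exit (a HAProxy port in the 8890-8999 range) when a specific country is required, or fall back to the Tor gateway when anonymity matters more than geography. A minimal sketch of that decision, where the country-to-port mapping, the Tor gateway address, and the master-router proxy URL are all invented for illustration:

```python
# Hypothetical country -> HAProxy port map (the real cluster assigns ports
# in the 8890-8999 range; these specific values are illustrative only).
VPN_PORTS = {"us": 8890, "de": 8891, "jp": 8892}
TOR_GATEWAY = "socks5://tor.nominate.ai:9050"  # assumed address and port

def choose_route(country=None, prefer_tor=False):
    """Return a proxy URL for a request, mirroring the GeoRouter decision:
    Tor when requested or no country constraint, else a VPN exit port."""
    if prefer_tor or country is None:
        return TOR_GATEWAY
    port = VPN_PORTS.get(country.lower())
    if port is None:
        return TOR_GATEWAY  # unmapped country: degrade to Tor
    return f"http://17.0.0.1:{port}"  # assumed: HAProxy fronted by the master router

route = choose_route(country="de")
```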
## Data Flow

### Research Query Flow
```mermaid
sequenceDiagram
    participant User
    participant API as Jobs API
    participant Queue as Redis Queue
    participant Worker as GraphWorker
    participant Graph as GraphExecutor
    participant Ops as Operations
    participant Net as Network Layer
    participant Storage as Workspace
    User->>API: POST /api/v1/jobs/graph
    API->>Queue: Enqueue job
    API-->>User: job_id
    Queue->>Worker: Claim job
    Worker->>Graph: Execute graph
    loop For each stage
        Graph->>Ops: Run operations
        Ops->>Net: Fetch (via VPN/Tor)
        Net-->>Ops: Content
        Ops->>Ops: Transform/Extract
        Ops-->>Graph: Stage results
    end
    Graph->>Storage: Save artifacts
    Worker->>Queue: Complete job
    User->>API: GET /api/v1/jobs/{id}
    API-->>User: Job result + artifacts
```
### Content Processing Pipeline
```mermaid
flowchart LR
    subgraph "Discover"
        SEARCH["search<br/>query -> url[]"]
        ARCHIVE["archive_discover<br/>domain -> url[]"]
    end
    subgraph "Acquire"
        FETCH["fetch_batch<br/>url[] -> html[]"]
        TOR["tor_fetch<br/>url -> html"]
        SCREENSHOT["screenshot<br/>url -> image"]
    end
    subgraph "Transform"
        TEXT["to_text<br/>html -> text"]
        MARKDOWN["to_markdown<br/>html -> markdown"]
        CHUNK["chunk<br/>text -> chunk[]"]
    end
    subgraph "Extract"
        ENTITIES["entities<br/>text -> entity[]"]
        EMBED["embed_batch<br/>chunk[] -> vector[]"]
        OCR["ocr<br/>image -> text"]
    end
    subgraph "Filter"
        SEMANTIC["semantic_filter<br/>chunk[] -> chunk[]"]
        QUALITY["quality_filter<br/>chunk[] -> chunk[]"]
    end
    subgraph "Synthesize"
        INTEGRATE["integrate<br/>chunk[] -> text"]
        REPORT["to_report<br/>content -> markdown"]
    end
    SEARCH --> FETCH
    ARCHIVE --> FETCH
    FETCH --> TEXT --> CHUNK
    FETCH --> MARKDOWN
    TOR --> TEXT
    SCREENSHOT --> OCR --> CHUNK
    CHUNK --> EMBED
    CHUNK --> SEMANTIC --> INTEGRATE
    TEXT --> ENTITIES
    INTEGRATE --> REPORT
```
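The `chunk` step is the hinge between raw text and the embedding/filter stages. A minimal fixed-size chunker with overlap, where the window sizes are illustrative and the actual ChunkingService may behave differently:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows so content cut at a boundary
    still appears whole in the neighboring chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("a" * 500, size=200, overlap=50)
```

Overlap is what keeps a sentence split across a window boundary recoverable by the semantic filter downstream; the cost is storing and embedding each overlapping region twice.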
## Component Interactions

### Jobs API and Workers
```mermaid
graph TB
    subgraph "API Layer"
        JOBS[Jobs API :9003]
        CLUSTER[Cluster API :9002]
    end
    subgraph "Queue Layer"
        REDIS[(Redis)]
    end
    subgraph "Worker Layer"
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker 3]
    end
    subgraph "Storage Layer"
        FILES[files.nominate.ai]
        LOCAL[(Local Storage)]
    end
    JOBS -->|Enqueue| REDIS
    REDIS -->|Dequeue| W1 & W2 & W3
    W1 & W2 & W3 -->|Results| FILES
    W1 & W2 & W3 -->|Progress| JOBS
    W1 & W2 & W3 -->|VPN Control| CLUSTER
```
### Graph Execution
```mermaid
graph TB
    subgraph "Input"
        YAML[GraphDef YAML]
        PARAMS[Runtime Parameters]
    end
    subgraph "Execution"
        PARSE[Parse YAML]
        VALIDATE[Type Validation]
        SCHEDULE[Stage Scheduler]
        subgraph "Stage Execution"
            SEQ[Sequential Mode]
            PAR[Parallel Mode]
            FOREACH[ForEach Mode]
        end
        OPS[Operation Registry]
    end
    subgraph "Output"
        STATE[Execution State]
        ARTIFACTS[Workspace Artifacts]
        RESULT[GraphResult]
    end
    YAML --> PARSE --> VALIDATE
    PARAMS --> VALIDATE
    VALIDATE --> SCHEDULE
    SCHEDULE --> SEQ & PAR & FOREACH
    SEQ & PAR & FOREACH --> OPS
    OPS --> STATE --> ARTIFACTS --> RESULT
```
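A GraphDef might look like the sketch below. Every field name here is a guess reconstructed from the diagram (stages, execution modes, operation names from the content pipeline), not the real schema:

```yaml
# Hypothetical GraphDef: all keys are illustrative, not the actual schema.
name: domain-research
params:
  query: {type: string}
stages:
  - id: discover
    mode: sequential
    op: search            # query -> url[]
  - id: acquire
    mode: foreach          # one fetch per discovered URL
    over: discover.urls
    op: fetch_batch
  - id: synthesize
    mode: sequential
    op: integrate
```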
## Deployment Architecture
```mermaid
graph TB
    subgraph "Public Internet"
        CLIENT[Clients]
    end
    subgraph "Nginx Reverse Proxy"
        NGINX[nginx]
    end
    subgraph "Application Servers"
        JOBS[cbjobs.service<br/>Port 9003]
        CLUSTER[cbcluster.service<br/>Port 32203]
    end
    subgraph "Services"
        REDIS[(Redis)]
        OLLAMA[Ollama :11434]
    end
    subgraph "OpenWRT Network"
        MASTER[Master :17.0.0.1]
        WORKERS[Workers 17.0.0.10-25]
    end
    subgraph "External APIs"
        ANTHROPIC[api.anthropic.com]
        TOR[tor.nominate.ai]
        FILES[files.nominate.ai]
    end
    CLIENT --> NGINX
    NGINX -->|intel.nominate.ai| JOBS
    NGINX -->|network.nominate.ai| CLUSTER
    JOBS --> REDIS
    JOBS --> OLLAMA
    JOBS --> ANTHROPIC
    JOBS --> TOR
    JOBS --> FILES
    JOBS --> CLUSTER
    CLUSTER --> MASTER --> WORKERS
```
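The nginx routing in the diagram amounts to two host-based proxy blocks. This fragment is a minimal sketch assuming plain HTTP upstreams on the ports shown; TLS, timeouts, and any real hardening are omitted:

```nginx
# Sketch only: hostnames and ports from the diagram, everything else assumed.
server {
    server_name intel.nominate.ai;
    location / {
        proxy_pass http://127.0.0.1:9003;   # cbjobs.service
        proxy_set_header Host $host;
    }
}
server {
    server_name network.nominate.ai;
    location / {
        proxy_pass http://127.0.0.1:32203;  # cbcluster.service
        proxy_set_header Host $host;
    }
}
```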
## Security Boundaries
| Boundary | Protection |
|---|---|
| Public API | HTTPS, API keys (planned) |
| Internal services | Network isolation |
| VPN profiles | File permissions, encrypted storage |
| OpenWRT devices | SSH keys, WireGuard tunnels |
| Credentials | Environment variables, not in code |
## Performance Considerations
| Component | Scaling Strategy |
|---|---|
| Job Queue | Redis cluster, multiple consumers |
| Workers | Horizontal scaling, worker pools |
| VPN Cluster | 16 workers, HAProxy load balancing |
| Tor Gateway | Worker pool, circuit rotation |
| Vector Store | K-means clustering, NumPy/C++ backend |
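The vector store's k-means strategy trades exactness for speed: instead of scanning every stored vector, a search probes only the chunks assigned to the centroid nearest the query. A NumPy sketch of that lookup, with arbitrary cluster count and dimensions, and no claim to match the actual Vectl implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 8)).astype(np.float32)    # stored embeddings
centroids = rng.normal(size=(16, 8)).astype(np.float32)    # k-means centers
# Assign each vector to its nearest centroid (the k-means "coarse index").
assignments = np.argmin(
    np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2), axis=1
)

def search(query, top_k=5):
    """Probe only the cluster whose centroid is nearest the query."""
    cluster = int(np.argmin(np.linalg.norm(centroids - query, axis=1)))
    member_idx = np.where(assignments == cluster)[0]
    dists = np.linalg.norm(vectors[member_idx] - query, axis=1)
    return member_idx[np.argsort(dists)[:top_k]]

hits = search(vectors[42])
```

Single-probe search can miss near neighbors that landed in an adjacent cluster; probing the few nearest centroids instead of one is the usual recall/latency knob.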