Primitives

Primitives are the fundamental building blocks of cbintel. They provide core operations for AI interaction, data retrieval, content transformation, and storage.

Overview

graph TB
    subgraph "AI Layer"
        ANTHROPIC[AnthropicClient]
        OLLAMA[OllamaClient]
        CBAI[CBAIClient<br/>Unified]
    end

    subgraph "Network Layer"
        HTTP[HTTPClient]
        SEARCH[SearchClient]
        URL[URLCleaner]
    end

    subgraph "Browser Layer"
        FERRET[Ferret/SWRM]
        PLAYWRIGHT[Playwright]
    end

    subgraph "Archive Layer"
        CDX[CDXClient]
        GAU[URL Discovery]
    end

    subgraph "Storage Layer"
        VECTL[VectorStore]
        EMBED[EmbeddingService]
        ENTITY[EntityStore]
    end

    subgraph "Transform Layer"
        HTML[HTMLProcessor]
        CHUNK[ChunkingService]
        MARKDOWN[MarkdownConverter]
    end

Quick Reference

Category             Module           Purpose
Directory Structure  cbintel.*        Package layout
AI Clients           cbintel.ai       LLM interaction
HTTP/Fetch           cbintel.net      Web requests
Browser Automation   cbintel.ferret   SWRM browser control
Archive Retrieval    cbintel.lazarus  Historical web data
Vector Storage       cbintel.vectl    Embeddings and search

Primitive Categories

Discovery Primitives

Operations for finding content:

Primitive         Input   Output  Description
search            query   url[]   Multi-engine web search
youtube_search    query   url[]   YouTube video discovery
archive_discover  domain  url[]   Historical URL enumeration
extract_links     html    url[]   Extract URLs from HTML
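
To make the html → url[] shape of extract_links concrete, here is a minimal sketch built on the standard library's html.parser. The function name mirrors the primitive, but the body and the base_url parameter are assumptions for illustration, not cbintel's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links(html: str, base_url: str = "") -> list[str]:
    """Return absolute, de-duplicated URLs found in html (hypothetical helper)."""
    collector = _LinkCollector()
    collector.feed(html)
    seen: set[str] = set()
    urls: list[str] = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Resolving against base_url matters because the output feeds acquisition primitives, which need absolute URLs.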

Acquisition Primitives

Operations for retrieving content:

Primitive         Input      Output  Description
fetch             url        html    HTTP content retrieval
fetch_archive     url, date  html    Historical content
tor_fetch         url        html    Fetch through Tor
screenshot        url        image   Browser screenshot
fetch_transcript  video_id   text    YouTube transcript
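
The (url, date) input of fetch_archive can be pictured as a lookup for the snapshot nearest a given day. The sketch below assumes the Internet Archive's Wayback Machine as the backing archive, which this page does not confirm; wayback_snapshot_url is a hypothetical helper that only builds the address and performs no request.

```python
from datetime import date


def wayback_snapshot_url(url: str, when: date) -> str:
    """Build a Wayback Machine address for the capture closest to `when`.

    Assumption: the archive is web.archive.org, which accepts a partial
    YYYYMMDD timestamp and redirects to the nearest capture.
    """
    timestamp = when.strftime("%Y%m%d")
    return f"https://web.archive.org/web/{timestamp}/{url}"
```

The resulting address could then be handed to an ordinary HTTP fetch.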

Transform Primitives

Operations for processing content:

Primitive    Input  Output    Description
to_markdown  html   markdown  HTML to Markdown
to_text      html   text      HTML to plain text
chunk        text   chunk[]   Text segmentation
embed        text   vector    Text embedding
ocr          image  text      Image to text
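
As a rough picture of what text → chunk[] segmentation can look like, the sketch below splits on fixed-size character windows with overlap. The size and overlap parameters are assumptions; cbintel's ChunkingService may segment differently (for example by sentence or token).

```python
def chunk(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window reached the end; avoid a redundant tail chunk
        start += step
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help later embedding and filtering steps.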

Extract Primitives

Operations for extracting structured data:

Primitive  Input  Output    Description
entities   text   entity[]  Named entity extraction
topics     text   topic[]   Topic extraction
summarize  text   text      AI summarization
sentiment  text   result    Sentiment analysis

Filter Primitives

Operations for filtering content:

Primitive        Input           Output   Description
semantic_filter  chunk[], query  chunk[]  Semantic similarity filter
quality_filter   chunk[]         chunk[]  Quality score filter
filter_urls      url[], pattern  url[]    URL pattern filter
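
The idea behind semantic_filter (keep only chunks close to a query) can be sketched with cosine similarity. Everything below is illustrative: toy_embed is a stand-in word-count embedding rather than a real model, and the threshold parameter is an assumption.

```python
import math
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding: lowercase word counts (not a real model)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_filter(
    chunks: list[str],
    query: str,
    threshold: float = 0.3,
    embed: Callable[[str], Vector] = toy_embed,
) -> list[str]:
    """Keep chunks whose similarity to the query meets the threshold."""
    query_vector = embed(query)
    return [c for c in chunks if cosine(embed(c), query_vector) >= threshold]
```

Passing embed as a parameter lets the same filter run against a real embedding service once one is available.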

Store Primitives

Operations for persistent storage:

Primitive       Input             Output   Description
store_vector    vector, metadata  ref      Store embedding
store_entity    entity            ref      Store entity
search_vectors  query             chunk[]  Vector similarity search

Synthesize Primitives

Operations for generating output:

Primitive  Input              Output    Description
integrate  chunk[], query     text      Synthesize into summary
chat       context, messages  text      AI conversation
to_report  content, template  markdown  Report generation

Data Flow Compatibility

            OUTPUT →
            url[]          html           text         chunk[]          vector       entity    bytes
INPUT ↓     ────────────────────────────────────────────────────────────────────────────────────────
url[]       -              ✓              -            -                -            -         ✓
                           fetch                                                               screenshot
                           tor_fetch
                           fetch_archive

html        ✓              -              ✓            -                -            -         -
            extract_links                 to_text
                                          to_markdown

text        -              -              -            ✓                ✓            ✓         -
                                                       chunk            embed        entities

chunk[]     -              -              ✓            ✓                ✓            -         -
                                          integrate    semantic_filter  embed_batch

bytes       -              -              ✓            -                -            -         -
                                          ocr
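
The matrix above can also be read as a set of type signatures, which makes it possible to check a chain of primitives before running it. The sketch below is one hypothetical way to do that; SIGNATURES copies a subset of the matrix, and validate_chain is not part of cbintel.

```python
# (input type, output type) per primitive, copied from the compatibility matrix.
SIGNATURES = {
    "fetch": ("url[]", "html"),
    "screenshot": ("url[]", "bytes"),
    "extract_links": ("html", "url[]"),
    "to_text": ("html", "text"),
    "chunk": ("text", "chunk[]"),
    "embed": ("text", "vector"),
    "entities": ("text", "entity"),
    "integrate": ("chunk[]", "text"),
    "ocr": ("bytes", "text"),
}


def validate_chain(ops: list[str]) -> None:
    """Raise ValueError at the first step whose input does not match the previous output."""
    for previous, current in zip(ops, ops[1:]):
        produced = SIGNATURES[previous][1]
        expected = SIGNATURES[current][0]
        if produced != expected:
            raise ValueError(f"{previous} outputs {produced}, but {current} expects {expected}")
```

For example, fetch → to_text → chunk → integrate passes, while fetch → chunk fails because chunk expects text rather than html.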

Usage Patterns

Direct Python Usage

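import asyncio

from cbintel.ai import CBAIClient
from cbintel.net import HTTPClient
from cbintel.vectl import EmbeddingService, VectorStore


async def main():
    # Sample inputs so the snippet is self-contained
    text = "Primitives are the fundamental building blocks of cbintel."
    chunks = ["first chunk of text", "second chunk of text"]

    # AI completion
    async with CBAIClient() as ai:
        summary = await ai.complete("Summarize this text", context=text)

    # HTTP fetch
    async with HTTPClient() as http:
        response = await http.get("https://example.com")

    # Vector storage: embed every chunk, then store each vector under its own key
    embeddings = EmbeddingService()
    vectors = await embeddings.embed_batch(chunks)

    store = VectorStore("./my-index")
    for i, vector in enumerate(vectors):
        await store.add(f"chunk_{i}", vector)


asyncio.run(main())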

In Graph Operations

stages:
  - name: discover
    sequential:
      - op: search
        params:
          query: "AI regulation"
        output: urls

  - name: acquire
    parallel:
      - op: fetch_batch
        input: urls
        output: pages

  - name: transform
    sequential:
      - op: to_text_batch
        input: pages
        output: texts
      - op: chunk
        input: texts
        output: chunks

  - name: process
    parallel:
      - op: embed_batch
        input: chunks
        output: vectors
      - op: entities
        input: texts
        output: entities

  - name: synthesize
    sequential:
      - op: integrate
        input: chunks
        params:
          query: "What are the key points?"
        output: summary
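
To show how sequential and parallel stages might be interpreted, here is a minimal runner sketch over the parsed stage list (plain dicts, as a YAML loader would produce). The runner, the op registry and the stub ops are all hypothetical; this is not cbintel's graph executor.

```python
import asyncio
from typing import Any, Awaitable, Callable

Op = Callable[..., Awaitable[Any]]


async def run_step(step: dict, ops: dict[str, Op], ctx: dict[str, Any]) -> None:
    """Run one op: read its input from ctx, pass params, store the result under its output name."""
    args = [ctx[step["input"]]] if "input" in step else []
    ctx[step["output"]] = await ops[step["op"]](*args, **step.get("params", {}))


async def run_stages(stages: list[dict], ops: dict[str, Op]) -> dict[str, Any]:
    """Run stages in order; steps inside a parallel stage run concurrently."""
    ctx: dict[str, Any] = {}
    for stage in stages:
        if "parallel" in stage:
            # Parallel steps may only read outputs produced by earlier stages.
            await asyncio.gather(*(run_step(s, ops, ctx) for s in stage["parallel"]))
        else:
            for step in stage.get("sequential", []):
                await run_step(step, ops, ctx)
    return ctx


# Stub ops standing in for real primitives.
async def search(query: str) -> list[str]:
    return [f"https://example.com/?q={query}"]


async def fetch_batch(urls: list[str]) -> list[str]:
    return [f"<html>{u}</html>" for u in urls]


stages = [
    {"name": "discover", "sequential": [{"op": "search", "params": {"query": "ai"}, "output": "urls"}]},
    {"name": "acquire", "parallel": [{"op": "fetch_batch", "input": "urls", "output": "pages"}]},
]
context = asyncio.run(run_stages(stages, {"search": search, "fetch_batch": fetch_batch}))
```

Each output name becomes a key in a shared context, which is how a later stage's input finds the data it needs.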

Async Patterns

All primitives use async/await:

import asyncio
from cbintel.vectl import SemanticSearch
from cbintel.screenshots import ScreenshotService

async def main():
    search = SemanticSearch("./index")

    async with ScreenshotService() as screenshots:
        # Parallel operations
        results = await asyncio.gather(
            search.search("query"),
            screenshots.screenshot("https://example.com"),
        )

    search_results, capture = results

asyncio.run(main())
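
When the same primitive runs over many inputs, an unbounded asyncio.gather can overwhelm a remote service. A common pattern, sketched here with a hypothetical gather_bounded helper and a stub in place of a real fetch, is to cap the number of calls in flight with a semaphore.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    func: Callable[[T], Awaitable[R]], items: Iterable[T], limit: int = 5
) -> list[R]:
    """Apply func to every item with at most `limit` calls in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(item: T) -> R:
        async with semaphore:
            return await func(item)

    return await asyncio.gather(*(worker(item) for item in items))


async def fake_fetch(url: str) -> str:
    """Stand-in for a real primitive call."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


pages = asyncio.run(gather_bounded(fake_fetch, ["a", "b", "c"], limit=2))
```

Results come back in input order regardless of which call finishes first, because asyncio.gather preserves the order of its arguments.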

Error Handling

from cbintel.ai import CBAIClient
from cbintel.exceptions import (
    CbintelError,        # Base exception
    NetworkError,        # HTTP/connection errors
    AIError,             # LLM API errors
    StorageError,        # File/database errors
    ConfigurationError,  # Missing config/env vars
)


async def complete_safely(prompt):
    # Catch the specific errors first; CbintelError is the base class, so it goes last.
    try:
        async with CBAIClient() as client:
            return await client.complete(prompt)
    except AIError as e:
        print(f"AI error: {e}")
    except NetworkError as e:
        print(f"Network error: {e}")
    except CbintelError as e:
        print(f"General error: {e}")
    return None
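
Because network failures are often transient, a NetworkError is a natural candidate for retrying while other errors propagate immediately. The sketch below uses local stand-in exception classes so that it runs without cbintel installed; it assumes NetworkError derives from CbintelError, and with_retries is a hypothetical helper.

```python
import asyncio


class CbintelError(Exception):
    """Local stand-in for cbintel.exceptions.CbintelError."""


class NetworkError(CbintelError):
    """Local stand-in for cbintel.exceptions.NetworkError."""


async def with_retries(call, attempts: int = 3, base_delay: float = 0.01):
    """Retry an async callable on NetworkError, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except NetworkError:
            if attempt == attempts:
                raise  # out of attempts: let the caller handle it
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


attempts_seen: list[int] = []


async def flaky() -> str:
    """Fails twice with NetworkError, then succeeds."""
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise NetworkError("connection reset")
    return "ok"


result = asyncio.run(with_retries(flaky))
```

Only NetworkError is retried here; an AIError or StorageError would surface on the first attempt.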

Configuration

Primitives are configured via environment variables:

# AI/LLM
ANTHROPIC_API_KEY=sk-ant-...
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_EMBED_MODEL=nomic-embed-text

# HTTP
HTTP_TIMEOUT=30.0
HTTP_MAX_RETRIES=3

# Browser
FERRET_SWRM_URL=ws://localhost:8000/ws
FERRET_TIMEOUT=30.0

# Storage
VECTL_INDEX_PATH=./data/vectors
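
As an illustration of reading these variables, the sketch below loads the two HTTP settings into a typed object. HTTPSettings and load_http_settings are hypothetical names, and the fallback values simply repeat the examples above, which may not be the library's actual defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HTTPSettings:
    timeout: float
    max_retries: int


def load_http_settings(env: Optional[Mapping[str, str]] = None) -> HTTPSettings:
    """Read HTTP_TIMEOUT and HTTP_MAX_RETRIES, falling back to the example values."""
    source = os.environ if env is None else env
    return HTTPSettings(
        timeout=float(source.get("HTTP_TIMEOUT", "30.0")),
        max_retries=int(source.get("HTTP_MAX_RETRIES", "3")),
    )
```

Accepting an explicit mapping makes the loader easy to test without touching the process environment.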