Primitives¶

Primitives are the fundamental building blocks of cbintel. They provide core operations for AI interaction, data retrieval, content transformation, and storage.

Overview¶

graph TB
    subgraph "AI Layer"
        ANTHROPIC[AnthropicClient]
        OLLAMA[OllamaClient]
        CBAI[CBAIClient<br/>Unified]
    end

    subgraph "Network Layer"
        HTTP[HTTPClient]
        SEARCH[SearchClient]
        URL[URLCleaner]
    end

    subgraph "Browser Layer"
        FERRET[Ferret/SWRM]
        PLAYWRIGHT[Playwright]
    end

    subgraph "Archive Layer"
        CDX[CDXClient]
        GAU[URL Discovery]
    end

    subgraph "Storage Layer"
        VECTL[VectorStore]
        EMBED[EmbeddingService]
        ENTITY[EntityStore]
    end

    subgraph "Transform Layer"
        HTML[HTMLProcessor]
        CHUNK[ChunkingService]
        MARKDOWN[MarkdownConverter]
    end

Quick Reference¶

Category	Module	Purpose
Directory Structure	`cbintel.*`	Package layout
AI Clients	`cbintel.ai`	LLM interaction
HTTP/Fetch	`cbintel.net`	Web requests
Browser Automation	`cbintel.ferret`	SWRM browser control
Archive Retrieval	`cbintel.lazarus`	Historical web data
Vector Storage	`cbintel.vectl`	Embeddings and search

Primitive Categories¶

Discovery Primitives¶

Operations for finding content:

Primitive	Input	Output	Description
`search`	query	url[]	Multi-engine web search
`youtube_search`	query	url[]	YouTube video discovery
`archive_discover`	domain	url[]	Historical URL enumeration
`extract_links`	html	url[]	Extract URLs from HTML
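
As an illustration of the `html` → `url[]` shape of `extract_links`, here is a minimal standard-library sketch. It is not cbintel's implementation; the function name `extract_links_sketch` and the `base_url` parameter are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags as the parser walks the document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links_sketch(html: str, base_url: str) -> list[str]:
    """Hypothetical helper: return absolute, de-duplicated URLs in document order."""
    parser = _LinkCollector()
    parser.feed(html)
    seen: set[str] = set()
    urls: list[str] = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls
```

Resolving against a base URL keeps the output usable as direct input to an acquisition primitive such as `fetch`.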

Acquisition Primitives¶

Operations for retrieving content:

Primitive	Input	Output	Description
`fetch`	url	html	HTTP content retrieval
`fetch_archive`	url, date	html	Historical content
`tor_fetch`	url	html	Fetch through Tor
`screenshot`	url	image	Browser screenshot
`fetch_transcript`	video_id	text	YouTube transcript
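
To make the `url, date` input of `fetch_archive` concrete, the sketch below builds a snapshot URL for the Internet Archive's Wayback Machine. Whether cbintel resolves archived pages this way is an assumption (the `CDXClient` name in the overview only hints at it), and `wayback_url` is a hypothetical helper, not part of the library.

```python
from datetime import date

# Public Wayback Machine snapshot prefix; a partial timestamp resolves
# to the capture closest to that date.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, when: date) -> str:
    """Hypothetical helper: build a snapshot URL for `url` near `when`."""
    timestamp = when.strftime("%Y%m%d")
    return f"{WAYBACK_PREFIX}/{timestamp}/{url}"
```

The resulting URL can then be retrieved like any other page, which is why the output type matches `fetch`.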

Transform Primitives¶

Operations for processing content:

Primitive	Input	Output	Description
`to_markdown`	html	markdown	HTML to Markdown
`to_text`	html	text	HTML to plain text
`chunk`	text	chunk[]	Text segmentation
`embed`	text	vector	Text embedding
`ocr`	image	text	Image to text
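
The `chunk` primitive turns one text into a list of segments. A common approach is a fixed-size sliding window with overlap, sketched below. The strategy, the function name `chunk_text`, and its `size` and `overlap` parameters are assumptions for illustration; the source does not say how cbintel segments text.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of `size`, each sharing `overlap`
    characters with the previous window."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text to embed.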

Extract Primitives¶

Operations for extracting structured data:

Primitive	Input	Output	Description
`entities`	text	entity[]	Named entity extraction
`topics`	text	topic[]	Topic extraction
`summarize`	text	text	AI summarization
`sentiment`	text	result	Sentiment analysis

Filter Primitives¶

Operations for filtering content:

Primitive	Input	Output	Description
`semantic_filter`	chunk[], query	chunk[]	Semantic similarity filter
`quality_filter`	chunk[]	chunk[]	Quality score filter
`filter_urls`	url[], pattern	url[]	URL pattern filter
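
A semantic similarity filter is usually built on cosine similarity between a query vector and each chunk vector. The sketch below shows that idea with plain lists; the `(text, vector)` pair layout, the `threshold` parameter, and the function names are assumptions, not cbintel's actual interface.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_filter_sketch(
    chunks: list[tuple[str, list[float]]],
    query_vector: list[float],
    threshold: float = 0.5,
) -> list[str]:
    """Keep chunk texts whose similarity to the query meets the threshold,
    most similar first."""
    scored = [(cosine_similarity(vec, query_vector), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept]
```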

Store Primitives¶

Operations for persistent storage:

Primitive	Input	Output	Description
`store_vector`	vector, metadata	ref	Store embedding
`store_entity`	entity	ref	Store entity
`search_vectors`	query	chunk[]	Vector similarity search

Synthesize Primitives¶

Operations for generating output:

Primitive	Input	Output	Description
`integrate`	chunk[], query	text	Synthesize into summary
`chat`	context, messages	text	AI conversation
`to_report`	content, template	markdown	Report generation

Data Flow Compatibility¶

            OUTPUT →
INPUT ↓     url[]          html           text         chunk[]          vector       entity    bytes
            ────────────────────────────────────────────────────────────────────────────────────────
url[]       -              ✓              -            -                -            -         ✓
                           fetch                                                               screenshot
                           tor_fetch
                           fetch_archive

html        ✓              -              ✓            -                -            -         -
            extract_links                 to_text
                                          to_markdown

text        -              -              -            ✓                ✓            ✓         -
                                                       chunk            embed        entities

chunk[]     -              -              ✓            ✓                ✓            -         -
                                          integrate    semantic_filter  embed_batch

bytes       -              -              ✓            -                -            -         -
                                          ocr
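
The compatibility matrix can also be read as a rule: one primitive may follow another only if the first one's output type matches the second one's input type. The sketch below encodes a few signatures from the tables above and checks a chain. `SIGNATURES` and `check_pipeline` are hypothetical names, and the strict string comparison is a simplification (it does not treat `url` and `url[]` as related, for example).

```python
# (input type, output type) pairs copied from the primitive tables above.
SIGNATURES = {
    "fetch": ("url", "html"),
    "extract_links": ("html", "url[]"),
    "to_text": ("html", "text"),
    "to_markdown": ("html", "markdown"),
    "chunk": ("text", "chunk[]"),
    "embed": ("text", "vector"),
    "entities": ("text", "entity[]"),
    "ocr": ("image", "text"),
}


def check_pipeline(ops: list[str]) -> list[str]:
    """Return one message per incompatible adjacent pair; empty means compatible."""
    problems: list[str] = []
    for prev, nxt in zip(ops, ops[1:]):
        out_type = SIGNATURES[prev][1]
        in_type = SIGNATURES[nxt][0]
        if out_type != in_type:
            problems.append(f"{prev} outputs {out_type}, but {nxt} expects {in_type}")
    return problems
```

For example, `check_pipeline(["fetch", "chunk"])` reports a mismatch because `chunk` needs text, which suggests inserting `to_text` between the two steps.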

Usage Patterns¶

Direct Python Usage¶

import asyncio

from cbintel.ai import CBAIClient
from cbintel.net import HTTPClient
from cbintel.vectl import EmbeddingService, VectorStore


async def main():
    text = "..."      # placeholder: the text to summarize
    chunks = ["..."]  # placeholder: the text chunks to embed

    # AI completion
    async with CBAIClient() as ai:
        response = await ai.complete("Summarize this text", context=text)

    # HTTP fetch
    async with HTTPClient() as http:
        page = await http.get("https://example.com")

    # Vector storage: embed every chunk, then store each vector under its own key
    embeddings = EmbeddingService()
    vectors = await embeddings.embed_batch(chunks)

    store = VectorStore("./my-index")
    for i, vector in enumerate(vectors):
        await store.add(f"chunk_{i}", vector)


asyncio.run(main())

In Graph Operations¶

stages:
  - name: discover
    sequential:
      - op: search
        params:
          query: "AI regulation"
        output: urls

  - name: acquire
    parallel:
      - op: fetch_batch
        input: urls
        output: pages

  - name: transform
    sequential:
      - op: to_text_batch
        input: pages
        output: texts
      - op: chunk
        input: texts
        output: chunks

  - name: process
    parallel:
      - op: embed_batch
        input: chunks
        output: vectors
      - op: entities
        input: texts
        output: entities

  - name: synthesize
    sequential:
      - op: integrate
        input: chunks
        params:
          query: "What are the key points?"
        output: summary
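
In the graph above, each op's `input` refers to a name that an earlier op declared as `output`. The sketch below checks that wiring on the parsed structure (plain dicts, as a YAML loader would produce). `validate_wiring` is a hypothetical helper, and the scoping rule is an assumption inferred from the example: ops in a `sequential` stage may read outputs from earlier ops in the same stage, while ops in a `parallel` stage may only read outputs of previous stages.

```python
def validate_wiring(stages: list[dict]) -> list[str]:
    """Return one message per op whose input was never produced upstream."""
    available: set[str] = set()  # outputs of all completed stages
    errors: list[str] = []

    for stage in stages:
        is_sequential = "sequential" in stage
        ops = stage.get("sequential") or stage.get("parallel") or []
        produced_here: set[str] = set()

        for op in ops:
            # Sequential ops also see outputs produced earlier in this stage.
            visible = (available | produced_here) if is_sequential else available
            source = op.get("input")
            if source is not None and source not in visible:
                errors.append(
                    f"{stage['name']}: op '{op['op']}' reads undefined input '{source}'"
                )
            if "output" in op:
                produced_here.add(op["output"])

        available |= produced_here

    return errors
```

Running a check like this before execution surfaces a misspelled variable name immediately rather than after the earlier stages have already run.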

Async Patterns¶

All primitives use async/await, so independent operations can run concurrently with `asyncio.gather`:

import asyncio
from cbintel.vectl import SemanticSearch
from cbintel.screenshots import ScreenshotService

async def main():
    search = SemanticSearch("./index")

    async with ScreenshotService() as screenshots:
        # Run both operations concurrently
        results = await asyncio.gather(
            search.search("query"),
            screenshots.screenshot("https://example.com"),
        )

    search_results, capture = results

asyncio.run(main())
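
When many operations are gathered at once (for example, fetching hundreds of URLs), it is often useful to cap how many run at the same time. The sketch below does this with `asyncio.Semaphore`. `gather_bounded` is a hypothetical helper and `fake_fetch` is a stand-in for a real primitive call, so the example runs without cbintel or network access.

```python
import asyncio


async def gather_bounded(coro_factories, limit: int = 5):
    """Await every factory's coroutine, at most `limit` at a time.
    Results come back in the same order as the factories."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:  # wait for a free slot
            return await factory()

    return await asyncio.gather(*(run(f) for f in coro_factories))


async def fake_fetch(url: str) -> str:
    """Stand-in for a real fetch; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def demo() -> list[str]:
    urls = [f"https://example.com/{i}" for i in range(10)]
    # Bind each URL at definition time so every lambda keeps its own value.
    factories = [lambda u=u: fake_fetch(u) for u in urls]
    return await gather_bounded(factories, limit=3)


if __name__ == "__main__":
    print(len(asyncio.run(demo())))
```

Passing factories rather than coroutine objects means a coroutine is only created once a slot is available.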

Error Handling¶

from cbintel.exceptions import (
    CbintelError,        # Base exception
    NetworkError,        # HTTP/connection errors
    AIError,             # LLM API errors
    StorageError,        # File/database errors
    ConfigurationError,  # Missing config/env vars
)

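# Run inside an async function; `client` is an AI client such as CBAIClient
# and `prompt` is the prompt string.
# List the specific exceptions first: CbintelError is the base class, so its
# handler must come last or it would catch the others.
try:
    result = await client.complete(prompt)
except AIError as e:
    print(f"AI error: {e}")
except NetworkError as e:
    print(f"Network error: {e}")
except CbintelError as e:
    print(f"General error: {e}")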
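
Network failures are often transient, so a caller may prefer to retry before reporting an error. The sketch below shows retry with exponential backoff. The exception classes are local stand-ins that mirror the names above so the example runs without cbintel. `with_retries`, its parameters, and the choice to retry only on `NetworkError` are assumptions.

```python
import asyncio


# Local stand-ins mirroring the names in cbintel.exceptions (assumed hierarchy).
class CbintelError(Exception):
    pass


class NetworkError(CbintelError):
    pass


async def with_retries(operation, max_retries: int = 3, base_delay: float = 0.01):
    """Call `operation()` and retry on NetworkError, doubling the delay each time.
    The last failure is re-raised once the retries are used up."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except NetworkError:
            if attempt == max_retries:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Other `CbintelError` subclasses propagate immediately, since retrying a configuration or storage problem is unlikely to help.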

Configuration¶

Primitives are configured via environment variables:

# AI/LLM
ANTHROPIC_API_KEY=sk-ant-...
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_EMBED_MODEL=nomic-embed-text

# HTTP
HTTP_TIMEOUT=30.0
HTTP_MAX_RETRIES=3

# Browser
FERRET_SWRM_URL=ws://localhost:8000/ws
FERRET_TIMEOUT=30.0

# Storage
VECTL_INDEX_PATH=./data/vectors
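
One way to consume these variables in application code is to read them once into a typed settings object. The sketch below is not part of cbintel: `PrimitiveSettings` and `load_settings` are hypothetical, and using the example values above as fallback defaults is an assumption, since the source does not state the library's real defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PrimitiveSettings:
    anthropic_api_key: Optional[str]
    ollama_base_url: str
    http_timeout: float
    http_max_retries: int
    vectl_index_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> PrimitiveSettings:
    """Read settings from `env`, converting numeric values to their types.
    Fallback values are assumed defaults taken from the examples above."""
    return PrimitiveSettings(
        anthropic_api_key=env.get("ANTHROPIC_API_KEY"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434"),
        http_timeout=float(env.get("HTTP_TIMEOUT", "30.0")),
        http_max_retries=int(env.get("HTTP_MAX_RETRIES", "3")),
        vectl_index_path=env.get("VECTL_INDEX_PATH", "./data/vectors"),
    )
```

Accepting a mapping instead of reading `os.environ` directly makes the loader easy to exercise in tests with a plain dict.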