Primitives
Primitives are the fundamental building blocks of cbintel. They provide core operations for AI interaction, data retrieval, content transformation, and storage.
Overview
graph TB
    subgraph "AI Layer"
        ANTHROPIC[AnthropicClient]
        OLLAMA[OllamaClient]
        CBAI[CBAIClient<br/>Unified]
    end
    subgraph "Network Layer"
        HTTP[HTTPClient]
        SEARCH[SearchClient]
        URL[URLCleaner]
    end
    subgraph "Browser Layer"
        FERRET[Ferret/SWRM]
        PLAYWRIGHT[Playwright]
    end
    subgraph "Archive Layer"
        CDX[CDXClient]
        GAU[URL Discovery]
    end
    subgraph "Storage Layer"
        VECTL[VectorStore]
        EMBED[EmbeddingService]
        ENTITY[EntityStore]
    end
    subgraph "Transform Layer"
        HTML[HTMLProcessor]
        CHUNK[ChunkingService]
        MARKDOWN[MarkdownConverter]
    end
Quick Reference
| Category | Module | Purpose |
|---|---|---|
| Directory Structure | cbintel.* | Package layout |
| AI Clients | cbintel.ai | LLM interaction |
| HTTP/Fetch | cbintel.net | Web requests |
| Browser Automation | cbintel.ferret | SWRM browser control |
| Archive Retrieval | cbintel.lazarus | Historical web data |
| Vector Storage | cbintel.vectl | Embeddings and search |
Primitive Categories
Discovery Primitives
Operations for finding content:
| Primitive | Input | Output | Description |
|---|---|---|---|
| search | query | url[] | Multi-engine web search |
| youtube_search | query | url[] | YouTube video discovery |
| archive_discover | domain | url[] | Historical URL enumeration |
| extract_links | html | url[] | Extract URLs from HTML |
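To make the `html` to `url[]` contract of a primitive like extract_links concrete, here is a minimal standard-library sketch. It is not cbintel's implementation: the names `LinkCollector` and `extract_links_sketch` are hypothetical, and the choice to resolve relative links and drop duplicates is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_links_sketch(html: str, base_url: str) -> list[str]:
    """Return the links found in `html`, deduplicated in first-seen order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return list(dict.fromkeys(parser.urls))
```

The output is a plain list of URL strings, so it can feed any primitive that accepts `url[]`.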
Acquisition Primitives
Operations for retrieving content:
| Primitive | Input | Output | Description |
|---|---|---|---|
| fetch | url | html | HTTP content retrieval |
| fetch_archive | url, date | html | Historical content |
| tor_fetch | url | html | Fetch through Tor |
| screenshot | url | image | Browser screenshot |
| fetch_transcript | video_id | text | YouTube transcript |
Transform Primitives
Operations for processing content:
| Primitive | Input | Output | Description |
|---|---|---|---|
| to_markdown | html | markdown | HTML to Markdown |
| to_text | html | text | HTML to plain text |
| chunk | text | chunk[] | Text segmentation |
| embed | text | vector | Text embedding |
| ocr | image | text | Image to text |
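As an illustration of what "text segmentation" can mean for the chunk primitive, the sketch below splits text into fixed-size character windows with overlap. The window strategy, the function name `chunk_sketch`, and the `size` and `overlap` parameters are assumptions for illustration; the source does not say how ChunkingService segments text.

```python
def chunk_sketch(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help later similarity search.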
Extract Primitives
Operations for extracting structured data:
| Primitive | Input | Output | Description |
|---|---|---|---|
| entities | text | entity[] | Named entity extraction |
| topics | text | topic[] | Topic extraction |
| summarize | text | text | AI summarization |
| sentiment | text | result | Sentiment analysis |
Filter Primitives
Operations for filtering content:
| Primitive | Input | Output | Description |
|---|---|---|---|
| semantic_filter | chunk[], query | chunk[] | Semantic similarity filter |
| quality_filter | chunk[] | chunk[] | Quality score filter |
| filter_urls | url[], pattern | url[] | URL pattern filter |
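A semantic similarity filter is commonly built on cosine similarity between embedding vectors. The sketch below assumes each chunk already carries its vector and that the query has been embedded separately. The names `cosine` and `semantic_filter_sketch` and the `threshold` parameter are hypothetical, not taken from cbintel.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_filter_sketch(
    chunks: list[tuple[str, list[float]]],
    query_vector: list[float],
    threshold: float = 0.5,
) -> list[str]:
    """Keep the chunk texts whose vector is at least `threshold` similar to the query."""
    return [text for text, vec in chunks if cosine(vec, query_vector) >= threshold]
```

The input and output are both chunk lists, which is why a filter of this kind can be inserted between any two stages that exchange `chunk[]`.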
Store Primitives
Operations for persistent storage:
| Primitive | Input | Output | Description |
|---|---|---|---|
| store_vector | vector, metadata | ref | Store embedding |
| store_entity | entity | ref | Store entity |
| search_vectors | query | chunk[] | Vector similarity search |
Synthesize Primitives
Operations for generating output:
| Primitive | Input | Output | Description |
|---|---|---|---|
| integrate | chunk[], query | text | Synthesize into summary |
| chat | context, messages | text | AI conversation |
| to_report | content, template | markdown | Report generation |
Data Flow Compatibility
| Input ↓ / Output → | url[] | html | text | chunk[] | vector | entity | bytes |
|---|---|---|---|---|---|---|---|
| url[] | - | ✓ fetch, tor_fetch, fetch_archive | - | - | - | - | ✓ screenshot |
| html | ✓ extract_links | - | ✓ to_text, to_markdown | - | - | - | - |
| text | - | - | - | ✓ chunk | ✓ embed | ✓ entities | - |
| chunk[] | - | - | ✓ integrate | ✓ semantic_filter | ✓ embed_batch | - | - |
| bytes | - | - | ✓ ocr | - | - | - | - |
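One way to read the compatibility matrix is as a set of type signatures: two primitives can be chained when the output type of the first equals the input type of the second. The sketch below encodes a few entries from the matrix and checks a chain. `SIGNATURES` and `check_chain` are hypothetical names, and nothing here reflects how cbintel itself validates pipelines.

```python
# (input type, output type) for a few primitives, copied from the matrix.
SIGNATURES = {
    "fetch": ("url[]", "html"),
    "extract_links": ("html", "url[]"),
    "to_text": ("html", "text"),
    "chunk": ("text", "chunk[]"),
    "embed": ("text", "vector"),
    "integrate": ("chunk[]", "text"),
    "ocr": ("bytes", "text"),
}


def check_chain(ops: list[str]) -> list[str]:
    """Return mismatch descriptions; an empty list means the chain is compatible."""
    problems = []
    for prev, nxt in zip(ops, ops[1:]):
        produced = SIGNATURES[prev][1]
        expected = SIGNATURES[nxt][0]
        if produced != expected:
            problems.append(f"{prev} -> {nxt}: produces {produced}, expects {expected}")
    return problems
```

For example, `check_chain(["fetch", "chunk"])` reports a mismatch because chunk expects `text`, so a to_text step is needed in between.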
Usage Patterns
Direct Python Usage
from cbintel.ai import CBAIClient
from cbintel.net import HTTPClient
from cbintel.vectl import EmbeddingService, VectorStore

# The snippets below use await, so run them inside an async function.
# `text` and `chunks` are assumed to be defined by the caller.

# AI completion
async with CBAIClient() as ai:
    response = await ai.complete("Summarize this text", context=text)

# HTTP fetch
async with HTTPClient() as http:
    response = await http.get("https://example.com")

# Vector storage: embed all chunks, then store each vector under its own key
embeddings = EmbeddingService()
vectors = await embeddings.embed_batch(chunks)
store = VectorStore("./my-index")
for i, vector in enumerate(vectors):
    await store.add(f"chunk_{i}", vector)
In Graph Operations
stages:
  - name: discover
    sequential:
      - op: search
        params:
          query: "AI regulation"
        output: urls
  - name: acquire
    parallel:
      - op: fetch_batch
        input: urls
        output: pages
  - name: transform
    sequential:
      - op: to_text_batch
        input: pages
        output: texts
      - op: chunk
        input: texts
        output: chunks
  - name: process
    parallel:
      - op: embed_batch
        input: chunks
        output: vectors
      - op: entities
        input: texts
        output: entities
  - name: synthesize
    sequential:
      - op: integrate
        input: chunks
        params:
          query: "What are the key points?"
        output: summary
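In the graph above, each `input` names a value that an earlier `output` produced. The sketch below checks that wiring on a stage list that has already been loaded into Python dictionaries. The function name `undefined_inputs` is hypothetical. The rule that operations in a `parallel` stage cannot see each other's outputs is an assumption, not documented cbintel behavior.

```python
def undefined_inputs(stages: list[dict]) -> list[str]:
    """List every op whose `input` was not produced by an earlier op."""
    available: set[str] = set()
    missing = []
    for stage in stages:
        mode = "parallel" if "parallel" in stage else "sequential"
        produced: set[str] = set()
        for op in stage.get(mode, []):
            # Sequential ops may read outputs from earlier ops in the same stage;
            # parallel ops (by assumption) only see outputs of previous stages.
            visible = (available | produced) if mode == "sequential" else available
            name = op.get("input")
            if name is not None and name not in visible:
                missing.append(f"{stage['name']}.{op['op']}: {name}")
            if "output" in op:
                produced.add(op["output"])
        available |= produced
    return missing
```

An empty result means every input can be traced to a producer. Each entry otherwise names the stage, the operation, and the missing value.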
Async Patterns
All primitives use async/await:
import asyncio
from cbintel.vectl import SemanticSearch
from cbintel.screenshots import ScreenshotService

async def main():
    search = SemanticSearch("./index")
    async with ScreenshotService() as screenshots:
        # Parallel operations
        results = await asyncio.gather(
            search.search("query"),
            screenshots.screenshot("https://example.com"),
        )
        search_results, capture = results

asyncio.run(main())
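When many operations run in parallel, it often helps to cap concurrency and to collect failures instead of stopping at the first one. The sketch below shows that pattern with only `asyncio`. `fake_fetch` is a stand-in for a real primitive call, and `gather_bounded` and its `limit` parameter are hypothetical names.

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Stand-in for a primitive call; fails for URLs containing 'bad'."""
    await asyncio.sleep(0)
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"<html>{url}</html>"


async def gather_bounded(urls: list[str], limit: int = 2) -> list:
    """Run fake_fetch over `urls` with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url: str):
        async with semaphore:
            return await fake_fetch(url)

    # return_exceptions=True puts each failure into the result list
    # instead of raising the first one to the caller.
    return await asyncio.gather(*(worker(u) for u in urls), return_exceptions=True)
```

Results come back in the same order as `urls`, so each entry can be checked with `isinstance(result, Exception)` to separate successes from failures.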
Error Handling
from cbintel.exceptions import (
    CbintelError,        # Base exception
    NetworkError,        # HTTP/connection errors
    AIError,             # LLM API errors
    StorageError,        # File/database errors
    ConfigurationError,  # Missing config/env vars
)

# `client` and `prompt` are assumed to be defined by the caller.
# Catch the specific errors first and the base CbintelError last.
try:
    result = await client.complete(prompt)
except AIError as e:
    print(f"AI error: {e}")
except NetworkError as e:
    print(f"Network error: {e}")
except CbintelError as e:
    print(f"General error: {e}")
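A common refinement is to retry only transient failures. The sketch below retries on `NetworkError` and lets every other error propagate at once. The exception classes are local stand-ins that reuse the names imported above so the example runs on its own; that they all derive from `CbintelError` is an assumption based on the "Base exception" comment. `with_retry` and its parameters are hypothetical.

```python
import asyncio


class CbintelError(Exception):
    """Stand-in base class (assumed hierarchy, not imported from cbintel)."""


class NetworkError(CbintelError):
    """Stand-in for transient HTTP/connection failures."""


class AIError(CbintelError):
    """Stand-in for LLM API failures."""


async def with_retry(call, attempts: int = 3, delay: float = 0.0):
    """Await `call()` up to `attempts` times, retrying on NetworkError only."""
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except NetworkError:
            if attempt == attempts:
                raise  # out of attempts: surface the last network error
            await asyncio.sleep(delay * attempt)  # linear backoff
```

Errors such as `AIError` are not caught here, so they reach the caller's own `except` blocks unchanged.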
Configuration
Primitives are configured via environment variables: