AI Clients

cbintel provides three AI client wrappers for interacting with language models: AnthropicClient for the Claude API, OllamaClient for local models served by Ollama, and CBAIClient as a unified interface over both.

Module Structure

src/cbintel/ai/
├── __init__.py          # Public exports
├── anthropic_client.py  # Claude API wrapper
├── ollama_client.py     # Local LLM wrapper
├── cbai_client.py       # Unified client
├── embeddings.py        # Embedding generation
├── diff.py              # Content diff utilities
└── sentiment.py         # Sentiment analysis
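
The examples on this page import these names from the package root; the one exception is EmbeddingService, which lives in cbintel.vectl (see Embeddings below):

from cbintel.ai import (
    AIError,
    AnthropicClient,
    CBAIClient,
    OllamaClient,
    RateLimitError,
    diff,
    diff_summary,
    has_significant_changes,
    sentiment,
)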

CBAIClient (Unified)

CBAIClient is the recommended client for most use cases. It automatically routes each request to the appropriate backend (Anthropic or Ollama) based on the requested model.

Basic Usage

from cbintel.ai import CBAIClient

async with CBAIClient() as ai:
    # Simple completion
    response = await ai.complete("What is machine learning?")
    print(response)

    # With context
    response = await ai.complete(
        "Summarize this document",
        context=document_text
    )

    # Specify model
    response = await ai.complete(
        "Analyze this code",
        model="claude-3-5-sonnet",  # or "llama3.2" for Ollama
    )

Model Selection

# Claude models (via Anthropic)
await ai.complete(prompt, model="claude-3-5-sonnet")
await ai.complete(prompt, model="claude-3-opus")
await ai.complete(prompt, model="claude-3-haiku")

# Local models (via Ollama)
await ai.complete(prompt, model="llama3.2")
await ai.complete(prompt, model="mistral")
await ai.complete(prompt, model="codellama")
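
How CBAIClient maps a model name to a backend is not documented here; a minimal sketch of one plausible rule, assuming routing on the model-name prefix (pick_backend is a hypothetical helper, not cbintel's actual implementation):

def pick_backend(model: str) -> str:
    # Assumed convention: Claude model IDs start with "claude".
    return "anthropic" if model.startswith("claude") else "ollama"

assert pick_backend("claude-3-5-sonnet") == "anthropic"
assert pick_backend("llama3.2") == "ollama"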

Configuration

from cbintel.ai import CBAIClient

client = CBAIClient(
    default_model="claude-3-5-sonnet",
    temperature=0.7,
    max_tokens=4096,
)

AnthropicClient

Direct wrapper for the Anthropic Claude API.

Basic Usage

from cbintel.ai import AnthropicClient

client = AnthropicClient(model="claude-3-5-sonnet-20241022")

# Simple completion
response = await client.complete("Explain quantum computing")

# With system prompt
response = await client.complete(
    "Analyze this data",
    system="You are a data analyst. Be concise and precise.",
    context=data
)

# With conversation history
response = await client.chat([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Explain AI"},
])
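
The chat example implies that chat() accepts the full message history and returns the assistant's reply as text. Under that assumption, a minimal multi-turn loop looks like this:

history = []
for user_input in ["Hello", "Explain AI in one sentence"]:
    history.append({"role": "user", "content": user_input})
    reply = await client.chat(history)  # assumed to return the reply text
    history.append({"role": "assistant", "content": reply})
    print(reply)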

Streaming

async for chunk in client.stream("Write a story about robots"):
    print(chunk, end="", flush=True)
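
Assuming each chunk is a plain text fragment, as the print loop suggests, the full response can be reassembled by collecting the chunks:

chunks = []
async for chunk in client.stream("Write a story about robots"):
    chunks.append(chunk)  # each chunk is assumed to be a str fragment
story = "".join(chunks)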

Configuration

client = AnthropicClient(
    model="claude-3-5-sonnet-20241022",
    api_key="sk-ant-...",  # Or set ANTHROPIC_API_KEY
    max_tokens=4096,
    temperature=0.7,
)

Available Models

Model             | ID                         | Best For
Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | General use
Claude 3 Opus     | claude-3-opus-20240229     | Complex reasoning
Claude 3 Haiku    | claude-3-haiku-20240307    | Fast, simple tasks

OllamaClient

Wrapper for local LLM inference via Ollama.

Basic Usage

from cbintel.ai import OllamaClient

client = OllamaClient(
    model="llama3.2",
    base_url="http://127.0.0.1:11434"
)

# Simple completion
response = await client.complete("What is Python?")

# With context
response = await client.complete(
    "Summarize this",
    context=text
)

Streaming

async for chunk in client.stream("Write a haiku"):
    print(chunk, end="", flush=True)

Configuration

client = OllamaClient(
    model="llama3.2",
    base_url="http://127.0.0.1:11434",  # Or set OLLAMA_BASE_URL
    timeout=120.0,
)

Available Models

Common models for Ollama:

Model            | Size       | Best For
llama3.2         | 3B/8B      | General use
mistral          | 7B         | Fast, good quality
codellama        | 7B/13B     | Code tasks
nomic-embed-text | 137M       | Embeddings
translategemma   | 4B/12B/27B | Translation

Embeddings

Generate text embeddings for semantic search.

Using EmbeddingService

from cbintel.vectl import EmbeddingService

service = EmbeddingService(
    model="nomic-embed-text",
    base_url="http://127.0.0.1:11434"
)

# Single embedding
vector = await service.embed("Hello, world!")
print(f"Dimensions: {len(vector)}")  # 768

# Batch embeddings
vectors = await service.embed_batch([
    "First document",
    "Second document",
    "Third document",
])
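
The vectors behave as plain float sequences (len() works on them above), so a simple semantic search can rank the batch by cosine similarity against a query. A self-contained sketch:

import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length float vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = await service.embed("second document")
scores = [cosine_similarity(query, v) for v in vectors]
best = scores.index(max(scores))  # index of the most similar document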

Common Embedding Models

Model             | Dimensions | Speed
nomic-embed-text  | 768        | Fast
all-MiniLM-L6-v2  | 384        | Very fast
mxbai-embed-large | 1024       | Slower, higher quality

Sentiment Analysis

from cbintel.ai import sentiment

result = await sentiment("This product is amazing!")
print(f"Sentiment: {result.label}")  # positive
print(f"Score: {result.score}")       # 0.95

Content Diff

Compare content versions with AI analysis:

from cbintel.ai import diff, diff_summary, has_significant_changes

# Get structured diff
result = await diff(old_text, new_text)
print(f"Additions: {result.additions}")
print(f"Deletions: {result.deletions}")

# Get human-readable summary
summary = await diff_summary(old_text, new_text)
print(summary)

# Check for significant changes
if await has_significant_changes(old_text, new_text):
    print("Content changed significantly")

Graph Operations

The AI clients power several graph operations, which are configured declaratively in pipeline YAML:

stages:
  - name: analyze
    sequential:
      - op: summarize
        input: text
        params:
          max_length: 500
          model: claude-3-5-sonnet
        output: summary

      - op: entities
        input: text
        params:
          types: [person, organization, location]
        output: entities

      - op: sentiment
        input: text
        output: sentiment_result

Error Handling

from cbintel.ai import CBAIClient, AIError, RateLimitError

async with CBAIClient() as ai:
    try:
        response = await ai.complete(prompt)
    except RateLimitError:
        print("Rate limited - wait and retry")
    except AIError as e:
        print(f"AI error: {e}")

Configuration

Environment Variables

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Ollama
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_EMBED_MODEL=nomic-embed-text

# CBAI (unified)
CBAI_DEFAULT_MODEL=claude-3-5-sonnet
CBAI_TEMPERATURE=0.7
CBAI_MAX_TOKENS=4096
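
The inline comments above ("Or set ANTHROPIC_API_KEY", "Or set OLLAMA_BASE_URL") suggest that explicit constructor arguments take precedence over the environment. Assuming the CBAI_* variables behave the same way:

import os

from cbintel.ai import CBAIClient

os.environ.setdefault("CBAI_DEFAULT_MODEL", "claude-3-5-sonnet")

client = CBAIClient()                 # assumed to fall back to CBAI_* defaults
client = CBAIClient(temperature=0.2)  # explicit argument overrides the env value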

Best Practices

Model Selection

  1. Claude 3.5 Sonnet - Default for most tasks
  2. Claude 3 Opus - Complex reasoning, long context
  3. Claude 3 Haiku - Fast, simple tasks, cost-effective
  4. Ollama models - Local processing, privacy, cost-free

Context Management

# Keep context concise
response = await ai.complete(
    "Summarize the key points",
    context=text[:10000]  # Limit context size
)

# Use system prompts for consistent behavior
response = await ai.complete(
    prompt,
    system="You are a research analyst. Be thorough and cite sources."
)
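
When a document won't fit even a generous truncation, a common workaround (not a documented cbintel feature) is map-reduce summarization: summarize fixed-size chunks, then summarize the partial summaries. A sketch using only the complete() API shown above:

async def summarize_long(ai, text: str, chunk_size: int = 10000) -> str:
    # Summarize each chunk independently, then merge the partial summaries.
    partials = []
    for i in range(0, len(text), chunk_size):
        partials.append(await ai.complete(
            "Summarize the key points",
            context=text[i:i + chunk_size],
        ))
    return await ai.complete(
        "Combine these partial summaries into a single summary",
        context="\n\n".join(partials),
    )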

Error Handling

import asyncio

from cbintel.ai import RateLimitError

async def complete_with_retry(ai, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await ai.complete(prompt)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the rate-limit error
            await asyncio.sleep(2 ** attempt)  # exponential backoff