Intent Classification

The Intent Classifier analyzes natural language queries to determine research intent and extract parameters.

Overview

flowchart LR
    QUERY[User Query] --> CLASSIFY[IntentClassifier]
    CLASSIFY --> INTENT[Intent Type]
    CLASSIFY --> PARAMS[Parameters]
    CLASSIFY --> CONF[Confidence]

    INTENT --> TEMPLATE[Graph Template]
    PARAMS --> TEMPLATE

IntentClassifier

import asyncio

from cbintel.chat import IntentClassifier


async def main() -> None:
    classifier = IntentClassifier()

    # classify() is awaited, so it has to run inside an async function
    result = await classifier.classify(
        "Research John Smith's voting record on healthcare"
    )

    print(f"Intent: {result.intent}")
    print(f"Confidence: {result.confidence}")
    print(f"Entities: {result.entities}")
    print(f"Params: {result.params}")


asyncio.run(main())

Classification Result

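from dataclasses import dataclass


@dataclass
class IntentResult:
    intent: str           # Intent type, e.g. "person_research"
    confidence: float     # 0.0 to 1.0
    entities: dict        # Extracted entities, keyed by entity type
    params: dict          # Parameters passed to the graph template
    suggestions: list     # Alternative intents (each has .intent and .description)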

Intent Types

person_research

Research an individual.

# Example queries
"Research Senator Jane Smith"
"Deep dive on CEO John Doe"
"Background check on candidate Alice"

# Result for the first query
{
    "intent": "person_research",
    "entities": {"person": "Senator Jane Smith"},
    "params": {
        "subject_name": "Senator Jane Smith",
        "max_urls": 100
    }
}

company_research

Research a company or organization.

# Example queries
"Research Acme Corporation"
"Company profile for TechCorp Inc"

# Result for the first query
{
    "intent": "company_research",
    "entities": {"company": "Acme Corporation"},
    "params": {
        "company_name": "Acme Corporation",
        "include_news": True
    }
}

compare_sources

Compare how different sources cover a topic.

# Example queries
"Compare coverage of the election between these news sites"
"How do different outlets report on climate change?"

# Result for the first query
{
    "intent": "compare_sources",
    "entities": {"topic": "election"},
    "params": {
        "topic": "election coverage",
        "sources": ["url1", "url2", "url3"]
    }
}

track_position

Track positions or statements over time.

# Example queries
"How has the Senator's position on healthcare changed?"
"Track company statements about privacy over the years"

# Result for the first query
{
    "intent": "track_position",
    "entities": {
        "person": "Senator",
        "topic": "healthcare"
    },
    "params": {
        "subject": "Senator",
        "topic": "healthcare",
        "include_archives": True
    }
}

news_aggregation

Aggregate news on a topic.

# Example queries
"Latest news on AI regulation"
"News about the merger in the past week"

# Result for the first query
{
    "intent": "news_aggregation",
    "entities": {"topic": "AI regulation"},
    "params": {
        "topic": "AI regulation",
        "days": 7
    }
}

historical_analysis

Analyze historical content.

# Example queries
"How has this website changed since 2020?"
"Archive analysis of company about page"

# Result for the first query
{
    "intent": "historical_analysis",
    "entities": {"url": "https://example.com"},
    "params": {
        "url": "https://example.com",
        "start_date": "2020-01-01"
    }
}

video_research

Research from video content.

# Example queries
"Analyze this politician's YouTube interviews"
"Research public statements from video"

# Result for the first query
{
    "intent": "video_research",
    "entities": {"topic": "interviews"},
    "params": {
        "query": "politician interviews",
        "max_videos": 10
    }
}

Entity Extraction

The classifier extracts entities from queries:

| Entity Type | Examples |
|-------------|----------|
| person | "John Smith", "Senator Jane" |
| company | "Acme Corp", "TechCo Inc" |
| location | "California", "New York" |
| topic | "healthcare", "climate change" |
| date | "2020", "last month" |
| url | "https://example.com" |

Entity Examples

# Query: "Research John Smith from Acme Corp in California"
{
    "entities": {
        "person": "John Smith",
        "company": "Acme Corp",
        "location": "California"
    }
}

Confidence Scoring

Each classification result includes a confidence score between 0.0 and 1.0:

| Confidence | Meaning |
|------------|---------|
| 0.9 - 1.0 | High confidence, proceed |
| 0.7 - 0.9 | Good confidence |
| 0.5 - 0.7 | Medium confidence, may clarify |
| < 0.5 | Low confidence, suggest alternatives |
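The bands above can be turned into a small routing helper. This is only a sketch: `confidence_action` and its return labels are hypothetical names, not part of `cbintel`, and it assumes each band includes its lower bound (so 0.7 counts as "good" and 0.9 as "high").

```python
def confidence_action(confidence: float) -> str:
    """Map a confidence score to an action label (hypothetical labels).

    Assumption: each band includes its lower bound.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within [0.0, 1.0], got {confidence}")
    if confidence >= 0.9:
        return "proceed"               # high confidence
    if confidence >= 0.7:
        return "proceed_good"          # good confidence
    if confidence >= 0.5:
        return "clarify"               # medium confidence, may ask the user
    return "suggest_alternatives"      # low confidence
```

A caller could branch on the returned label instead of repeating numeric comparisons throughout the code.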

Handling Low Confidence

result = await classifier.classify(query)

if result.confidence < 0.7:
    # Suggest alternatives
    print("Did you mean:")
    for suggestion in result.suggestions:
        print(f"  - {suggestion.intent} ({suggestion.description})")

Configuration

Classifier Options

classifier = IntentClassifier(
    model="claude-3-5-sonnet",  # AI model
    threshold=0.7,              # Min confidence
    include_suggestions=True,   # Include alternatives
)

Environment Variables

INTENT_MODEL=claude-3-5-sonnet
INTENT_CONFIDENCE_THRESHOLD=0.7
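The page does not say whether `IntentClassifier` reads these variables itself. If they need to be read by hand, one possible approach is sketched below; `load_intent_settings` is a hypothetical helper, and the fallback values are assumed to match the examples shown above.

```python
import os


def load_intent_settings(env=None) -> dict:
    """Read classifier settings from environment variables.

    `env` defaults to os.environ; passing a plain dict makes testing easier.
    The fallback values are assumptions taken from the examples on this page.
    """
    env = os.environ if env is None else env
    return {
        "model": env.get("INTENT_MODEL", "claude-3-5-sonnet"),
        # Environment values are strings, so the threshold is parsed as a float
        "threshold": float(env.get("INTENT_CONFIDENCE_THRESHOLD", "0.7")),
    }
```

The resulting dict could then be passed on as `IntentClassifier(**settings)`, since its keys match the `model` and `threshold` options shown under Classifier Options.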

Custom Intents

Register custom intent types:

from cbintel.chat import register_intent

@register_intent("competitor_analysis")
class CompetitorAnalysisIntent:
    description = "Analyze competitors in a market"
    required_entities = ["company"]
    optional_entities = ["market", "region"]

    def to_params(self, entities):
        return {
            "company_name": entities["company"],
            "market": entities.get("market"),
            "include_news": True
        }

Intent to Template Mapping

INTENT_TEMPLATES = {
    "person_research": "opposition_research",
    "company_research": "company_profile",
    "compare_sources": "source_comparison",
    "track_position": "temporal_analysis",
    "news_aggregation": "news_aggregation",
    "historical_analysis": "temporal_analysis",
    "video_research": "video_analysis",
}
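A lookup against this mapping might be wrapped as follows. The helper names `template_for` and `intents_by_template` are hypothetical; the mapping itself is copied from above so the sketch runs on its own.

```python
from collections import defaultdict

INTENT_TEMPLATES = {
    "person_research": "opposition_research",
    "company_research": "company_profile",
    "compare_sources": "source_comparison",
    "track_position": "temporal_analysis",
    "news_aggregation": "news_aggregation",
    "historical_analysis": "temporal_analysis",
    "video_research": "video_analysis",
}


def template_for(intent, default=None):
    """Return the graph template for an intent, or `default` if unmapped."""
    return INTENT_TEMPLATES.get(intent, default)


def intents_by_template():
    """Group intents by template to show which ones share a graph."""
    grouped = defaultdict(list)
    for intent, template in INTENT_TEMPLATES.items():
        grouped[template].append(intent)
    return dict(grouped)
```

Grouping makes it visible that `track_position` and `historical_analysis` both resolve to `temporal_analysis`, so the two intents differ only in the parameters they pass to that template.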

Error Handling

from cbintel.chat import ClassificationError

try:
    result = await classifier.classify(query)
except ClassificationError as e:
    print(f"Classification failed: {e}")
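If classification failures can be transient (the page does not say whether they are), a caller might retry before giving up. The sketch below is an assumption-laden illustration: `classify_with_retry` is a hypothetical helper, and `ClassificationError` is redefined locally as a stand-in so the example runs without `cbintel` installed.

```python
import asyncio


class ClassificationError(Exception):
    """Local stand-in for cbintel.chat.ClassificationError (assumption)."""


async def classify_with_retry(classify, query, attempts=3, delay=0.0):
    """Await classify(query), retrying when it raises ClassificationError.

    `classify` is any async callable, e.g. classifier.classify.
    The last error is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return await classify(query)
        except ClassificationError as exc:
            last_error = exc
            if attempt < attempts:
                await asyncio.sleep(delay)  # pause before the next attempt
    raise last_error
```

In real use this would be called as `await classify_with_retry(classifier.classify, query)`, importing the exception from `cbintel.chat` instead of using the stand-in.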

Best Practices

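  1. Be specific - "Research John Smith CEO" gives the classifier more to work with than "Research John Smith"
  2. Include context - Mention topics, time ranges, and locations in the query
  3. Handle ambiguity - Offer clarification options when confidence is low
  4. Log classifications - Record queries and results so accuracy can be tracked and improved
  5. Set thresholds - Choose confidence thresholds that suit the use case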
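As one way to act on the logging and threshold points above, each classification can be written out as a structured log line. This is a minimal sketch using only the standard library; `log_classification`, the logger name, and the record fields are hypothetical choices, not part of `cbintel`.

```python
import json
import logging

logger = logging.getLogger("intent_classification")  # hypothetical logger name


def log_classification(query, intent, confidence, threshold=0.7):
    """Log one classification as a JSON line and return the record."""
    record = {
        "query": query,
        "intent": intent,
        "confidence": round(confidence, 3),
        # Flag results under the threshold so they are easy to review later
        "below_threshold": confidence < threshold,
    }
    logger.info(json.dumps(record, sort_keys=True))
    return record
```

Filtering the log for `"below_threshold": true` gives a quick list of queries that may need clearer wording or a new custom intent.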