Intent Classification

The Intent Classifier analyzes natural language queries to determine research intent and extract parameters.

Overview

flowchart LR
    QUERY[User Query] --> CLASSIFY[IntentClassifier]
    CLASSIFY --> INTENT[Intent Type]
    CLASSIFY --> PARAMS[Parameters]
    CLASSIFY --> CONF[Confidence]

    INTENT --> TEMPLATE[Graph Template]
    PARAMS --> TEMPLATE
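The flow in the diagram can be sketched in a few lines. This is a minimal illustration, not cbintel's implementation: `classify` here is a hypothetical keyword matcher standing in for the model-backed `IntentClassifier`, `Classification` and `route` are illustrative names, and only two entries of the intent-to-template mapping are included.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Classification:
    """Hypothetical stand-in for the classifier's result (subset of IntentResult)."""
    intent: str
    confidence: float
    params: dict = field(default_factory=dict)


# Two entries copied from the documented intent-to-template mapping.
INTENT_TEMPLATES = {
    "person_research": "opposition_research",
    "news_aggregation": "news_aggregation",
}


def classify(query: str) -> Classification:
    """Toy keyword matcher; the real classifier uses an AI model."""
    lowered = query.lower()
    if "news" in lowered:
        return Classification("news_aggregation", 0.8, {"topic": query})
    if lowered.startswith("research "):
        subject = query[len("research "):].strip()
        return Classification("person_research", 0.9, {"subject_name": subject})
    return Classification("unknown", 0.2)


def route(query: str) -> Optional[Tuple[str, dict]]:
    """Classify the query, then pick the graph template that receives its params."""
    result = classify(query)
    template = INTENT_TEMPLATES.get(result.intent)
    if template is None:
        return None  # no template: the caller should ask the user to clarify
    return template, result.params
```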

IntentClassifier

import asyncio

from cbintel.chat import IntentClassifier


async def main():
    classifier = IntentClassifier()

    # classify() is a coroutine, so it must be awaited inside an async function
    result = await classifier.classify(
        "Research John Smith's voting record on healthcare"
    )

    print(f"Intent: {result.intent}")
    print(f"Confidence: {result.confidence}")
    print(f"Entities: {result.entities}")
    print(f"Params: {result.params}")


asyncio.run(main())

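Classification Result

classify() returns an IntentResult:

from dataclasses import dataclass

@dataclass
class IntentResult:
    intent: str           # Intent type, e.g. "person_research"
    confidence: float     # 0.0 to 1.0
    entities: dict        # Extracted entities
    params: dict          # Parameters for the graph template
    suggestions: list     # Alternative intents (each with .intent and .description)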
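Because `IntentResult` is a dataclass, the standard library's `dataclasses.asdict` can turn a result into plain dicts, for example for logging or returning it from an API. The sketch below redeclares the class locally so it runs on its own; the default for `suggestions` and the sample values are illustrative assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IntentResult:
    intent: str
    confidence: float
    entities: dict
    params: dict
    suggestions: list = field(default_factory=list)  # default added for brevity


result = IntentResult(
    intent="person_research",
    confidence=0.92,  # illustrative value
    entities={"person": "Senator Jane Smith"},
    params={"subject_name": "Senator Jane Smith", "max_urls": 100},
)

# asdict() recursively copies fields into plain dicts and lists,
# which json.dumps can serialize directly.
payload = json.dumps(asdict(result), sort_keys=True)
```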

Intent Types

person_research

Research an individual.

# Query
"Research Senator Jane Smith"
"Deep dive on CEO John Doe"
"Background check on candidate Alice"

# Result
{
    "intent": "person_research",
    "entities": {"person": "Senator Jane Smith"},
    "params": {
        "subject_name": "Senator Jane Smith",
        "max_urls": 100
    }
}
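The `params` for `person_research` follow directly from the extracted `person` entity plus a default URL budget. A hypothetical helper for that step might look like this; the function name and the error on a missing entity are assumptions, and 100 is simply the `max_urls` value from the example above.

```python
DEFAULT_MAX_URLS = 100  # value taken from the example result above


def person_research_params(entities: dict, max_urls: int = DEFAULT_MAX_URLS) -> dict:
    """Build person_research params from extracted entities (hypothetical helper)."""
    if "person" not in entities:
        raise ValueError("person_research requires a 'person' entity")
    return {"subject_name": entities["person"], "max_urls": max_urls}
```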

company_research

Research a company or organization.

# Query
"Research Acme Corporation"
"Company profile for TechCorp Inc"

# Result
{
    "intent": "company_research",
    "entities": {"company": "Acme Corporation"},
    "params": {
        "company_name": "Acme Corporation",
        "include_news": True
    }
}

compare_sources

Compare how different sources cover a topic.

# Query
"Compare coverage of the election between these news sites"
"How do different outlets report on climate change?"

# Result
{
    "intent": "compare_sources",
    "entities": {"topic": "election"},
    "params": {
        "topic": "election coverage",
        "sources": ["url1", "url2", "url3"]
    }
}
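In the example the `sources` entries are placeholders. If source URLs come from the user, a pre-processing step could drop non-HTTP entries and duplicates so the comparison does not fetch the same outlet twice. This `normalize_sources` helper is hypothetical and uses only `urllib.parse` from the standard library.

```python
from typing import List
from urllib.parse import urlparse


def normalize_sources(urls: List[str]) -> List[str]:
    """Keep http(s) URLs only and drop duplicates, preserving order (hypothetical helper)."""
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # skip placeholders and non-web schemes
        # Treat host case and a trailing slash as insignificant for duplicates.
        key = (parsed.netloc.lower(), parsed.path.rstrip("/"))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(url)
    return cleaned
```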

track_position

Track positions or statements over time.

# Query
"How has the Senator's position on healthcare changed?"
"Track company statements about privacy over the years"

# Result
{
    "intent": "track_position",
    "entities": {
        "person": "Senator",
        "topic": "healthcare"
    },
    "params": {
        "subject": "Senator",
        "topic": "healthcare",
        "include_archives": True
    }
}
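`track_position` needs both a subject and a topic. The second example query refers to a company rather than a person, so a params builder has to accept either. The sketch below assumes the subject comes from the `person` entity, falling back to `company`, and returns `None` when something is missing so the caller can ask a follow-up question; `track_position_params` is a hypothetical name.

```python
from typing import Optional


def track_position_params(entities: dict) -> Optional[dict]:
    """Build track_position params, or return None if subject or topic is missing."""
    subject = entities.get("person") or entities.get("company")
    topic = entities.get("topic")
    if not subject or not topic:
        return None  # caller should ask a clarifying question
    return {"subject": subject, "topic": topic, "include_archives": True}
```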

news_aggregation

Aggregate news on a topic.

# Query
"Latest news on AI regulation"
"News about the merger in the past week"

# Result
{
    "intent": "news_aggregation",
    "entities": {"topic": "AI regulation"},
    "params": {
        "topic": "AI regulation",
        "days": 7
    }
}
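The `days` value corresponds to a relative time phrase such as "the past week". A minimal regex-based sketch of that conversion is below; the phrase patterns, the 30-day month, and the 7-day default when no phrase is found are assumptions, not documented behavior.

```python
import re

# Assumed day counts per unit; a month is approximated as 30 days.
_UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}
_PHRASE = re.compile(r"\b(?:past|last)\s+(?:(\d+)\s+)?(day|week|month|year)s?\b")


def parse_days(text: str, default: int = 7) -> int:
    """Convert phrases like 'past week' or 'last 3 days' into a number of days."""
    match = _PHRASE.search(text.lower())
    if match is None:
        return default  # no time phrase found
    count = int(match.group(1)) if match.group(1) else 1
    return count * _UNIT_DAYS[match.group(2)]
```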

historical_analysis

Analyze historical content.

# Query
"How has this website changed since 2020?"
"Archive analysis of company about page"

# Result
{
    "intent": "historical_analysis",
    "entities": {"url": "https://example.com"},
    "params": {
        "url": "https://example.com",
        "start_date": "2020-01-01"
    }
}
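In the example, `start_date` is "since 2020" expressed as the first day of that year. A hypothetical helper that handles only that phrase pattern:

```python
import re
from datetime import date
from typing import Optional

_SINCE_YEAR = re.compile(r"\bsince\s+(\d{4})\b", re.IGNORECASE)


def start_date_from_query(text: str) -> Optional[str]:
    """Return January 1 of the year after 'since' as an ISO date string, or None."""
    match = _SINCE_YEAR.search(text)
    if match is None:
        return None
    return date(int(match.group(1)), 1, 1).isoformat()
```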

video_research

Research from video content.

# Query
"Analyze this politician's YouTube interviews"
"Research public statements from video"

# Result
{
    "intent": "video_research",
    "entities": {"topic": "interviews"},
    "params": {
        "query": "politician interviews",
        "max_videos": 10
    }
}

Entity Extraction

The classifier extracts the following entity types from queries:

Entity Type	Examples
`person`	"John Smith", "Senator Jane"
`company`	"Acme Corp", "TechCo Inc"
`location`	"California", "New York"
`topic`	"healthcare", "climate change"
`date`	"2020", "last month"
`url`	"https://example.com"
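Person, company, and topic entities need the language model, but pattern-like types such as `url` and year-style `date` values can also be picked out with regular expressions, for example to validate or pre-fill what the model returns. This pre-pass is a hypothetical illustration, not how `IntentClassifier` extracts entities, and it handles only four-digit years, not phrases like "last month".

```python
import re

_URL = re.compile(r"https?://[^\s,]+")
_YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_pattern_entities(query: str) -> dict:
    """Pick out url and four-digit-year date entities with regexes."""
    entities = {}
    url = _URL.search(query)
    if url:
        # Drop trailing sentence punctuation that the URL pattern swallowed.
        entities["url"] = url.group(0).rstrip(".?!)")
    year = _YEAR.search(query)
    if year:
        entities["date"] = year.group(0)
    return entities
```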

Entity Examples

# Query: "Research John Smith from Acme Corp in California"
{
    "entities": {
        "person": "John Smith",
        "company": "Acme Corp",
        "location": "California"
    }
}

Confidence Scoring

Every classification result includes a confidence score from 0.0 to 1.0:

Confidence	Meaning
≥ 0.9	High confidence, proceed
0.7 to < 0.9	Good confidence
0.5 to < 0.7	Medium confidence, may ask the user to clarify
< 0.5	Low confidence, suggest alternatives
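The bands can be expressed as a small function, for example to choose how a chat UI responds. Treating each lower bound as inclusive is an assumption (it is consistent with the `confidence < 0.7` check in the low-confidence example); `confidence_band` and the band labels are hypothetical.

```python
def confidence_band(confidence: float) -> str:
    """Map a 0.0-1.0 confidence score to a band label (lower bounds inclusive)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0.0 and 1.0, got {confidence}")
    if confidence >= 0.9:
        return "high"    # proceed
    if confidence >= 0.7:
        return "good"
    if confidence >= 0.5:
        return "medium"  # may ask the user to clarify
    return "low"         # suggest alternatives
```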

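Handling Low Confidence

async def classify_with_suggestions(classifier, query):
    result = await classifier.classify(query)

    # Below the default 0.7 threshold, offer the alternative intents instead
    if result.confidence < 0.7:
        print("Did you mean:")
        for suggestion in result.suggestions:
            print(f"  - {suggestion.intent} ({suggestion.description})")

    return result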

Configuration

Classifier Options

classifier = IntentClassifier(
    model="claude-3-5-sonnet",  # AI model
    threshold=0.7,              # Min confidence
    include_suggestions=True,   # Include alternatives
)

Environment Variables

The model and confidence threshold can also be set through environment variables:

INTENT_MODEL=claude-3-5-sonnet
INTENT_CONFIDENCE_THRESHOLD=0.7
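How cbintel resolves these variables is not described here. As an assumption, the sketch below gives environment variables precedence over built-in defaults and converts the threshold to a float; `ClassifierConfig` and `config_from_env` are hypothetical names. Accepting an explicit mapping instead of always reading `os.environ` keeps the function easy to test.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ClassifierConfig:
    model: str = "claude-3-5-sonnet"
    threshold: float = 0.7


def config_from_env(env: Optional[Mapping[str, str]] = None) -> ClassifierConfig:
    """Read INTENT_* variables, falling back to the defaults above."""
    env = os.environ if env is None else env
    defaults = ClassifierConfig()
    threshold = float(env.get("INTENT_CONFIDENCE_THRESHOLD", defaults.threshold))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"INTENT_CONFIDENCE_THRESHOLD out of range: {threshold}")
    return ClassifierConfig(
        model=env.get("INTENT_MODEL", defaults.model),
        threshold=threshold,
    )
```

The resulting values could then be passed on as `IntentClassifier(model=cfg.model, threshold=cfg.threshold)`.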

Custom Intents

Register custom intent types:

from cbintel.chat import register_intent

@register_intent("competitor_analysis")
class CompetitorAnalysisIntent:
    description = "Analyze competitors in a market"
    required_entities = ["company"]
    optional_entities = ["market", "region"]

    def to_params(self, entities):
        return {
            "company_name": entities["company"],
            "market": entities.get("market"),
            "include_news": True
        }
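The internals of `register_intent` are not documented on this page. As a rough sketch of what a registry decorator of this kind usually does, the hypothetical version below stores the class under its name, rejects duplicates and classes missing the expected attributes, and checks `required_entities` before calling `to_params`. It is not cbintel's implementation.

```python
from typing import Callable, Dict

# Hypothetical in-memory registry: intent name -> intent class.
_REGISTRY: Dict[str, type] = {}

_REQUIRED_ATTRS = ("description", "required_entities", "to_params")


def register_intent(name: str) -> Callable[[type], type]:
    """Class decorator that records an intent class under `name`."""
    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"intent {name!r} is already registered")
        missing = [attr for attr in _REQUIRED_ATTRS if not hasattr(cls, attr)]
        if missing:
            raise TypeError(f"{cls.__name__} is missing {missing}")
        _REGISTRY[name] = cls
        return cls  # return the class unchanged so it stays usable
    return decorator


def build_params(name: str, entities: dict) -> dict:
    """Check required entities, then delegate to the intent's to_params()."""
    intent_cls = _REGISTRY[name]
    absent = [e for e in intent_cls.required_entities if e not in entities]
    if absent:
        raise ValueError(f"{name} is missing required entities: {absent}")
    return intent_cls().to_params(entities)
```

With this decorator applied to `CompetitorAnalysisIntent` from the example above, `build_params("competitor_analysis", {"company": "Acme Corp"})` would return `{"company_name": "Acme Corp", "market": None, "include_news": True}`.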

Intent to Template Mapping

Each intent is mapped to a graph template, which receives the extracted parameters:

INTENT_TEMPLATES = {
    "person_research": "opposition_research",
    "company_research": "company_profile",
    "compare_sources": "source_comparison",
    "track_position": "temporal_analysis",
    "news_aggregation": "news_aggregation",
    "historical_analysis": "temporal_analysis",
    "video_research": "video_analysis",
}
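The mapping is many-to-one: `track_position` and `historical_analysis` both run `temporal_analysis`. A small lookup helper can turn a missing key into a clearer error, and a reverse index shows which intents share a template. Both functions are hypothetical helpers built on the dictionary above.

```python
from collections import defaultdict
from typing import Dict, List

INTENT_TEMPLATES = {
    "person_research": "opposition_research",
    "company_research": "company_profile",
    "compare_sources": "source_comparison",
    "track_position": "temporal_analysis",
    "news_aggregation": "news_aggregation",
    "historical_analysis": "temporal_analysis",
    "video_research": "video_analysis",
}


def template_for(intent: str) -> str:
    """Look up the graph template, with a clearer error for unknown intents."""
    try:
        return INTENT_TEMPLATES[intent]
    except KeyError:
        known = ", ".join(sorted(INTENT_TEMPLATES))
        raise ValueError(f"no template for intent {intent!r}; known intents: {known}") from None


def intents_by_template() -> Dict[str, List[str]]:
    """Reverse index: template name -> intents that use it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for intent, template in INTENT_TEMPLATES.items():
        grouped[template].append(intent)
    return dict(grouped)
```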

Error Handling

If classification fails, classify() raises ClassificationError:

from cbintel.chat import ClassificationError

try:
    result = await classifier.classify(query)
except ClassificationError as e:
    print(f"Classification failed: {e}")
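If failures are transient (a model timeout, say), a caller might retry once before giving up. That is an assumption about your failure modes, not documented cbintel behavior; `classify_or_none` is a hypothetical helper, and `ClassificationError` is redeclared locally so the sketch runs without cbintel installed.

```python
import asyncio


class ClassificationError(Exception):
    """Local stand-in for cbintel.chat.ClassificationError."""


async def classify_or_none(classifier, query, retries=1, delay=0.0):
    """Try classify() up to retries + 1 times; return None if every attempt fails."""
    for attempt in range(retries + 1):
        try:
            return await classifier.classify(query)
        except ClassificationError:
            if attempt == retries:
                return None
            await asyncio.sleep(delay)  # optional pause before the next attempt
    return None
```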

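Best Practices

Be specific: "Research John Smith CEO" is easier to classify than "Research John Smith".
Include context: mention topics, time ranges, and locations where relevant.
Handle ambiguity: offer clarification options when confidence is low.
Log classifications: record queries and results so classification can be improved over time.
Set thresholds: choose confidence levels appropriate to your use case.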
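For the logging and threshold practices, one option is to emit one structured record per classification with the standard `logging` module, flagging results below the threshold so they can be reviewed later. The helper name, the record fields, and the 0.7 default (taken from the configuration example) are illustrative.

```python
import json
import logging

logger = logging.getLogger("intent_classification")


def log_classification(query: str, intent: str, confidence: float, threshold: float = 0.7) -> dict:
    """Log one JSON record per classification and return it."""
    record = {
        "query": query,
        "intent": intent,
        "confidence": round(confidence, 3),
        "below_threshold": confidence < threshold,  # candidates for review
    }
    logger.info(json.dumps(record))
    return record
```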