
Type Compatibility

Operations transform one type to another. This document defines which types are compatible.

Compatibility Matrix

Direct Compatibility

Operations that transform one type to another:

| From | To | Operation | Auto-Batch |
| --- | --- | --- | --- |
| Url | Html | fetch | Yes |
| Url[] | Html[] | fetch_batch | - |
| Html | Text | to_text | Yes |
| Html | Markdown | to_markdown | Yes |
| Html | Url[] | extract_links | Yes |
| Text | Chunk[] | chunk | No |
| Text | Vector | embed | Yes |
| Text | Entity[] | entities | Yes |
| Chunk[] | Vector[] | embed_batch | - |
| Chunk[] | Chunk[] | semantic_filter | - |
| Chunk[] | Chunk[] | quality_filter | - |
| Chunk[] | Text | integrate | - |
| Image | Text | ocr | Yes |
| Pdf | Text | pdf_to_text | Yes |
| Query | Url[] | search | No |
| Text | GeoPoint | geocode | Yes |
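
One way to make the matrix queryable is to store its rows as data. The following is a minimal sketch, not the library's implementation: `MATRIX_ROWS` and `operations_between` are hypothetical names, and only a subset of the rows above is copied in.

```python
from typing import List, Tuple

# (from_type, to_type, operation): a subset of the rows in the matrix above.
# MATRIX_ROWS and operations_between are illustrative names, not a real API.
MATRIX_ROWS: List[Tuple[str, str, str]] = [
    ("Url", "Html", "fetch"),
    ("Url[]", "Html[]", "fetch_batch"),
    ("Html", "Text", "to_text"),
    ("Text", "Chunk[]", "chunk"),
    ("Chunk[]", "Chunk[]", "semantic_filter"),
    ("Chunk[]", "Chunk[]", "quality_filter"),
    ("Chunk[]", "Text", "integrate"),
]


def operations_between(src: str, dst: str) -> List[str]:
    """Return every operation that maps src directly to dst."""
    return [op for s, d, op in MATRIX_ROWS if s == src and d == dst]


print(operations_between("Chunk[]", "Chunk[]"))  # two filters share this pair
print(operations_between("Url", "Chunk[]"))      # no direct operation
```

A list of rows is used instead of a dictionary keyed by type pair because some pairs, such as Chunk[] to Chunk[], are served by more than one operation.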

Flow Diagram

flowchart LR
    Query -->|search| URLs["Url[]"]
    URLs -->|fetch_batch| HTMLs["Html[]"]
    HTMLs -->|to_text_batch| Texts["Text[]"]
    Texts -->|chunk| Chunks["Chunk[]"]
    Chunks -->|embed_batch| Vectors["Vector[]"]
    Chunks -->|semantic_filter| FilteredChunks["Chunk[]"]
    FilteredChunks -->|integrate| Synthesis[Text]
    Texts -->|entities| Entities["Entity[]"]

    Image -->|ocr| Text
    Pdf -->|pdf_to_text| Text

Implicit Coercion

Safe automatic conversions that don't require explicit operations:

| From | To | Rule |
| --- | --- | --- |
| Markdown | Text | Strip formatting |
| Html | Text | Strip tags (simple) |
| Url | string | Identity (URL is a string) |
| Chunk[] | Text | Join with newlines |
| Entity[] | Text | Format as list |
| T | T[] | Wrap in array |

Examples

# Markdown to Text (implicit)
- op: entities
  input: markdown_content  # Markdown coerced to Text
  output: entities

# Single to Array (implicit)
- op: fetch_batch
  input: single_url        # Url coerced to Url[]
  output: pages
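
The implicit coercion rules can be expressed as a small lookup. This is a sketch under assumptions: `IMPLICIT_RULES` and `implicit_rule` are hypothetical names, and the generic T to T[] rule is modelled simply by checking whether the target is the source type with `[]` appended.

```python
from typing import Dict, Optional, Tuple

# Rules copied from the implicit coercion table; the lookup itself is an assumption.
IMPLICIT_RULES: Dict[Tuple[str, str], str] = {
    ("Markdown", "Text"): "Strip formatting",
    ("Html", "Text"): "Strip tags (simple)",
    ("Url", "string"): "Identity (URL is a string)",
    ("Chunk[]", "Text"): "Join with newlines",
    ("Entity[]", "Text"): "Format as list",
}


def implicit_rule(src: str, dst: str) -> Optional[str]:
    """Return the coercion rule from src to dst, or None if none applies."""
    if (src, dst) in IMPLICIT_RULES:
        return IMPLICIT_RULES[(src, dst)]
    if dst == src + "[]":  # generic T -> T[] rule
        return "Wrap in array"
    return None


print(implicit_rule("Markdown", "Text"))  # the entities example above
print(implicit_rule("Url", "Url[]"))      # the fetch_batch example above
print(implicit_rule("Url", "Html"))       # None: this needs an explicit fetch
```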

Explicit Coercion

Conversions that require an operation:

| From | To | Required Operation |
| --- | --- | --- |
| Url | Html | fetch |
| Html | Chunk[] | to_text + chunk |
| Text | Vector | embed |
| Image | Text | ocr |
| string | Url | Validation only |
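
Multi-step conversions such as Html to Chunch[] can be found by searching the operations as a graph. The sketch below uses a breadth-first search over a hand-copied subset of the operations in this document; `EDGES` and `find_chain` are hypothetical names, and this is not necessarily how the library resolves conversions.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# type -> [(operation, resulting type)], a subset of the operations documented here.
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "Url": [("fetch", "Html")],
    "Html": [("to_text", "Text"), ("to_markdown", "Markdown")],
    "Text": [("chunk", "Chunk[]"), ("embed", "Vector")],
    "Image": [("ocr", "Text")],
}


def find_chain(src: str, dst: str) -> Optional[List[str]]:
    """Return the shortest list of operations from src to dst, or None."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        current, ops = queue.popleft()
        if current == dst:
            return ops
        for op, nxt in EDGES.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, ops + [op]))
    return None


print(find_chain("Html", "Chunk[]"))  # ['to_text', 'chunk']
print(find_chain("Text", "Url"))      # None
```

Breadth-first search returns the chain with the fewest operations, which matches the two-step "to_text + chunk" entry in the table.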

Example Error

# ERROR: Cannot directly connect Url[] to chunk
- op: search
  output: urls          # Url[]

- op: chunk
  input: urls           # ERROR: chunk expects Text

Fix:

- op: search
  output: urls

- op: fetch_batch
  input: urls
  output: pages         # Html[]

- op: to_text_batch
  input: pages
  output: texts         # Text[]

- op: chunk
  input: texts          # OK: chunk accepts Text

Operation Type Signatures

Discovery Operations

search:           (Query, {max_results?: int}) -> Url[]
archive_discover: (Query, {domain: string}) -> Url[]
youtube_search:   (Query, {max_results?: int}) -> Url[]
tor_search:       (Query, {engine?: string}) -> TorResult[]

Acquisition Operations

fetch:            (Url, {geo?: string}) -> Html
fetch_batch:      (Url[], {geo?: string}) -> Html[]
screenshot:       (Url, {full_page?: bool}) -> Image
download:         (Url) -> bytes

Transform Operations

to_markdown:      (Html) -> Markdown
to_text:          (Html) -> Text
chunk:            (Text, {size?: int, overlap?: int}) -> Chunk[]
merge_chunks:     (Chunk[][]) -> Chunk[]

Process Operations

embed:            (Text) -> Vector
embed_batch:      (Chunk[]) -> Vector[]
entities:         (Text, {types?: EntityType[]}) -> Entity[]
summarize:        (Text, {max_length?: int}) -> Text
translate:        (Text, {target_lang: string}) -> Text

Filter Operations

semantic_filter:  (Chunk[], {query: string, threshold?: float}) -> Chunk[]
quality_filter:   (Chunk[], {threshold?: float}) -> Chunk[]
filter_urls:      (Url[], {pattern?: string}) -> Url[]
filter_entities:  (Entity[], {types?: EntityType[]}) -> Entity[]

Synthesize Operations

integrate:        (Chunk[], {query: string}) -> Text
compare:          (Text[], {aspects?: string[]}) -> Text
to_report:        (any, {template: string}) -> Markdown
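
The signature notation above is regular enough to parse mechanically. The following is a rough sketch, assuming every signature fits on one line in the form `name: (Input, {options}) -> Output`; `Signature` and `parse_signature` are hypothetical helpers, not part of any documented API.

```python
import re
from typing import Dict, NamedTuple


class Signature(NamedTuple):
    name: str
    input_type: str
    options: Dict[str, str]  # option name -> type; a trailing "?" marks it optional
    output_type: str


_SIGNATURE = re.compile(r"^(\w+):\s*\((.*)\)\s*->\s*(\S+)$")


def parse_signature(line: str) -> Signature:
    """Parse one signature line written in the notation used above."""
    match = _SIGNATURE.match(line.strip())
    if match is None:
        raise ValueError(f"Not a signature: {line!r}")
    name, args, output_type = match.groups()
    # The first argument is the input type; the rest is the options object.
    input_type, _, rest = args.partition(",")
    options: Dict[str, str] = {}
    body = rest.strip().strip("{} ")
    if body:
        for item in body.split(","):
            key, _, typ = item.partition(":")
            options[key.strip()] = typ.strip()
    return Signature(name, input_type.strip(), options, output_type)


sig = parse_signature("chunk: (Text, {size?: int, overlap?: int}) -> Chunk[]")
print(sig.input_type, "->", sig.output_type, sig.options)
```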

Validation Levels

1. Parse-time

Validate types during YAML parsing:

from cbintel.graph import parse_yaml, ValidationError

try:
    graph_def = parse_yaml(yaml_content)
except ValidationError as e:
    print(f"Type error: {e}")

2. Connect-time

Validate compatibility when connecting operations:

# Validates that output of search (Url[]) can connect
# to input of fetch_batch (Url[])
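
A connection check of this kind can be sketched as a comparison between the producer's output type and the consumer's input type. The names `SIGNATURES` and `check_connection` are assumptions for illustration, and this simplified version requires an exact match, ignoring implicit coercion.

```python
from typing import Dict, Optional, Tuple

# operation -> (input type, output type), copied from the signatures in this document.
SIGNATURES: Dict[str, Tuple[str, str]] = {
    "search": ("Query", "Url[]"),
    "fetch_batch": ("Url[]", "Html[]"),
    "chunk": ("Text", "Chunk[]"),
}


def check_connection(producer: str, consumer: str) -> Optional[str]:
    """Return None if producer's output fits consumer's input, else a message."""
    produced = SIGNATURES[producer][1]
    expected = SIGNATURES[consumer][0]
    if produced == expected:
        return None
    return f"'{consumer}' expects {expected}, but '{producer}' produces {produced}"


print(check_connection("search", "fetch_batch"))  # None: Url[] matches Url[]
print(check_connection("search", "chunk"))        # mismatch message
```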

3. Run-time

Validate actual data matches declared types:

# Verifies that fetch actually returns Html
# and not some other type
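
As a rough illustration of a run-time check, the sketch below models Html as a tagged string subclass and verifies a value against its declared type. Both the `Html` class and `validate_output` are hypothetical stand-ins; the actual type representation is not shown in this document.

```python
class Html(str):
    """Hypothetical stand-in for the Html type: a string tagged as HTML."""


def validate_output(operation: str, value: object, expected_type: type) -> object:
    """Return value unchanged if it has the declared type, else raise TypeError."""
    if not isinstance(value, expected_type):
        raise TypeError(
            f"Operation '{operation}' declared {expected_type.__name__} "
            f"but returned {type(value).__name__}"
        )
    return value


page = validate_output("fetch", Html("<p>hello</p>"), Html)  # passes

try:
    validate_output("fetch", "plain string", Html)  # untagged str is rejected
except TypeError as exc:
    print(exc)
```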

Error Messages

Type errors include helpful suggestions:

TypeValidationError:
  operation: chunk
  field: input
  expected: Text
  actual: Url[]
  message: "Operation 'chunk' expects Text input, but 'urls' has type Url[].
            Did you mean to use 'fetch_batch' + 'to_text' first?"
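
An error carrying these fields could be modelled as an exception class. This is only a sketch: the class below reuses the name shown in the output above but is not the library's definition, and the `variable` and `suggestion` parameters are assumptions.

```python
from typing import Optional


class TypeValidationError(Exception):
    """Illustrative error type holding the fields shown in the message above."""

    def __init__(self, operation: str, field: str, expected: str, actual: str,
                 variable: str, suggestion: Optional[str] = None) -> None:
        self.operation = operation
        self.field = field
        self.expected = expected
        self.actual = actual
        message = (f"Operation '{operation}' expects {expected} {field}, "
                   f"but '{variable}' has type {actual}.")
        if suggestion:
            message += f" Did you mean to use {suggestion} first?"
        super().__init__(message)


err = TypeValidationError(
    operation="chunk", field="input", expected="Text", actual="Url[]",
    variable="urls", suggestion="'fetch_batch' + 'to_text'",
)
print(err)
```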

Best Practices

  1. Check signatures - Know each operation's input and output types before connecting it
  2. Use intermediate operations - Add explicit conversion steps where types do not match
  3. Validate early - Check types during development, before running the graph
  4. Read error messages - Follow the operations they suggest
  5. Use typed outputs - Declare output types explicitly