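Type Compatibility
Each operation transforms a value of one type into another. This document defines which types can be connected directly, which are coerced automatically, and which need an explicit conversion step.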
Compatibility Matrix
Direct Compatibility
Operations that transform one type to another:

| From | To | Operation | Auto-Batch |
|---|---|---|---|
| Url | Html | fetch | Yes |
| Url[] | Html[] | fetch_batch | - |
| Html | Text | to_text | Yes |
| Html | Markdown | to_markdown | Yes |
| Html | Url[] | extract_links | Yes |
| Text | Chunk[] | chunk | No |
| Text | Vector | embed | Yes |
| Text | Entity[] | entities | Yes |
| Chunk[] | Vector[] | embed_batch | - |
| Chunk[] | Chunk[] | semantic_filter | - |
| Chunk[] | Chunk[] | quality_filter | - |
| Chunk[] | Text | integrate | - |
| Image | Text | ocr | Yes |
| Pdf | Text | pdf_to_text | Yes |
| Query | Url[] | search | No |
| Text | GeoPoint | geocode | Yes |
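The matrix above can be read as a lookup from (input type, operation) to output type. The sketch below encodes a subset of the rows in plain Python; it is not cbintel code. The names `DIRECT` and `output_type` are hypothetical, and the rule that an Auto-Batch "Yes" operation maps element-wise over an array input is an assumption about what the column means.

```python
from typing import Optional

# (input type, operation) -> (output type, auto-batch flag).
# A subset of the table rows; None stands for the "-" entries.
DIRECT = {
    ("Url", "fetch"): ("Html", True),
    ("Url[]", "fetch_batch"): ("Html[]", None),
    ("Html", "to_text"): ("Text", True),
    ("Text", "chunk"): ("Chunk[]", False),
    ("Text", "embed"): ("Vector", True),
    ("Chunk[]", "embed_batch"): ("Vector[]", None),
    ("Query", "search"): ("Url[]", False),
}


def output_type(op: str, input_type: str) -> Optional[str]:
    """Return the output type of `op` for `input_type`, or None if incompatible."""
    exact = DIRECT.get((input_type, op))
    if exact is not None:
        return exact[0]
    # ASSUMPTION: Auto-Batch "Yes" means the op is applied to each element of T[].
    if input_type.endswith("[]"):
        element = DIRECT.get((input_type[:-2], op))
        if element is not None and element[1] is True:
            return element[0] + "[]"
    return None


print(output_type("fetch", "Url"))
print(output_type("to_text", "Html[]"))
print(output_type("chunk", "Url[]"))
```

Under these assumptions `chunk` on `Url[]` yields `None`, which matches the error case shown later in this document.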
Flow Diagram

    flowchart LR
        Query -->|search| URLs["Url[]"]
        URLs -->|fetch_batch| HTMLs["Html[]"]
        HTMLs -->|to_text_batch| Texts["Text[]"]
        Texts -->|chunk| Chunks["Chunk[]"]
        Chunks -->|embed_batch| Vectors["Vector[]"]
        Chunks -->|semantic_filter| FilteredChunks["Chunk[]"]
        FilteredChunks -->|integrate| Synthesis[Text]
        Texts -->|entities| Entities["Entity[]"]
        Image -->|ocr| Text
        Pdf -->|pdf_to_text| Text
Implicit Coercion
Safe automatic conversions that don't require explicit operations:

| From | To | Rule |
|---|---|---|
| Markdown | Text | Strip formatting |
| Html | Text | Strip tags (simple) |
| Url | string | Identity (URL is a string) |
| Chunk[] | Text | Join with newlines |
| Entity[] | Text | Format as list |
| T | T[] | Wrap in array |
Examples

    # Markdown to Text (implicit)
    - op: entities
      input: markdown_content  # Markdown coerced to Text
      output: entities

    # Single to Array (implicit)
    - op: fetch_batch
      input: single_url  # Url coerced to Url[]
      output: pages
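The implicit rules above could be modelled as a small type-level check plus a value-level conversion. This is only a sketch: `IMPLICIT`, `can_coerce`, and `coerce_value` are hypothetical names, chunks are represented as plain strings for illustration, and only the two rules whose behaviour the table fully specifies (join with newlines, wrap in array) are implemented at the value level.

```python
# (from, to) pairs copied from the implicit coercion table.
IMPLICIT = {
    ("Markdown", "Text"),
    ("Html", "Text"),
    ("Url", "string"),
    ("Chunk[]", "Text"),
    ("Entity[]", "Text"),
}


def can_coerce(actual: str, expected: str) -> bool:
    """True if `actual` is accepted where `expected` is required."""
    if actual == expected or (actual, expected) in IMPLICIT:
        return True
    # T -> T[]: a single value can be wrapped in an array.
    return expected == actual + "[]"


def coerce_value(value, actual: str, expected: str):
    """Convert a value for the rules that are fully specified by the table."""
    if actual == expected:
        return value
    if expected == actual + "[]":
        return [value]  # wrap in array
    if (actual, expected) == ("Chunk[]", "Text"):
        return "\n".join(value)  # join with newlines (chunks assumed to be str)
    raise TypeError(f"No implicit conversion sketched for {actual} -> {expected}")


print(can_coerce("Url", "Url[]"))
print(coerce_value(["first chunk", "second chunk"], "Chunk[]", "Text"))
```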
Explicit Coercion
Conversions that require an operation:

| From | To | Required Operation |
|---|---|---|
| Url | Html | fetch |
| Html | Chunk[] | to_text + chunk |
| Text | Vector | embed |
| Image | Text | ocr |
| string | Url | Validation only |
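A multi-step conversion such as `to_text + chunk` can be found by treating types as nodes and operations as edges, then searching for the shortest path. The sketch below uses a breadth-first search over a subset of the operations in this document; `EDGES` and `find_chain` are hypothetical names and nothing here is taken from the cbintel implementation.

```python
from collections import deque
from typing import List, Optional

# type -> list of (operation, resulting type); a subset of the documented operations.
EDGES = {
    "Query": [("search", "Url[]")],
    "Url": [("fetch", "Html")],
    "Url[]": [("fetch_batch", "Html[]")],
    "Html": [("to_text", "Text"), ("to_markdown", "Markdown")],
    "Text": [("chunk", "Chunk[]"), ("embed", "Vector")],
    "Image": [("ocr", "Text")],
    "Pdf": [("pdf_to_text", "Text")],
}


def find_chain(source: str, target: str) -> Optional[List[str]]:
    """Return the shortest list of operations from source to target, or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, ops = queue.popleft()
        if current == target:
            return ops
        for op, next_type in EDGES.get(current, []):
            if next_type not in seen:
                seen.add(next_type)
                queue.append((next_type, ops + [op]))
    return None


print(find_chain("Html", "Chunk[]"))
print(find_chain("Text", "Html"))
```

For `Html` to `Chunk[]` the search returns `['to_text', 'chunk']`, the same chain the table lists; a pair with no path returns `None`.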
Example Error

    # ERROR: Cannot directly connect Url[] to chunk
    - op: search
      output: urls  # Url[]
    - op: chunk
      input: urls  # ERROR: chunk expects Text

Fix:

    - op: search
      output: urls
    - op: fetch_batch
      input: urls
      output: pages  # Html[]
    - op: to_text_batch
      input: pages
      output: texts  # Text[]
    - op: chunk
      input: texts  # OK: chunk accepts Text
Operation Type Signatures
Discovery Operations

    search: (Query, {max_results?: int}) -> Url[]
    archive_discover: (Query, {domain: string}) -> Url[]
    youtube_search: (Query, {max_results?: int}) -> Url[]
    tor_search: (Query, {engine?: string}) -> TorResult[]

Acquisition Operations

    fetch: (Url, {geo?: string}) -> Html
    fetch_batch: (Url[], {geo?: string}) -> Html[]
    screenshot: (Url, {full_page?: bool}) -> Image
    download: (Url) -> bytes

Transform Operations

    to_markdown: (Html) -> Markdown
    to_text: (Html) -> Text
    chunk: (Text, {size?: int, overlap?: int}) -> Chunk[]
    merge_chunks: (Chunk[][]) -> Chunk[]

Process Operations

    embed: (Text) -> Vector
    embed_batch: (Chunk[]) -> Vector[]
    entities: (Text, {types?: EntityType[]}) -> Entity[]
    summarize: (Text, {max_length?: int}) -> Text
    translate: (Text, {target_lang: string}) -> Text

Filter Operations

    semantic_filter: (Chunk[], {query: string, threshold?: float}) -> Chunk[]
    quality_filter: (Chunk[], {threshold?: float}) -> Chunk[]
    filter_urls: (Url[], {pattern?: string}) -> Url[]
    filter_entities: (Entity[], {types?: EntityType[]}) -> Entity[]

Synthesize Operations

    integrate: (Chunk[], {query: string}) -> Text
    compare: (Text[], {aspects?: string[]}) -> Text
    to_report: (any, {template: string}) -> Markdown
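The signature notation used above is regular enough to parse mechanically, which can help when building tooling around it. The sketch below reads one signature line into a small record; the `Signature` dataclass, the regular expression, and `parse_signature` are all hypothetical, and the assumption that a trailing `?` marks an optional parameter is inferred from the notation rather than stated by the source.

```python
import re
from dataclasses import dataclass, field

# name: (InputType[, {options}]) -> OutputType
SIGNATURE = re.compile(
    r"^(?P<name>\w+):\s*\((?P<input>[^,{)]+)"
    r"(?:,\s*\{(?P<opts>.*)\})?\)\s*->\s*(?P<output>.+)$"
)


@dataclass
class Signature:
    name: str
    input_type: str
    output_type: str
    # option name -> (type name, required flag)
    options: dict = field(default_factory=dict)


def parse_signature(line: str) -> Signature:
    match = SIGNATURE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized signature: {line!r}")
    options = {}
    for part in (match["opts"] or "").split(","):
        part = part.strip()
        if not part:
            continue
        key, _, type_name = part.partition(":")
        key = key.strip()
        # ASSUMPTION: a trailing "?" marks the option as optional.
        options[key.rstrip("?")] = (type_name.strip(), not key.endswith("?"))
    return Signature(
        match["name"], match["input"].strip(), match["output"].strip(), options
    )


sig = parse_signature("chunk: (Text, {size?: int, overlap?: int}) -> Chunk[]")
print(sig.input_type, "->", sig.output_type, sig.options)
```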
Validation Levels
1. Parse-time
Validate types during YAML parsing:

    from cbintel.graph import parse_yaml, ValidationError

    # yaml_content holds the graph definition as a YAML string
    try:
        graph_def = parse_yaml(yaml_content)
    except ValidationError as e:
        print(f"Type error: {e}")
2. Link-time
Validate compatibility when connecting operations: the type of each named input must match the input type the consuming operation declares.
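A link-time check can be sketched as a single pass over the steps that records the type of every named output and compares it with the input type of each consumer. The code below is illustrative only: `SIGNATURES` and `check_links` are hypothetical names, steps are plain dictionaries mirroring the YAML shown earlier, and the comparison is strict (implicit coercion is deliberately left out to keep the sketch short).

```python
# operation -> (expected input type, produced output type); a small subset.
SIGNATURES = {
    "search": ("Query", "Url[]"),
    "fetch_batch": ("Url[]", "Html[]"),
    "to_text": ("Html", "Text"),
    "chunk": ("Text", "Chunk[]"),
}


def check_links(steps):
    """Return a list of error strings; an empty list means every link matches."""
    types = {}  # variable name -> type of the value stored under it
    errors = []
    for step in steps:
        expected, produced = SIGNATURES[step["op"]]
        source = step.get("input")
        if source is not None:
            actual = types.get(source)
            if actual != expected:
                errors.append(
                    f"{step['op']} expects {expected}, "
                    f"but '{source}' has type {actual}"
                )
        if "output" in step:
            types[step["output"]] = produced
    return errors


broken = [
    {"op": "search", "output": "urls"},
    {"op": "chunk", "input": "urls"},
]
print(check_links(broken))
```

Run against the error example from this document, the check reports that `chunk` expects `Text` while `urls` has type `Url[]`.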
3. Run-time
Validate that the actual data matches the declared types while the graph executes.
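A run-time check inspects the values themselves rather than the declared types. The following sketch shows one possible shape, using only the Python standard library; `CHECKS` and `matches` are hypothetical names, and the concrete representations (a `Url` as an http/https string, a `Vector` as a list of floats, `Text` as a `str`) are assumptions made for illustration, not definitions from the source.

```python
from urllib.parse import urlparse


def _is_url(value) -> bool:
    # ASSUMPTION: a Url is an http(s) string with a host part.
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# declared type -> predicate over the run-time value
CHECKS = {
    "Text": lambda v: isinstance(v, str),
    "Url": _is_url,
    "Vector": lambda v: isinstance(v, list)
    and all(isinstance(x, float) for x in v),
}


def matches(value, declared: str) -> bool:
    """True if `value` conforms to the declared type name."""
    if declared.endswith("[]"):
        # T[]: a list whose elements all conform to T.
        return isinstance(value, list) and all(
            matches(item, declared[:-2]) for item in value
        )
    check = CHECKS.get(declared)
    if check is None:
        raise KeyError(f"No run-time check sketched for type {declared!r}")
    return check(value)


print(matches(["https://example.com/a"], "Url[]"))
print(matches("not a url", "Url"))
```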
Error Messages
Type errors include helpful suggestions:

    TypeValidationError:
      operation: chunk
      field: input
      expected: Text
      actual: Url[]
      message: "Operation 'chunk' expects Text input, but 'urls' has type Url[].
                Did you mean to use 'fetch_batch' + 'to_text' first?"
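An error with the fields shown above could be assembled along the following lines. The class `TypeValidationErrorSketch` is a local stand-in written for this example and is not the library's own exception; its constructor arguments, including `source` and `suggestion`, are hypothetical.

```python
from typing import List, Optional


class TypeValidationErrorSketch(Exception):
    """Stand-in exception carrying the same fields as the example above."""

    def __init__(
        self,
        operation: str,
        field: str,
        expected: str,
        actual: str,
        source: str,
        suggestion: Optional[List[str]] = None,
    ):
        self.operation = operation
        self.field = field
        self.expected = expected
        self.actual = actual
        message = (
            f"Operation '{operation}' expects {expected} {field}, "
            f"but '{source}' has type {actual}."
        )
        if suggestion:
            # Render the suggested operations as 'a' + 'b'.
            chain = " + ".join(f"'{op}'" for op in suggestion)
            message += f" Did you mean to use {chain} first?"
        super().__init__(message)


err = TypeValidationErrorSketch(
    operation="chunk",
    field="input",
    expected="Text",
    actual="Url[]",
    source="urls",
    suggestion=["fetch_batch", "to_text"],
)
print(err)
```

The suggestion list is passed in by the caller here; how a real validator would derive it is outside the scope of this sketch.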
Best Practices
- Check signatures: know each operation's input and output types before connecting it.
- Use intermediate operations: add explicit conversion steps where types do not connect directly.
- Validate early: check types during development.
- Read error messages: follow the suggestions they include.
- Use typed outputs: declare output types explicitly.