Type Coercion¶
Coercion converts a value from one type to another, either automatically (implicit) or through an operation (explicit).
Coercion Rules¶
Implicit Coercion¶
Automatic conversions that happen without explicit operations:
| From | To | Method |
|---|---|---|
| Markdown | Text | Strip formatting markers |
| Html | Text | Strip HTML tags (basic) |
| Url | string | Use URL string directly |
| Query | string | Use query string directly |
| T | T[] | Wrap single value in array |
| T[] | T | Take first element |
| Chunk[] | Text | Join chunks with newlines |
| Entity[] | Text | Format entities as text list |
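A few rows of the table can be sketched in Python to make the semantics concrete. This is a minimal illustration, not the library's implementation: the names `wrap`, `unwrap`, `join_chunks`, and `strip_markdown` are hypothetical, the Markdown stripping handles only headings, emphasis, and backticks, and raising on an empty array is an assumption the table does not specify.

```python
import re
from typing import Any


def strip_markdown(markdown: str) -> str:
    """Markdown -> Text (rough): drop heading, emphasis, and code markers."""
    text = re.sub(r"^#{1,6}[ \t]*", "", markdown, flags=re.MULTILINE)
    return re.sub(r"[*`]", "", text)


def wrap(value: Any) -> list:
    """T -> T[]: wrap a single value in a list; lists pass through."""
    return value if isinstance(value, list) else [value]


def unwrap(values: list) -> Any:
    """T[] -> T: take the first element; the rest are dropped."""
    if not values:
        # Assumption: an empty array cannot be coerced to a single value.
        raise ValueError("cannot unwrap an empty list")
    return values[0]


def join_chunks(chunks: list[str]) -> str:
    """Chunk[] -> Text: join chunks with newlines."""
    return "\n".join(chunks)
```

Note that `unwrap` silently discards every element after the first, which is why `T[]` to `T` counts as a lossy coercion.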
Example: Implicit Wrap¶
# Single URL to array
- op: search
  output: urls  # Url[]
- op: filter_urls
  input: urls
  output: single_url  # Url
- op: fetch_batch
  input: single_url  # Url -> Url[] (implicit wrap)
  output: pages
Example: Implicit Text Conversion¶
# Markdown to Text for entities
- op: to_markdown
  input: html
  output: markdown  # Markdown
- op: entities
  input: markdown  # Markdown -> Text (implicit)
  output: entities
Explicit Coercion¶
Conversions that require operations:
URL to Content¶
# Url -> Html (explicit)
- op: fetch
  input: url
  output: html
# Url[] -> Html[] (explicit)
- op: fetch_batch
  input: urls
  output: pages
Content to Text¶
# Html -> Text (explicit)
- op: to_text
  input: html
  output: text
# Html -> Markdown (explicit)
- op: to_markdown
  input: html
  output: markdown
Text to Chunks¶
# Text -> Chunk[] (explicit)
- op: chunk
  input: text
  output: chunks
Text to Embeddings¶
# Text -> Vector (explicit)
- op: embed
  input: text
  output: vector
# Chunk[] -> Vector[] (explicit)
- op: embed_batch
  input: chunks
  output: vectors
Image to Text¶
Coercion Chain¶
Complex transformations require multiple operations:
URL to Chunks¶
stages:
  - name: process
    sequential:
      # Step 1: Url -> Html
      - op: fetch
        input: url
        output: html
      # Step 2: Html -> Text
      - op: to_text
        input: html
        output: text
      # Step 3: Text -> Chunk[]
      - op: chunk
        input: text
        output: chunks
URL to Embeddings¶
stages:
  - name: embed_url
    sequential:
      # Url -> Html -> Text -> Chunk[] -> Vector[]
      - op: fetch
        input: url
        output: html
      - op: to_text
        input: html
        output: text
      - op: chunk
        input: text
        output: chunks
      - op: embed_batch
        input: chunks
        output: vectors
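One way to reason about a chain is to check that each operation's output type matches the next operation's input type. The sketch below does this in Python under stated assumptions: the `SIGNATURES` table and `check_chain` function are hypothetical, with the type pairs taken from the comments in the examples on this page, and the check uses exact type matches only (no implicit coercion).

```python
# Hypothetical signature table: op name -> (input type, output type).
# The pairs mirror the type comments in the examples on this page.
SIGNATURES = {
    "fetch": ("Url", "Html"),
    "to_text": ("Html", "Text"),
    "chunk": ("Text", "Chunk[]"),
    "embed_batch": ("Chunk[]", "Vector[]"),
}


def check_chain(start_type: str, ops: list[str]) -> str:
    """Return the chain's final type, or raise TypeError at the first mismatch."""
    current = start_type
    for op in ops:
        expected, produced = SIGNATURES[op]
        if current != expected:
            raise TypeError(f"{op} expects {expected}, got {current}")
        current = produced
    return current
```

Calling `check_chain("Url", ["fetch", "to_text", "chunk", "embed_batch"])` returns `"Vector[]"`, while skipping `to_text` raises a `TypeError` at `chunk`.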
Auto-Batch Coercion¶
Some operations batch automatically: given an array, the same operation runs over every element.
# Single URL
- op: fetch
  params:
    url: "https://example.com"
  output: page
# Array of URLs - same operation, batched
- op: fetch
  input: urls  # Url[]
  output: pages  # Html[]
Operations with batch_supported=True:
- fetch/fetch_batch
- to_text/to_text_batch
- to_markdown/to_markdown_batch
- embed/embed_batch
- screenshot (for multiple URLs)
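Auto-batching can be pictured as a wrapper that maps a single-item operation over list inputs. The following Python sketch is an assumption about the general idea, not the actual dispatch mechanism: `auto_batch` and `fake_fetch` are hypothetical names, and `fake_fetch` returns a placeholder string instead of making a network request.

```python
from typing import Any, Callable


def auto_batch(op: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap a single-item operation so list inputs are processed element-wise."""
    def run(value: Any) -> Any:
        if isinstance(value, list):
            return [op(item) for item in value]
        return op(value)
    return run


@auto_batch
def fake_fetch(url: str) -> str:
    # Stand-in for a real fetch: returns placeholder HTML, no network access.
    return f"<html>{url}</html>"
```

With this wrapper, `fake_fetch("a")` returns a single string and `fake_fetch(["a", "b"])` returns a list of two strings, matching the `page` and `pages` outputs in the example above.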
Lossy Coercion¶
Some coercions lose information:
| Coercion | Lost Information |
|---|---|
| Html → Text | HTML structure, links |
| Markdown → Text | Formatting |
| Chunk[] → Text | Chunk boundaries |
| T[] → T | All elements except first |
Handle with Care¶
# May lose important structure
- op: to_text
  input: html  # Loses links, images
  output: text
# Better: Keep HTML for link extraction
- op: extract_links
  input: html  # Preserves structure
  output: links
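The loss is easy to see on a small input. The Python sketch below uses the standard library's `html.parser` as a stand-in for the pipeline's `to_text` and `extract_links` operations; the class `TextAndLinks` and both function bodies are illustrative assumptions, not the operations' real behavior.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect visible text and href targets while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def _parse(html: str) -> TextAndLinks:
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return parser


def to_text(html: str) -> str:
    """Html -> Text: keeps the words, drops tags and attributes."""
    return "".join(_parse(html).text_parts)


def extract_links(html: str) -> list[str]:
    """Html -> links: only possible while the markup is still present."""
    return _parse(html).links
```

For `<p>See <a href="https://example.com/docs">the docs</a>.</p>`, `to_text` keeps the sentence but the URL is gone, so link extraction has to run on the HTML before it is converted.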
Custom Coercion¶
Define custom type coercion with type aliases:
types:
  # Constrained types
  GovUrl:
    base: url
    constraints:
      - "domain_matches('*.gov')"
    coerce_from:
      - url  # Allow Url -> GovUrl (with validation)
  ShortText:
    base: text
    constraints:
      - "len(text) < 1000"
    coerce_from:
      - text  # Allow Text -> ShortText (with validation)
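A constrained type such as `GovUrl` can be read as "validate against the base type, then apply the constraint". The Python sketch below illustrates that reading with standard-library calls only. It rests on several assumptions: `coerce_url` and `coerce_gov_url` are hypothetical names, `domain_matches` is treated as a glob match on the host name, and only `http` and `https` schemes are accepted.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def coerce_url(value: str) -> str:
    """string -> Url: require an http(s) scheme and a host (assumed rule)."""
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid URL: {value!r}")
    return value


def coerce_gov_url(value: str) -> str:
    """Url -> GovUrl: validate as Url, then apply the domain constraint."""
    url = coerce_url(value)
    host = urlparse(url).hostname or ""
    # Assumption: domain_matches('*.gov') behaves like a glob on the host.
    if not fnmatch(host, "*.gov"):
        raise ValueError(f"domain does not match '*.gov': {host!r}")
    return url
```

Under these assumptions, `coerce_gov_url("https://state.gov")` succeeds, while `"https://example.com"` fails the domain check and `"not-a-url"` fails the base `Url` check first.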
Validation on Coercion¶
Coercion to a validated or constrained type checks the value first; if validation fails, the coercion fails:
# Url validation
"https://example.com" # Valid
"not-a-url" # Invalid - coercion fails
# GovUrl validation
"https://state.gov" # Valid
"https://example.com" # Invalid - domain doesn't match
Error Handling¶
# Type error on invalid coercion
- op: chunk
  input: urls  # ERROR: Cannot coerce Url[] to Text
# Fix: Add explicit coercion
- op: fetch_batch
  input: urls
  output: pages
- op: to_text_batch
  input: pages
  output: texts
- op: chunk
  input: texts  # OK
Error message:
CoercionError:
  from: Url[]
  to: Text
  message: "Cannot coerce Url[] to Text.
    Use 'fetch_batch' and 'to_text_batch' first."
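An error of this shape could be modeled as an exception that carries the source type, the target type, and a hint. The Python sketch below is illustrative only: the `CoercionError` class here, the `IMPLICIT` set (a small subset of the implicit rules), and `require_coercible` are assumptions, not the library's actual API.

```python
class CoercionError(TypeError):
    """Raised when no implicit coercion exists between two types."""

    def __init__(self, from_type: str, to_type: str, hint: str = "") -> None:
        self.from_type = from_type
        self.to_type = to_type
        message = f"Cannot coerce {from_type} to {to_type}."
        if hint:
            message += f" {hint}"
        super().__init__(message)


# Hypothetical subset of the implicit rules, stored as (from, to) pairs.
IMPLICIT = {("Markdown", "Text"), ("Html", "Text"), ("Chunk[]", "Text")}


def require_coercible(from_type: str, to_type: str, hint: str = "") -> None:
    """Raise CoercionError unless the types match or an implicit rule applies."""
    if from_type != to_type and (from_type, to_type) not in IMPLICIT:
        raise CoercionError(from_type, to_type, hint)
```

Catching `CoercionError` around a pipeline step gives access to `from_type` and `to_type`, which is enough to log the failure or suggest the missing explicit operations.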
Best Practices¶
- Be explicit: prefer explicit operations over implicit coercion
- Know what's lost: understand which coercions are lossy
- Validate types: check types during development
- Use chains: build proper transformation pipelines
- Handle errors: catch coercion failures