
Type Coercion

Coercion is the automatic or explicit conversion between types.

Coercion Rules

Implicit Coercion

Automatic conversions that happen without explicit operations:

| From     | To     | Method                        |
|----------|--------|-------------------------------|
| Markdown | Text   | Strip formatting markers      |
| Html     | Text   | Strip HTML tags (basic)       |
| Url      | string | Use URL string directly       |
| Query    | string | Use query string directly     |
| T        | T[]    | Wrap single value in array    |
| T[]      | T      | Take first element            |
| Chunk[]  | Text   | Join chunks with newlines     |
| Entity[] | Text   | Format entities as text list  |

Example: Implicit Wrap

# Single URL to array
- op: search
  output: urls          # Url[]

- op: filter_urls
  input: urls
  output: single_url    # Url

- op: fetch_batch
  input: single_url     # Url -> Url[] (implicit wrap)
  output: pages
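
The wrap and unwrap rules (T -> T[] and T[] -> T) can be sketched in Python as follows. This is a minimal illustration, not the library's implementation: the function names wrap and unwrap are hypothetical, and raising an error on an empty array is an assumption, since the rules above do not say what happens in that case.

```python
from typing import Any, List


def wrap(value: Any) -> List[Any]:
    """T -> T[]: wrap a single value in a one-element list; lists pass through."""
    return value if isinstance(value, list) else [value]


def unwrap(values: List[Any]) -> Any:
    """T[] -> T: keep the first element and drop the rest (lossy)."""
    if not values:
        # Assumption: an empty array cannot be coerced to a single value.
        raise ValueError("cannot coerce an empty array to a single value")
    return values[0]


single_url = "https://example.com"
print(wrap(single_url))   # ['https://example.com']
print(unwrap(["https://a.example", "https://b.example"]))   # https://a.example
```

Note that unwrap discards every element after the first without warning, which is why the T[] -> T rule is listed as lossy.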

Example: Implicit Text Conversion

# Markdown to Text for entities
- op: to_markdown
  input: html
  output: markdown      # Markdown

- op: entities
  input: markdown       # Markdown -> Text (implicit)
  output: entities
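
Two of the text-producing implicit conversions, Html -> Text and Chunk[] -> Text, might look like the sketch below. It assumes chunks are plain strings and uses Python's standard html.parser for the "basic" tag stripping; the helper names are hypothetical and the real conversion may differ.

```python
from html.parser import HTMLParser
from typing import List


class _TagStripper(HTMLParser):
    """Collects only the text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Html -> Text: drop tags, keep text content (links and structure are lost)."""
    stripper = _TagStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.parts)


def chunks_to_text(chunks: List[str]) -> str:
    """Chunk[] -> Text: join chunks with newlines (chunk boundaries are lost)."""
    return "\n".join(chunks)


print(html_to_text('<p>Hello <a href="https://example.com">world</a></p>'))  # Hello world
```

Being a basic stripper, this sketch also keeps the contents of script and style elements as text.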

Explicit Coercion

Conversions that require an explicit operation:

URL to Content

# Url -> Html (explicit)
- op: fetch
  input: url
  output: html

# Url[] -> Html[] (explicit)
- op: fetch_batch
  input: urls
  output: pages

Content to Text

# Html -> Text (explicit)
- op: to_text
  input: html
  output: text

# Html -> Markdown (explicit)
- op: to_markdown
  input: html
  output: markdown

Text to Chunks

# Text -> Chunk[] (explicit)
- op: chunk
  input: text
  params:
    size: 500
  output: chunks
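
As a rough picture of what a Text -> Chunk[] operation does, here is a fixed-size splitter. It assumes size counts characters and that chunks do not overlap; the source does not state the unit of size, so treat both as assumptions. The function name chunk_text is hypothetical.

```python
from typing import List


def chunk_text(text: str, size: int = 500) -> List[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


chunks = chunk_text("a" * 1200, size=500)
print([len(c) for c in chunks])   # [500, 500, 200]
```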

Text to Embeddings

# Text -> Vector (explicit)
- op: embed
  input: text
  output: vector

# Chunk[] -> Vector[] (explicit)
- op: embed_batch
  input: chunks
  output: vectors

Image to Text

# Image -> Text (explicit)
- op: ocr
  input: image
  output: text

Coercion Chain

Complex transformations require multiple operations:

URL to Chunks

stages:
  - name: process
    sequential:
      # Step 1: Url -> Html
      - op: fetch
        input: url
        output: html

      # Step 2: Html -> Text
      - op: to_text
        input: html
        output: text

      # Step 3: Text -> Chunk[]
      - op: chunk
        input: text
        output: chunks

URL to Embeddings

stages:
  - name: embed_url
    sequential:
      # Url -> Html -> Text -> Chunk[] -> Vector[]
      - op: fetch
        input: url
        output: html

      - op: to_text
        input: html
        output: text

      - op: chunk
        input: text
        output: chunks

      - op: embed_batch
        input: chunks
        output: vectors
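
A chain like the ones above can be found mechanically by treating each explicit operation as an edge between types. The sketch below runs a breadth-first search over the explicit conversions listed in this section; find_chain and EXPLICIT_OPS are hypothetical names, and nothing here implies the pipeline engine resolves chains automatically.

```python
from collections import deque
from typing import List, Optional, Tuple

# Explicit conversions from this section: (source type, operation, result type).
EXPLICIT_OPS: List[Tuple[str, str, str]] = [
    ("Url", "fetch", "Html"),
    ("Url[]", "fetch_batch", "Html[]"),
    ("Html", "to_text", "Text"),
    ("Html", "to_markdown", "Markdown"),
    ("Text", "chunk", "Chunk[]"),
    ("Text", "embed", "Vector"),
    ("Chunk[]", "embed_batch", "Vector[]"),
    ("Image", "ocr", "Text"),
]


def find_chain(source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest operation list from source to target."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, ops = queue.popleft()
        if current == target:
            return ops
        for src, op, dst in EXPLICIT_OPS:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, ops + [op]))
    return None  # no sequence of explicit operations connects the two types


print(find_chain("Url", "Chunk[]"))    # ['fetch', 'to_text', 'chunk']
print(find_chain("Url", "Vector[]"))   # ['fetch', 'to_text', 'chunk', 'embed_batch']
```

The two results match the "URL to Chunks" and "URL to Embeddings" stages written out by hand above.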

Auto-Batch Coercion

Some operations batch automatically when their input is an array:

# Single URL
- op: fetch
  params:
    url: "https://example.com"
  output: page

# Array of URLs - same operation, batched
- op: fetch
  input: urls          # Url[]
  output: pages        # Html[]

Operations with batch_supported=True:

  • fetch / fetch_batch
  • to_text / to_text_batch
  • to_markdown / to_markdown_batch
  • embed / embed_batch
  • screenshot (for multiple URLs)
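
One way to picture auto-batching is a wrapper that maps an operation over array inputs and passes single values straight through. The sketch below is an assumption about the behavior, not the actual dispatch code; auto_batch and fake_fetch are hypothetical, and fake_fetch performs no network access.

```python
from typing import Any, Callable, List, Union


def auto_batch(op: Callable[[Any], Any]) -> Callable[[Union[Any, List[Any]]], Any]:
    """Return a version of op that maps over list inputs and passes single values through."""

    def run(value: Union[Any, List[Any]]) -> Any:
        if isinstance(value, list):
            return [op(item) for item in value]   # T[] in, one result per element
        return op(value)                          # T in, single result

    return run


def fake_fetch(url: str) -> str:
    """Stand-in for a real fetch: builds a placeholder page instead of downloading."""
    return f"<html>{url}</html>"


fetch = auto_batch(fake_fetch)
print(fetch("https://example.com"))                        # <html>https://example.com</html>
print(fetch(["https://a.example", "https://b.example"]))   # list with two placeholder pages
```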

Lossy Coercion

Some coercions lose information:

| Coercion         | Lost Information          |
|------------------|---------------------------|
| Html -> Text     | HTML structure, links     |
| Markdown -> Text | Formatting                |
| Chunk[] -> Text  | Chunk boundaries          |
| T[] -> T         | All elements except first |

Handle with Care

# May lose important structure
- op: to_text
  input: html          # Loses links, images
  output: text

# Better: Keep HTML for link extraction
- op: extract_links
  input: html          # Preserves structure
  output: links
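
During development it can help to flag lossy coercions before they run. The sketch below encodes the Lossy Coercion table as a lookup; lost_information is a hypothetical helper, and matching "X[]" -> "X" by string suffix is a simplification chosen for this example.

```python
from typing import Dict, Optional, Tuple

# Lossy coercions and what each one drops, taken from the Lossy Coercion table.
LOSSY: Dict[Tuple[str, str], str] = {
    ("Html", "Text"): "HTML structure, links",
    ("Markdown", "Text"): "Formatting",
    ("Chunk[]", "Text"): "Chunk boundaries",
}


def lost_information(source: str, target: str) -> Optional[str]:
    """Return a description of what the coercion loses, or None if it is not listed as lossy."""
    if (source, target) in LOSSY:
        return LOSSY[(source, target)]
    if source == target + "[]":   # generic T[] -> T unwrap
        return "All elements except first"
    return None


print(lost_information("Html", "Text"))    # HTML structure, links
print(lost_information("Url[]", "Url"))    # All elements except first
print(lost_information("Url", "Url[]"))    # None
```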

Custom Coercion

Define custom type coercion with type aliases:

types:
  # Constrained types
  GovUrl:
    base: url
    constraints:
      - "domain_matches('*.gov')"
    coerce_from:
      - url              # Allow Url -> GovUrl (with validation)

  ShortText:
    base: text
    constraints:
      - "len(text) < 1000"
    coerce_from:
      - text             # Allow Text -> ShortText (with validation)

Validation on Coercion

Coercing to a validated or constrained type checks the value first; if the check fails, the coercion fails:

# Url validation
"https://example.com"  # Valid
"not-a-url"            # Invalid - coercion fails

# GovUrl validation
"https://state.gov"    # Valid
"https://example.com"  # Invalid - domain doesn't match

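The validation cases above can be reproduced with a small sketch. It assumes that a valid Url needs an http or https scheme and a hostname, and that domain_matches('*.gov') behaves like a shell-style glob on the hostname; both are assumptions, and coerce_to_url and coerce_to_gov_url are hypothetical names.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def coerce_to_url(value: str) -> str:
    """string -> Url: accept only values with an http(s) scheme and a hostname."""
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"{value!r} is not a valid Url")
    return value


def coerce_to_gov_url(value: str) -> str:
    """Url -> GovUrl: validate as a Url, then check the hostname against '*.gov'."""
    url = coerce_to_url(value)
    host = urlparse(url).hostname or ""
    if not fnmatch(host, "*.gov"):
        raise ValueError(f"{value!r} does not match domain '*.gov'")
    return url


print(coerce_to_gov_url("https://state.gov"))   # https://state.gov
for bad in ("not-a-url", "https://example.com"):
    try:
        coerce_to_gov_url(bad)
    except ValueError as err:
        print("coercion failed:", err)
```
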
Error Handling

# Type error on invalid coercion
- op: chunk
  input: urls          # ERROR: Cannot coerce Url[] to Text

# Fix: Add explicit coercion
- op: fetch_batch
  input: urls
  output: pages

- op: to_text_batch
  input: pages
  output: texts

- op: chunk
  input: texts         # OK
  output: chunks

Error message:

CoercionError:
  from: Url[]
  to: Text
  message: "Cannot coerce Url[] to Text.
            Use 'fetch_batch' and 'to_text_batch' first."
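
Catching such a failure in calling code might look like the sketch below. The CoercionError class here is an illustrative stand-in with the same three fields as the message above; its constructor, the HINTS table, and check_input are hypothetical, and check_input deliberately ignores implicit coercions to keep the example short.

```python
from typing import Dict, Tuple


class CoercionError(TypeError):
    """Illustrative error type carrying the from/to types and a message."""

    def __init__(self, source: str, target: str, hint: str = "") -> None:
        self.source = source   # the "from" field ("from" is a reserved word in Python)
        self.target = target   # the "to" field
        self.message = f"Cannot coerce {source} to {target}." + (f" {hint}" if hint else "")
        super().__init__(self.message)


# Hypothetical hint table: suggested fixes for known failing pairs.
HINTS: Dict[Tuple[str, str], str] = {
    ("Url[]", "Text"): "Use 'fetch_batch' and 'to_text_batch' first.",
}


def check_input(actual: str, expected: str) -> None:
    """Raise CoercionError unless the actual type already matches the expected one."""
    if actual != expected:
        raise CoercionError(actual, expected, HINTS.get((actual, expected), ""))


try:
    check_input("Url[]", "Text")
except CoercionError as err:
    print(err.message)   # Cannot coerce Url[] to Text. Use 'fetch_batch' and 'to_text_batch' first.
```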

Best Practices

  1. Be explicit - Prefer explicit operations over implicit coercion
  2. Know what's lost - Understand lossy coercions
  3. Validate types - Check types during development
  4. Use chains - Build proper transformation pipelines
  5. Handle errors - Catch coercion failures