Type System

The graph type system provides formal type definitions, compatibility checking, and automatic coercion for data flowing through research pipelines.

Overview

graph TB
    GT[GraphType]

    GT --> PRIM[Primitive]
    GT --> DOM[Domain]
    GT --> COLL["Collection[T]"]
    GT --> COMP[Compound]
    GT --> UNION[Union]

    PRIM --> void & bool & int & float & string & bytes & datetime

    DOM --> STR_BASED[String-based]
    DOM --> OBJ_BASED[Object-based]
    DOM --> BIN_BASED[Binary-based]

    STR_BASED --> Url & Html & Markdown & Text & Query & Json
    OBJ_BASED --> Entity & Chunk & Vector & GeoPoint
    BIN_BASED --> Image & Pdf

Quick Reference

Document         Description
Type Hierarchy   Primitive, Domain, and Collection types
Compatibility    Type compatibility matrix
Coercion         Implicit and explicit coercion
Filters          Filter expression grammar

Purpose

The type system provides:

  • Formal type definitions for all data flowing through graphs
  • Compatibility checking between operation outputs and inputs
  • Automatic coercion where safe conversions exist
  • Filter expressions for conditional data flow
  • Chat→Graph interface for interactive pipeline building
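Automatic coercion only applies where a conversion cannot lose information. Below is a minimal sketch of that idea in Python. It assumes two hypothetical rules: int widens to float, and any single value T is wrapped into a one-element T[]. `SCALAR_COERCIONS` and `coerce` are illustrative names, not the pipeline's actual API. The full rule set is defined in the Coercion document.

```python
from typing import Any, Callable

# Hypothetical scalar coercions, limited to lossless widening conversions.
# These entries are illustrative assumptions, not the runtime's real rule set.
SCALAR_COERCIONS: dict[tuple[str, str], Callable[[Any], Any]] = {
    ("int", "float"): float,
}


def coerce(value: Any, source: str, target: str) -> Any:
    """Convert value from type `source` to type `target`, or raise TypeError."""
    if source == target:
        return value  # identical types need no conversion
    if target == f"{source}[]":
        return [value]  # T -> T[]: wrap a single value in a one-element list
    converter = SCALAR_COERCIONS.get((source, target))
    if converter is None:
        # Anything not listed (e.g. float -> int, which truncates) is rejected.
        raise TypeError(f"no implicit coercion from {source} to {target}")
    return converter(value)
```

Lossy conversions such as float to int are deliberately left out of the table. Under this design, they would have to be requested explicitly.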

Type Categories

Primitive Types

Basic data types:

Type       Python               Description
void       None                 No data
bool       bool                 Boolean
int        int                  Integer
float      float                Floating-point number
string     str                  Text
bytes      bytes                Binary data
datetime   datetime.datetime    Timestamp
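The primitive table maps directly onto a run-time membership check. The sketch below hard-codes that mapping. `PRIMITIVES` and `is_primitive_instance` are hypothetical names used only for illustration. The check is strict, so an int is not accepted where a float is expected, and widening is left to coercion.

```python
from datetime import datetime

# Graph primitive name -> Python type, transcribed from the table above.
PRIMITIVES: dict[str, type] = {
    "void": type(None),
    "bool": bool,
    "int": int,
    "float": float,
    "string": str,
    "bytes": bytes,
    "datetime": datetime,
}


def is_primitive_instance(value: object, type_name: str) -> bool:
    """Return True if value is an instance of the named graph primitive."""
    # bool is a subclass of int in Python, so reject it explicitly for "int".
    if type_name == "int" and isinstance(value, bool):
        return False
    return isinstance(value, PRIMITIVES[type_name])
```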

Domain Types

Semantic types layered on a base type (a primitive, a generic object, or an array):

Type       Base       Description
Url        string     HTTP/HTTPS URL
Html       string     HTML document
Markdown   string     Markdown text
Text       string     Plain text
Query      string     Search query
Json       string     JSON document
Entity     object     Named entity
Chunk      object     Text chunk
GeoPoint   object     Geographic point
Vector     float[]    Embedding
Image      bytes      Image data
Pdf        bytes      PDF document
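A domain type can be read as "base-type check plus a semantic check." Below is a sketch of that reading for the string- and bytes-backed types. The specific semantic checks are assumptions chosen for illustration: Url requires an http/https scheme and a host, Query must be non-blank, and Pdf must start with the `%PDF-` file signature. They are not confirmed rules of the graph runtime. `DOMAIN_TYPES` and `check_domain` are hypothetical names.

```python
from typing import Any, Callable, Optional
from urllib.parse import urlparse


def _is_http_url(value: str) -> bool:
    """Accept only absolute http/https URLs that name a host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Domain type -> (base Python type, optional semantic check). Checks are assumptions.
DOMAIN_TYPES: dict[str, tuple[type, Optional[Callable[[Any], bool]]]] = {
    "Url": (str, _is_http_url),
    "Html": (str, None),
    "Markdown": (str, None),
    "Text": (str, None),
    "Query": (str, lambda v: bool(v.strip())),  # reject empty or blank queries
    "Image": (bytes, None),
    "Pdf": (bytes, lambda v: v.startswith(b"%PDF-")),  # PDF file signature
}


def check_domain(value: Any, type_name: str) -> bool:
    """Check the base type first, then the domain-specific rule, if any."""
    base, check = DOMAIN_TYPES[type_name]
    if not isinstance(value, base):
        return False
    return check is None or check(value)
```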

Collection Types

Arrays of any type:

Url[]           # Array of URLs
Chunk[]         # Array of chunks
Entity[]        # Array of entities
Vector[]        # Array of vectors
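The `T[]` suffix composes, so a type string can be parsed recursively by stripping one `[]` at a time. The sketch below shows one possible in-memory representation. `ArrayType`, `parse_type`, and `format_type` are hypothetical helpers, not part of a documented API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ArrayType:
    """Collection type T[]: an array whose elements all have type `item`."""
    item: TypeExpr


# A type expression is either a bare name such as "Url" or an ArrayType.
TypeExpr = Union[str, ArrayType]


def parse_type(text: str) -> TypeExpr:
    """Parse 'Url', 'Url[]', 'Vector[][]', ... into nested ArrayType values."""
    text = text.strip()
    if text.endswith("[]"):
        return ArrayType(parse_type(text[:-2]))  # strip one "[]" and recurse
    return text


def format_type(t: TypeExpr) -> str:
    """Inverse of parse_type: render a type expression back to text."""
    return f"{format_type(t.item)}[]" if isinstance(t, ArrayType) else t
```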

Union Types

A value can be any one of several types:

Text | Html     # Either plain text or HTML
Url | Url[]     # Single URL or array
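One simple compatibility rule for unions treats each side as a set of member types. Under that rule, a source fits a target when every source member is also a target member. The sketch below implements only exact-name membership and ignores subtyping and coercion. It is an illustrative assumption, not the rule from the Compatibility matrix. `members` and `is_assignable` are hypothetical names.

```python
def members(type_expr: str) -> frozenset[str]:
    """Split a union expression like 'Text | Html' into its member type names."""
    return frozenset(part.strip() for part in type_expr.split("|"))


def is_assignable(source: str, target: str) -> bool:
    """True if every member of `source` is also a member of `target`."""
    return members(source) <= members(target)
```

The asymmetry is intentional. A plain Text value can flow into a `Text | Html` input, but a `Text | Html` value cannot flow into an input that accepts only Text.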

Type Signatures

Operations have typed signatures of the form (input, {params}) -> output, where a trailing ? marks an optional parameter:

search:         (Query, {max_results?: int}) -> Url[]
fetch:          (Url, {geo?: string}) -> Html
to_text:        (Html) -> Text
chunk:          (Text, {size?: int}) -> Chunk[]
embed_batch:    (Chunk[]) -> Vector[]
entities:       (Text) -> Entity[]
integrate:      (Chunk[], {query: string}) -> Text
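Link-time checking can be pictured as comparing each operation's output type with the next operation's input type. The sketch below transcribes only the primary input and output types from the signatures above and drops the parameter objects. It uses exact type equality, with no unions or coercion. `SIGNATURES` and `check_chain` are hypothetical names.

```python
# Primary (input, output) types from the signatures above; params omitted.
SIGNATURES: dict[str, tuple[str, str]] = {
    "search": ("Query", "Url[]"),
    "fetch": ("Url", "Html"),
    "to_text": ("Html", "Text"),
    "chunk": ("Text", "Chunk[]"),
    "embed_batch": ("Chunk[]", "Vector[]"),
    "entities": ("Text", "Entity[]"),
    "integrate": ("Chunk[]", "Text"),
}


def check_chain(ops: list[str]) -> list[str]:
    """Return one message per adjacent pair whose output/input types differ."""
    errors = []
    for producer, consumer in zip(ops, ops[1:]):
        produced = SIGNATURES[producer][1]
        expected = SIGNATURES[consumer][0]
        if produced != expected:
            errors.append(f"{consumer} expects {expected}, but {producer} produces {produced}")
    return errors
```

For example, `check_chain(["fetch", "to_text", "chunk", "embed_batch"])` returns an empty list, while `check_chain(["search", "chunk"])` reports that chunk expects Text but search produces Url[].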

Validation

Types are validated at three points:

  1. Parse-time - while the graph YAML is parsed
  2. Link-time - when operations are connected
  3. Run-time - as data flows through the graph

Example Error

# This graph has a type error
stages:
  - name: discover
    sequential:
      - op: search
        params: {query: "test"}
        output: urls          # Type: Url[]

  - name: process
    sequential:
      - op: chunk              # ERROR: chunk expects Text, got Url[]
        input: urls

Error message:

TypeValidationError:
  operation: chunk
  field: input
  expected: Text
  actual: Url[]
  message: "Operation 'chunk' expects Text input, but 'urls' has type Url[].
            Did you mean to use 'fetch_batch' + 'to_text' first?"
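The suggestion in the message above can come from a lookup of known "bridge" operations between a mismatched pair of types. The sketch below reproduces the example message. The `BRIDGES` table and this `TypeValidationError` class are hypothetical illustrations of the pattern, not the runtime's implementation. Only the `fetch_batch` + `to_text` hint comes from the example itself.

```python
# Hypothetical hints: (actual, expected) -> operations that convert between them.
BRIDGES: dict[tuple[str, str], list[str]] = {
    ("Url[]", "Text"): ["fetch_batch", "to_text"],
    ("Url", "Text"): ["fetch", "to_text"],
    ("Html", "Text"): ["to_text"],
}


class TypeValidationError(Exception):
    """Type mismatch between an operation's declared input and its source."""

    def __init__(self, operation: str, field: str, expected: str, actual: str, source: str):
        self.operation, self.field = operation, field
        self.expected, self.actual, self.source = expected, actual, source
        super().__init__(self._message())

    def _message(self) -> str:
        msg = (f"Operation '{self.operation}' expects {self.expected} {self.field}, "
               f"but '{self.source}' has type {self.actual}.")
        bridge = BRIDGES.get((self.actual, self.expected))
        if bridge:  # append a fix only when a known conversion path exists
            steps = " + ".join(f"'{op}'" for op in bridge)
            msg += f" Did you mean to use {steps} first?"
        return msg
```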

Usage in Graphs

Typed Inputs

inputs:
  - name: query
    type: string
    required: true

  - name: urls
    type: url[]
    default: []

  - name: options
    type: object
    schema:
      max_results: {type: int, default: 10, range: [1, 100]}
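Resolving typed inputs involves four steps: enforce `required`, fill `default`, check the type and `range`, and recurse into `schema` fields. The sketch below shows those steps in Python. The YAML above is transcribed by hand into dicts so the sketch avoids a YAML library. `PY_TYPES`, `resolve`, and `resolve_inputs` are hypothetical names. One behavior is an assumption: an object input with a schema but no default starts out empty and then picks up its field defaults.

```python
import copy
from typing import Any

# The inputs block above, transcribed into Python dicts.
INPUTS = [
    {"name": "query", "type": "string", "required": True},
    {"name": "urls", "type": "url[]", "default": []},
    {"name": "options", "type": "object",
     "schema": {"max_results": {"type": "int", "default": 10, "range": [1, 100]}}},
]

PY_TYPES: dict[str, type] = {"string": str, "int": int, "url[]": list, "object": dict}
_MISSING = object()  # sentinel meaning "caller supplied no value"


def resolve(name: str, spec: dict[str, Any], value: Any = _MISSING) -> Any:
    """Apply required/default rules, then check type, range, and nested schema."""
    if value is _MISSING:
        if spec.get("required"):
            raise ValueError(f"missing required input '{name}'")
        # Assumption: an object with a schema but no default starts out empty.
        value = copy.deepcopy(spec.get("default", {} if "schema" in spec else None))
    expected = PY_TYPES[spec["type"]]
    if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
        raise TypeError(f"input '{name}' must be {spec['type']}, got {type(value).__name__}")
    if "range" in spec:
        low, high = spec["range"]
        if not low <= value <= high:
            raise ValueError(f"input '{name}'={value} is outside [{low}, {high}]")
    if "schema" in spec:
        value = dict(value)  # copy so the caller's dict is not mutated
        for field, field_spec in spec["schema"].items():
            value[field] = resolve(f"{name}.{field}", field_spec, value.get(field, _MISSING))
    return value


def resolve_inputs(specs: list[dict[str, Any]], provided: dict[str, Any]) -> dict[str, Any]:
    return {s["name"]: resolve(s["name"], s, provided.get(s["name"], _MISSING)) for s in specs}
```

With only `{"query": "solar panels"}` supplied, this resolves to `{"query": "solar panels", "urls": [], "options": {"max_results": 10}}`.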

Typed Operations

- op: semantic_filter
  input:
    type: chunk[]
    from: chunks
  params:
    query:
      type: string
      required: true
    threshold:
      type: float
      range: [0.0, 1.0]
  output:
    name: filtered
    type: chunk[]

Custom Type Aliases

types:
  PersonEntity:
    base: entity
    constraints:
      - "type == 'person'"

  RelevantChunk:
    base: chunk
    constraints:
      - "quality_score >= 0.5"
      - "word_count >= 50"
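Constraints such as `quality_score >= 0.5` can be checked without `eval()` by parsing them with Python's `ast` module and accepting only a single `field OP literal` comparison. The sketch below is a minimal version of that approach. It assumes constraints are always that simple, that the object has already passed its base-type check, and that a missing field is an error (`KeyError`). `check_constraint`, `ALIASES`, and `matches_alias` are hypothetical names, and the real filter grammar is described in the Filters document.

```python
import ast
import operator
from typing import Any

# Supported comparison operators, mapped from AST node types to functions.
_OPS = {ast.Eq: operator.eq, ast.NotEq: operator.ne, ast.Lt: operator.lt,
        ast.LtE: operator.le, ast.Gt: operator.gt, ast.GtE: operator.ge}

# The alias definitions above, transcribed into Python dicts.
ALIASES = {
    "PersonEntity": {"base": "entity", "constraints": ["type == 'person'"]},
    "RelevantChunk": {"base": "chunk",
                      "constraints": ["quality_score >= 0.5", "word_count >= 50"]},
}


def check_constraint(expr: str, obj: dict[str, Any]) -> bool:
    """Evaluate a `field OP literal` constraint against obj without eval()."""
    node = ast.parse(expr, mode="eval").body
    if not (isinstance(node, ast.Compare) and len(node.ops) == 1
            and isinstance(node.left, ast.Name)):
        raise ValueError(f"unsupported constraint: {expr!r}")
    op = _OPS.get(type(node.ops[0]))
    if op is None:
        raise ValueError(f"unsupported operator in: {expr!r}")
    literal = ast.literal_eval(node.comparators[0])  # only literals are allowed
    return op(obj[node.left.id], literal)


def matches_alias(alias: str, obj: dict[str, Any]) -> bool:
    """True if obj satisfies every constraint of the named alias."""
    return all(check_constraint(c, obj) for c in ALIASES[alias]["constraints"])
```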

Best Practices

  1. Use specific types - prefer Url over string for URL values
  2. Validate early - check types while designing the graph, not only at run time
  3. Handle errors - include actionable suggestions in type error messages
  4. Document types - state the input and output types of every operation
  5. Use constraints - define custom type aliases for constrained data