Type System¶
The graph type system provides formal type definitions, compatibility checking, and automatic coercion for data flowing through research pipelines.
Overview¶
graph TB
GT[GraphType]
GT --> PRIM[Primitive]
GT --> DOM[Domain]
GT --> COLL["Collection[T]"]
GT --> COMP[Compound]
GT --> UNION[Union]
PRIM --> void & bool & int & float & string & bytes & datetime
DOM --> STR_BASED[String-based]
DOM --> OBJ_BASED[Object-based]
DOM --> BIN_BASED[Binary-based]
STR_BASED --> Url & Html & Markdown & Text & Query & Json
OBJ_BASED --> Entity & Chunk & Vector & GeoPoint
BIN_BASED --> Image & Pdf
Quick Reference¶
| Document | Description |
|---|---|
| Type Hierarchy | Primitive, Domain, Collection types |
| Compatibility | Type compatibility matrix |
| Coercion | Implicit and explicit coercion |
| Filters | Filter expression grammar |
Purpose¶
The type system provides:
- Formal type definitions for all data flowing through graphs
- Compatibility checking between operation outputs and inputs
- Automatic coercion where safe conversions exist
- Filter expressions for conditional data flow
- Chat→Graph interface for interactive pipeline building
Type Categories¶
Primitive Types¶
Basic data types:
| Type | Python | Description |
|---|---|---|
| `void` | `None` | No data |
| `bool` | `bool` | Boolean |
| `int` | `int` | Integer |
| `float` | `float` | Floating-point number |
| `string` | `str` | Text |
| `bytes` | `bytes` | Binary |
| `datetime` | `datetime` | Timestamp |
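As a rough illustration of how the primitive names could map onto Python types, here is a minimal sketch. The `PRIMITIVES` dictionary and the `matches_primitive` helper are hypothetical names, not part of the graph type system's API, and the check is deliberately strict (no coercion).

```python
from datetime import datetime

# Assumed mapping, copied from the table above: graph primitive name -> Python type.
PRIMITIVES = {
    "void": type(None),
    "bool": bool,
    "int": int,
    "float": float,
    "string": str,
    "bytes": bytes,
    "datetime": datetime,
}


def matches_primitive(type_name: str, value: object) -> bool:
    """Return True if `value` is an instance of the named primitive type."""
    py_type = PRIMITIVES[type_name]
    # bool is a subclass of int in Python, so reject it explicitly for numeric types.
    if py_type in (int, float) and isinstance(value, bool):
        return False
    return isinstance(value, py_type)
```

Under this sketch an `int` value does not match `float`; whether such a value is accepted in practice depends on the coercion rules, which are not modelled here.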
Domain Types¶
Semantic types, each layered on a simpler base type:
| Type | Base | Description |
|---|---|---|
| `Url` | `string` | HTTP/HTTPS URL |
| `Html` | `string` | HTML document |
| `Markdown` | `string` | Markdown text |
| `Text` | `string` | Plain text |
| `Query` | `string` | Search query |
| `Entity` | `object` | Named entity |
| `Chunk` | `object` | Text chunk |
| `Vector` | `float[]` | Embedding |
| `Image` | `bytes` | Image data |
| `Pdf` | `bytes` | PDF document |
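One way to picture a domain type is as a name paired with its base type, plus an optional validator. The sketch below is an assumption for illustration: `DomainType`, `DOMAIN_TYPES`, and `is_url` are hypothetical names, and the URL check is only a plausible reading of "HTTP/HTTPS URL".

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class DomainType:
    name: str  # semantic type name, e.g. "Url"
    base: str  # base type, as listed in the table above


# Names and bases copied from the table; the registry itself is hypothetical.
DOMAIN_TYPES = {
    name: DomainType(name, base)
    for name, base in [
        ("Url", "string"), ("Html", "string"), ("Markdown", "string"),
        ("Text", "string"), ("Query", "string"), ("Entity", "object"),
        ("Chunk", "object"), ("Vector", "float[]"), ("Image", "bytes"),
        ("Pdf", "bytes"),
    ]
}


def is_url(value: str) -> bool:
    """Assumed validator: a Url is a string with an http(s) scheme and a host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```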
Collection Types¶
Arrays of any type:
Url[] # Array of URLs
Chunk[] # Array of chunks
Entity[] # Array of entities
Vector[] # Array of vectors
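The `T[]` notation can be parsed mechanically. The following is a minimal sketch, assuming a trailing `[]` marks one level of array nesting. `TypeRef`, `parse_type`, and `element_type` are hypothetical helpers, and support for nested arrays such as `Url[][]` is an assumption rather than something the type system is known to allow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeRef:
    name: str       # element type name, e.g. "Url"
    depth: int = 0  # number of trailing "[]" suffixes


def parse_type(expr: str) -> TypeRef:
    """Split a type expression such as "Url[]" into its name and array depth."""
    expr = expr.strip()
    depth = 0
    while expr.endswith("[]"):
        expr = expr[:-2]
        depth += 1
    if not expr:
        raise ValueError("type expression has no element type")
    return TypeRef(expr, depth)


def element_type(t: TypeRef) -> TypeRef:
    """Return the type of one element of a collection type."""
    if t.depth == 0:
        raise TypeError(f"{t.name} is not a collection type")
    return TypeRef(t.name, t.depth - 1)
```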
Union Types¶
A union-typed value may hold any one of several member types.
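The usual rule for unions can be sketched as set membership. This is an illustration only: the function names are hypothetical, the document does not show the union syntax, and the subset rule for union-to-union compatibility is a common convention assumed here.

```python
def union_accepts(members: frozenset, actual: str) -> bool:
    """A union accepts a value whose type is one of its members."""
    return actual in members


def union_compatible(source: frozenset, target: frozenset) -> bool:
    """Every type the source may produce must be accepted by the target."""
    return source <= target
```

For example, an output that may be `Text` can feed an input declared as the union of `Html` and `Text`, but not the other way round.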
Type Signatures¶
Operations have typed signatures:
search: (Query, {max_results?: int}) -> Url[]
fetch: (Url, {geo?: string}) -> Html
to_text: (Html) -> Text
chunk: (Text, {size?: int}) -> Chunk[]
embed_batch: (Chunk[]) -> Vector[]
entities: (Text) -> Entity[]
integrate: (Chunk[], {query: string}) -> Text
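To show how signatures like these support checking a chain of operations, here is a minimal sketch. `Signature`, `SIGNATURES`, and `first_mismatch` are hypothetical names. The sketch compares type names for exact equality and ignores the option records, so it does not model the compatibility or coercion rules.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Signature:
    input: str   # type of the main input
    output: str  # type of the result


# Input and output types copied from the signatures above (options omitted).
SIGNATURES = {
    "search": Signature("Query", "Url[]"),
    "fetch": Signature("Url", "Html"),
    "to_text": Signature("Html", "Text"),
    "chunk": Signature("Text", "Chunk[]"),
    "embed_batch": Signature("Chunk[]", "Vector[]"),
    "entities": Signature("Text", "Entity[]"),
    "integrate": Signature("Chunk[]", "Text"),
}


def first_mismatch(ops: list) -> Optional[Tuple[str, str]]:
    """Return the first adjacent (producer, consumer) pair whose types differ."""
    for producer, consumer in zip(ops, ops[1:]):
        if SIGNATURES[producer].output != SIGNATURES[consumer].input:
            return producer, consumer
    return None
```

With these definitions, `fetch`, `to_text`, `chunk`, `embed_batch` chains cleanly, while `search` followed by `chunk` is reported as a mismatch.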
Validation¶
Types are validated at:
- Parse-time - During YAML parsing
- Link-time - When connecting operations
- Run-time - When data flows through
Example Error¶
# This graph has a type error
stages:
  - name: discover
    sequential:
      - op: search
        params: {query: "test"}
        output: urls  # Type: Url[]
  - name: process
    sequential:
      - op: chunk  # ERROR: chunk expects Text, got Url[]
        input: urls
Error message:
TypeValidationError:
  operation: chunk
  field: input
  expected: Text
  actual: Url[]
  message: "Operation 'chunk' expects Text input, but 'urls' has type Url[].
            Did you mean to use 'fetch_batch' + 'to_text' first?"
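A link-time check that produces an error of this shape might look roughly like the sketch below. The exception class here is a stand-in that mirrors the fields shown above, not the real implementation. `check_link` and its parameters are hypothetical, and the hint text is supplied by the caller rather than computed.

```python
class TypeValidationError(Exception):
    """Stand-in error carrying the fields shown in the example above."""

    def __init__(self, operation, field, expected, actual, source_name, hint=""):
        self.operation = operation
        self.field = field
        self.expected = expected
        self.actual = actual
        message = (
            f"Operation '{operation}' expects {expected} {field}, "
            f"but '{source_name}' has type {actual}."
        )
        if hint:
            message += " " + hint
        self.message = message
        super().__init__(message)


def check_link(operation, field, expected, source_name, actual, hint=""):
    """Raise if the type bound to `source_name` differs from what `operation` expects."""
    if expected != actual:
        raise TypeValidationError(operation, field, expected, actual, source_name, hint)
```

Calling `check_link("chunk", "input", "Text", "urls", "Url[]")` raises the error, while a matching pair returns without raising.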
Usage in Graphs¶
Typed Inputs¶
inputs:
  - name: query
    type: string
    required: true
  - name: urls
    type: url[]
    default: []
  - name: options
    type: object
    schema:
      max_results: {type: int, default: 10, range: [1, 100]}
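The `default` and `range` keys in the `options` schema suggest a resolution step along these lines. This is a sketch under assumptions: `resolve_option` is a hypothetical helper, only the `int` type is checked, and the range bounds are treated as inclusive.

```python
def resolve_option(spec: dict, value=None):
    """Apply the default, then check the declared type and range."""
    if value is None:
        value = spec.get("default")
    if spec.get("type") == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"expected int, got {type(value).__name__}")
    if "range" in spec:
        low, high = spec["range"]
        # Assumption: both bounds are inclusive.
        if not low <= value <= high:
            raise ValueError(f"{value} is outside [{low}, {high}]")
    return value


# Mirrors the max_results entry in the schema above.
MAX_RESULTS = {"type": "int", "default": 10, "range": [1, 100]}
```

Calling `resolve_option(MAX_RESULTS)` returns the default of 10; a value of 0 fails the range check.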
Typed Operations¶
- op: semantic_filter
  input:
    type: chunk[]
    from: chunks
  params:
    query:
      type: string
      required: true
    threshold:
      type: float
      range: [0.0, 1.0]
  output:
    name: filtered
    type: chunk[]
Custom Type Aliases¶
types:
  PersonEntity:
    base: entity
    constraints:
      - "type == 'person'"
  RelevantChunk:
    base: chunk
    constraints:
      - "quality_score >= 0.5"
      - "word_count >= 50"
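Constraint strings like the ones above can be checked against a record with a small evaluator. The sketch below assumes each constraint has the form `field operator literal`; the full filter grammar is documented separately and may allow more. `satisfies` and `parse_literal` are hypothetical helpers.

```python
import operator
import re

# Supported comparison operators (an assumed subset of the filter grammar).
_OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
_PATTERN = re.compile(r"^\s*(\w+)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*$")


def parse_literal(text: str):
    """Parse a quoted string, an integer, or a float."""
    if len(text) >= 2 and text[0] in "'\"" and text[-1] == text[0]:
        return text[1:-1]
    try:
        return int(text)
    except ValueError:
        return float(text)


def satisfies(record: dict, constraints: list) -> bool:
    """Return True if the record meets every constraint."""
    for constraint in constraints:
        match = _PATTERN.match(constraint)
        if match is None:
            raise ValueError(f"unsupported constraint: {constraint!r}")
        field, op, literal = match.groups()
        if not _OPS[op](record[field], parse_literal(literal)):
            return False
    return True
```

A chunk with `quality_score` 0.8 and `word_count` 120 would qualify as a `RelevantChunk` under this sketch; one with `word_count` 10 would not.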
Best Practices¶
- Use specific types - `Url` over `string` for URLs
- Validate early - Check types during graph design
- Handle errors - Provide helpful suggestions
- Document types - Clear input/output documentation
- Use constraints - Define custom types for clarity