Operations Reference

Complete reference for all 40 graph operations.

Discovery Operations

search

Web search to discover URLs.

- op: search
  params:
    query: "AI regulation"
    max_results: 50
    provider: "duckduckgo"
  output: urls

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | required | Search query |
| max_results | int | 50 | Max results |
| provider | string | duckduckgo | Search engine |

Input: None Output: Url[]

archive_discover

Discover historical URLs from archives.

- op: archive_discover
  params:
    domain: "example.com"
    sources: ["wayback", "commoncrawl"]
    limit: 1000
  output: urls

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| domain | string | required | Domain to discover |
| sources | string[] | ["wayback"] | Archive sources |
| limit | int | 100 | Max URLs |

Output: Url[]

youtube_search

Search YouTube videos.

- op: youtube_search
  params:
    query: "AI safety interviews"
    max_results: 20
  output: video_urls

Output: Url[]


Acquisition Operations

fetch

Fetch single URL content.

- op: fetch
  params:
    url: "https://example.com"
    geo: "us:ca"
    timeout: 30
  output: content

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | required | URL to fetch |
| geo | string | null | Geographic routing |
| timeout | int | 30 | Timeout seconds |

Output: Html

fetch_batch

Fetch multiple URLs.

- op: fetch_batch
  input: urls
  params:
    geo: "us:ca"
    concurrency: 10
  output: pages

Input: Url[] Output: Html[]
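The concurrency parameter bounds how many URLs are fetched in parallel while results stay in input order. A minimal sketch of that behavior (not the engine's actual implementation; `fetch_one` is a hypothetical stand-in for the single-URL fetch):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_batch(urls, fetch_one, concurrency=10):
    """Fetch URLs with bounded parallelism, preserving input order.

    `fetch_one` stands in for the single-URL fetch operation; at most
    `concurrency` calls run at once, and pool.map keeps result order
    aligned with `urls`.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch_one, urls))
```

Because `map` preserves ordering, `pages[i]` always corresponds to `urls[i]`, which is what lets downstream batch ops (e.g. `to_text_batch`) stay aligned with their source URLs.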

fetch_archive

Fetch historical content.

- op: fetch_archive
  input: url
  params:
    date: "2020-01-01"
    fallback: true
  output: content

Input: Url Output: Html

tor_fetch

Fetch through Tor network.

- op: tor_fetch
  params:
    url: "http://example.onion"
    mode: "sticky"
    sticky_key: "example.onion"
  output: content

Output: Html

screenshot

Capture browser screenshot.

- op: screenshot
  params:
    url: "https://example.com"
    full_page: true
    viewport_width: 1920
  output: image

Output: Image

tor_screenshot

Screenshot through Tor.

- op: tor_screenshot
  params:
    url: "http://example.onion"
    timeout: 90000
  output: image

Output: Image

download

Download binary file.

- op: download
  params:
    url: "https://example.com/file.pdf"
  output: file

Output: bytes


Transform Operations

to_markdown

Convert HTML to Markdown.

- op: to_markdown
  input: html
  output: markdown

Input: Html Output: Markdown

to_text

Convert HTML to plain text.

- op: to_text
  input: html
  output: text

Input: Html Output: Text

to_text_batch

Batch text conversion.

- op: to_text_batch
  input: pages
  output: texts

Input: Html[] Output: Text[]

chunk

Split text into chunks.

- op: chunk
  input: text
  params:
    size: 500
    overlap: 50
  output: chunks

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| size | int | 500 | Words per chunk |
| overlap | int | 50 | Overlapping words |

Input: Text Output: Chunk[]
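The size and overlap parameters describe overlapping word windows: each chunk shares its last `overlap` words with the start of the next one. A minimal sketch of that windowing, assuming a simple whitespace word split (the op's exact tokenization is not specified here):

```python
def chunk_words(text, size=500, overlap=50):
    """Split text into word windows of `size` words, where consecutive
    windows share `overlap` words. Illustrative only."""
    words = text.split()
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # final window already covers the tail
    return chunks
```

With the defaults, each window advances by 450 words, so a 1000-word text yields three chunks and adjacent chunks repeat 50 words of context at the boundary.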

extract_links

Extract URLs from HTML.

- op: extract_links
  input: html
  params:
    base_url: "https://example.com"
  output: urls

Input: Html Output: Url[]
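The base_url parameter exists so relative hrefs can be resolved to absolute URLs. A self-contained sketch of that resolution using Python's standard HTML parser (the op's real extraction rules, e.g. which tags it scans, are not specified here):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect hrefs from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # urljoin turns relative paths into absolute URLs
                    self.urls.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.urls
```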

merge

Merge multiple texts.

- op: merge
  input: [text1, text2, text3]
  params:
    separator: "\n\n"
  output: combined

Input: Text[] Output: Text


Process Operations

embed

Generate single embedding.

- op: embed
  input: text
  params:
    model: "nomic-embed-text"
  output: vector

Input: Text Output: Vector

embed_batch

Batch embedding generation.

- op: embed_batch
  input: chunks
  params:
    model: "nomic-embed-text"
    batch_size: 32
  output: vectors

Input: Chunk[] Output: Vector[]
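The batch_size parameter groups chunks before they are sent to the embedding model, so a large chunk list becomes a series of bounded requests. A trivial sketch of that grouping (the actual request handling is the engine's concern):

```python
def batched(items, batch_size=32):
    """Yield successive slices of at most `batch_size` items,
    as embed_batch presumably groups chunks per model call."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```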

ocr

Extract text from image.

- op: ocr
  input: image
  output: text

Input: Image Output: Text


Filter Operations

semantic_filter

Filter by semantic similarity.

- op: semantic_filter
  input: chunks
  params:
    query: "AI safety"
    threshold: 0.5
  output: relevant_chunks

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | required | Filter query |
| threshold | float | 0.5 | Min similarity |

Input: Chunk[] Output: Chunk[]
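The threshold is a minimum cosine similarity between each chunk's embedding and the query's embedding. A minimal sketch of that cutoff, operating on already-embedded (text, vector) pairs since the embedding model itself is out of scope here (function names are illustrative, not the engine's API):

```python
def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def filter_by_similarity(embedded_chunks, query_vec, threshold=0.5):
    """embedded_chunks: list of (text, vector) pairs. Keep chunks whose
    embedding is at least `threshold` similar to the query vector."""
    return [text for text, vec in embedded_chunks
            if cosine(vec, query_vec) >= threshold]
```

At the default threshold of 0.5, loosely related chunks (similarity around 0.7) survive while orthogonal ones are dropped; raise the threshold for stricter topical focus.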

quality_filter

Filter by quality score.

- op: quality_filter
  input: chunks
  params:
    min_words: 50
    min_quality: 0.5
  output: quality_chunks

Input: Chunk[] Output: Chunk[]

filter_urls

Filter URLs by pattern.

- op: filter_urls
  input: urls
  params:
    pattern: "*.gov"
    exclude: ["tracking", "analytics"]
  output: filtered_urls

Input: Url[] Output: Url[]
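Assuming the pattern uses glob-style matching against the hostname (as the `"*.gov"` example suggests) and exclude drops URLs containing any listed substring, the filter can be sketched with the standard library:

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def filter_urls(urls, pattern=None, exclude=()):
    """Keep URLs whose hostname matches the glob `pattern` and which
    contain none of the `exclude` substrings. Illustrative semantics."""
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if pattern and not fnmatch(host, pattern):
            continue
        if any(term in url for term in exclude):
            continue
        kept.append(url)
    return kept
```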

filter_entities

Filter entities by type/confidence.

- op: filter_entities
  input: entities
  params:
    types: ["person", "organization"]
    min_confidence: 0.7
  output: filtered_entities

Input: Entity[] Output: Entity[]


Extract Operations

entities

Extract named entities.

- op: entities
  input: text
  params:
    types: [person, organization, location, event]
  output: entities

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| types | string[] | all | Entity types |

Input: Text Output: Entity[]

topics

Extract topics.

- op: topics
  input: text
  params:
    max_topics: 10
  output: topics

Input: Text Output: string[]

summarize

Generate summary.

- op: summarize
  input: text
  params:
    max_length: 500
    model: "claude-3-5-sonnet"
  output: summary

Input: Text Output: Text


Store Operations

store_vector

Store embedding.

- op: store_vector
  input: vector
  params:
    store: "my-index"
    id: "doc_001"
    metadata:
      source: "{{ url }}"
  output: ref

store_vectors

Batch store embeddings.

- op: store_vectors
  input: [chunks, vectors]
  params:
    store: "my-index"
  output: refs

store_entity

Store entity.

- op: store_entity
  input: entity
  params:
    store: "entities"
  output: ref

store_entities

Batch store entities.

- op: store_entities
  input: entities
  params:
    store: "entities"
  output: refs

search_vectors

Semantic vector search.

- op: search_vectors
  params:
    query: "machine learning"
    store: "my-index"
    top_k: 10
  output: matches

Output: Chunk[]
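top_k means the store returns the k entries whose vectors are most similar to the embedded query. A minimal sketch of that ranking over an in-memory store of (id, vector) pairs, using cosine similarity (the engine's actual index and metric are not specified here):

```python
def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def search_store(query_vec, store, top_k=10):
    """store: list of (chunk_id, vector) pairs. Return the ids of the
    top_k entries ranked by similarity to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(item[1], query_vec),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:top_k]]
```

A real vector index avoids the full sort with approximate nearest-neighbor structures, but the ranking contract is the same.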


Synthesize Operations

integrate

Synthesize chunks into summary.

- op: integrate
  input: chunks
  params:
    query: "{{ query }}"
    model: "claude-3-5-sonnet"
  output: synthesis

Input: Chunk[] Output: Text

chat

AI conversation.

- op: chat
  params:
    messages:
      - role: user
        content: "Analyze this data: {{ data }}"
    model: "claude-3-5-sonnet"
  output: response

Output: Text

compare

Compare multiple texts.

- op: compare
  input: [text1, text2]
  params:
    aspects: ["coverage", "tone", "facts"]
  output: comparison

Input: Text[] Output: Text

to_report

Generate structured report.

- op: to_report
  input:
    synthesis: synthesis
    entities: entities
    sources: urls
  params:
    template: research_report
    include_diagrams: true
  output: report

Output: Markdown


Geo Operations

geocode

Forward geocoding.

- op: geocode
  params:
    query: "1600 Pennsylvania Ave, Washington DC"
  output: location

Output: GeoPoint

reverse_geocode

Reverse geocoding.

- op: reverse_geocode
  input: location
  output: address

Input: GeoPoint Output: string


Utility Operations

diff

Compare content versions.

- op: diff
  input: [old_text, new_text]
  output: changes

sentiment

Sentiment analysis.

- op: sentiment
  input: text
  output: sentiment_result

translate

Text translation.

- op: translate
  input: text
  params:
    target_lang: "en"
  output: translated