Job Types

cbintel supports seven job types, each handled by a specialized worker.

Overview

| Job Type | Worker | Input | Output |
|----------|--------|-------|--------|
| crawl | CrawlWorker | Query, config | URLs, chunks, synthesis |
| lazarus | LazarusWorker | Domain/URL, date range | Snapshots, content |
| vectl | VectlWorker | Text or query | Embeddings or matches |
| screenshot | ScreenshotWorker | URLs | Images |
| transcript | TranscriptWorker | Video ID | Transcript text |
| browser | BrowserWorker | URL, actions | Automation results |
| graph | GraphWorker | Graph YAML, params | Graph execution result |

crawl

AI-powered web crawling with iterative batch processing.

Request Schema

{
  "query": "AI regulation trends",
  "max_urls": 50,
  "max_depth": 3,
  "geo": "us:ca",
  "ai_model": "claude-3-5-sonnet-20241022",
  "min_score": 6.0,
  "search_provider": "duckduckgo"
}

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| query | string | Yes | - | Research query |
| max_urls | int | No | 50 | Maximum URLs to process |
| max_depth | int | No | 3 | Maximum batch depth |
| geo | string | No | null | Geographic routing |
| ai_model | string | No | claude-3-5-sonnet | AI model for evaluation |
| min_score | float | No | 6.0 | Minimum relevance score |
| search_provider | string | No | duckduckgo | Search engine |
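
The defaults in the field table can be mirrored on the client side so that only the overrides need to be spelled out. The following is a minimal sketch, not part of cbintel: `CrawlRequest` and `to_payload` are hypothetical names, and dropping `None` fields from the payload is an assumption about what the API accepts.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CrawlRequest:
    """Hypothetical helper mirroring the crawl request fields and defaults."""

    query: str  # the only required field
    max_urls: int = 50
    max_depth: int = 3
    geo: Optional[str] = None
    ai_model: str = "claude-3-5-sonnet"
    min_score: float = 6.0
    search_provider: str = "duckduckgo"

    def to_payload(self) -> dict:
        """Return a JSON-ready dict, omitting fields left as None (assumption)."""
        if not self.query.strip():
            raise ValueError("query is required")
        return {key: value for key, value in asdict(self).items() if value is not None}


# Override only what differs from the documented defaults.
payload = CrawlRequest(query="AI regulation trends", geo="us:ca", max_urls=30).to_payload()
```

Keeping the defaults in one place makes it easier to notice when a request silently relies on them.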

Response Schema

{
  "job_id": "job_abc123",
  "status": "COMPLETED",
  "result": {
    "total_urls": 42,
    "urls_processed": 42,
    "chunks_generated": 156,
    "embeddings_stored": true,
    "synthesis": "AI regulation is evolving rapidly...",
    "report_url": "https://files.nominate.ai/..."
  }
}

Example

curl -X POST https://intel.nominate.ai/api/v1/jobs/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the latest AI safety regulations?",
    "geo": "us:ca",
    "max_urls": 30
  }'

lazarus

Historical archive retrieval from Internet Archive and Common Crawl.

Request Schema

{
  "domain": "example.com",
  "url": "https://example.com/page",
  "from_date": "2020-01-01",
  "to_date": "2024-01-01",
  "sources": ["wayback", "commoncrawl"],
  "limit": 100
}
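
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| domain | string | No* | - | Domain to discover URLs |
| url | string | No* | - | Specific URL to retrieve |
| from_date | date | No | null | Start date |
| to_date | date | No | null | End date |
| sources | string[] | No | ["wayback"] | Archive sources |
| limit | int | No | 100 | Max results |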

*Either domain or url is required.
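
The "either domain or url" rule is easy to check before a job is submitted. The sketch below is illustrative only: `build_lazarus_request` is a hypothetical helper, and the date-order check is an added client-side sanity check rather than documented API behavior.

```python
from datetime import date
from typing import Optional, Sequence


def build_lazarus_request(
    domain: Optional[str] = None,
    url: Optional[str] = None,
    from_date: Optional[date] = None,
    to_date: Optional[date] = None,
    sources: Sequence[str] = ("wayback",),
    limit: int = 100,
) -> dict:
    """Build a lazarus payload, enforcing that domain or url is present."""
    if not domain and not url:
        raise ValueError("either domain or url is required")
    # Assumption: a reversed date range is a caller mistake worth catching early.
    if from_date and to_date and from_date > to_date:
        raise ValueError("from_date must not be after to_date")

    payload = {"sources": list(sources), "limit": limit}
    if domain:
        payload["domain"] = domain
    if url:
        payload["url"] = url
    if from_date:
        payload["from_date"] = from_date.isoformat()  # YYYY-MM-DD, as in the example
    if to_date:
        payload["to_date"] = to_date.isoformat()
    return payload
```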

Response Schema

{
  "job_id": "job_def456",
  "status": "COMPLETED",
  "result": {
    "urls_discovered": 1523,
    "snapshots_retrieved": 87,
    "date_range": {
      "earliest": "2015-03-21",
      "latest": "2024-01-10"
    },
    "output_url": "https://files.nominate.ai/..."
  }
}

vectl

Vector embedding generation and semantic search.

Embed Request

{
  "operation": "embed",
  "texts": ["First document", "Second document"],
  "model": "nomic-embed-text",
  "store": "my-index"
}

Search Request

{
  "operation": "search",
  "query": "machine learning algorithms",
  "store": "my-index",
  "top_k": 10
}
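
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| operation | string | Yes | - | "embed" or "search" |
| texts | string[] | For embed | - | Texts to embed |
| query | string | For search | - | Search query |
| store | string | No | default | Vector store name |
| top_k | int | No | 10 | Number of results |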
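
Because the required fields depend on `operation`, a small dispatcher can catch a mismatched payload before it reaches the worker. This is a sketch under stated assumptions: `build_vectl_request` is a hypothetical helper, and the `model` field from the embed example is left out because the field table does not document it.

```python
from typing import Optional, Sequence


def build_vectl_request(
    operation: str,
    *,
    texts: Optional[Sequence[str]] = None,
    query: Optional[str] = None,
    store: str = "default",
    top_k: int = 10,
) -> dict:
    """Build a vectl payload with the fields required by the chosen operation."""
    if operation == "embed":
        if not texts:
            raise ValueError("texts is required for embed")
        return {"operation": "embed", "texts": list(texts), "store": store}
    if operation == "search":
        if not query:
            raise ValueError("query is required for search")
        return {"operation": "search", "query": query, "store": store, "top_k": top_k}
    raise ValueError(f"operation must be 'embed' or 'search', got {operation!r}")
```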

Response Schema

{
  "job_id": "job_ghi789",
  "status": "COMPLETED",
  "result": {
    "operation": "search",
    "matches": [
      {"id": "doc1", "score": 0.92, "text": "..."},
      {"id": "doc2", "score": 0.87, "text": "..."}
    ]
  }
}

screenshot

Browser screenshot capture.

Request Schema

{
  "urls": ["https://example.com", "https://example.org"],
  "full_page": true,
  "format": "png",
  "viewport_width": 1920,
  "viewport_height": 1080,
  "geo": "us:ca"
}

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| urls | string[] | Yes | - | URLs to capture |
| full_page | bool | No | true | Full page capture |
| format | string | No | "png" | Image format |
| viewport_width | int | No | 1920 | Viewport width |
| viewport_height | int | No | 1080 | Viewport height |
| geo | string | No | null | Geographic routing |

Response Schema

{
  "job_id": "job_jkl012",
  "status": "COMPLETED",
  "result": {
    "urls_processed": 2,
    "format": "png",
    "screenshots": [
      {
        "url": "https://example.com",
        "file_url": "https://files.nominate.ai/...",
        "width": 1920,
        "height": 3500
      }
    ]
  }
}

transcript

YouTube video transcript extraction.

Request Schema

{
  "video_id": "dQw4w9WgXcQ",
  "language": "en",
  "include_timestamps": true
}
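
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| video_id | string | Yes | - | YouTube video ID |
| language | string | No | "en" | Transcript language |
| include_timestamps | bool | No | true | Include timestamps |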

Response Schema

{
  "job_id": "job_mno345",
  "status": "COMPLETED",
  "result": {
    "video_id": "dQw4w9WgXcQ",
    "title": "Video Title",
    "duration_seconds": 212,
    "transcript": [
      {"start": 0.0, "text": "Never gonna give you up"},
      {"start": 3.5, "text": "Never gonna let you down"}
    ],
    "full_text": "Never gonna give you up..."
  }
}
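
When `include_timestamps` is enabled, the `transcript` array can be rendered as readable timestamped lines. The sketch below assumes that `start` is an offset in seconds from the beginning of the video, which the sample values suggest but the schema does not state; `format_timestamp` and `render_transcript` are hypothetical helpers.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as M:SS, dropping fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def render_transcript(result: dict) -> str:
    """Join transcript segments into one '[M:SS] text' line per segment."""
    lines = [
        f"[{format_timestamp(segment['start'])}] {segment['text']}"
        for segment in result["transcript"]
    ]
    return "\n".join(lines)


# Segments copied from the response example above.
result = {
    "transcript": [
        {"start": 0.0, "text": "Never gonna give you up"},
        {"start": 3.5, "text": "Never gonna let you down"},
    ]
}
print(render_transcript(result))
# [0:00] Never gonna give you up
# [0:03] Never gonna let you down
```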

browser

Ferret browser automation.

Request Schema

{
  "url": "https://example.com",
  "actions": [
    {"type": "fill", "selector": "input[name='q']", "value": "test"},
    {"type": "click", "selector": "button[type='submit']"},
    {"type": "wait_for_element", "selector": ".results"},
    {"type": "extract_text", "selector": ".results"}
  ],
  "timeout": 30000
}
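
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| url | string | Yes | - | Starting URL |
| actions | object[] | Yes | - | Action sequence |
| timeout | int | No | 30000 | Timeout in ms |
| geo | string | No | null | Geographic routing |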

Action Types

| Type | Parameters | Description |
|------|------------|-------------|
| navigate | url | Navigate to URL |
| click | selector | Click element |
| fill | selector, value | Fill input |
| select | selector, value | Select option |
| wait | ms | Wait duration |
| wait_for_element | selector | Wait for element |
| extract_text | selector | Extract text |
| screenshot | - | Take screenshot |
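
An action sequence can be checked against the table above before submission, so that a missing parameter fails locally instead of mid-run. This is a minimal sketch: `REQUIRED_PARAMS` and `validate_actions` are hypothetical names, and the check covers only parameter presence, not selector syntax or value types.

```python
# Parameters each action type needs, taken from the Action Types table.
REQUIRED_PARAMS = {
    "navigate": ("url",),
    "click": ("selector",),
    "fill": ("selector", "value"),
    "select": ("selector", "value"),
    "wait": ("ms",),
    "wait_for_element": ("selector",),
    "extract_text": ("selector",),
    "screenshot": (),
}


def validate_actions(actions: list) -> list:
    """Return a list of problems; an empty list means the sequence looks valid."""
    problems = []
    for index, action in enumerate(actions):
        kind = action.get("type")
        if kind not in REQUIRED_PARAMS:
            problems.append(f"action {index}: unknown type {kind!r}")
            continue
        for param in REQUIRED_PARAMS[kind]:
            if param not in action:
                problems.append(f"action {index}: {kind} requires {param!r}")
    return problems


# The sequence from the request example passes; a fill without a value does not.
actions = [
    {"type": "fill", "selector": "input[name='q']", "value": "test"},
    {"type": "click", "selector": "button[type='submit']"},
    {"type": "wait_for_element", "selector": ".results"},
    {"type": "extract_text", "selector": ".results"},
]
assert validate_actions(actions) == []
```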

Response Schema

{
  "job_id": "job_pqr678",
  "status": "COMPLETED",
  "result": {
    "success": true,
    "actions_executed": 4,
    "extracted_data": {
      "results": "Search results text..."
    },
    "final_url": "https://example.com/results"
  }
}

graph

Research graph execution.

Request Schema

{
  "graph": "name: research_pipeline\nstages:\n  - name: discover\n    ...",
  "template": "basic_research",
  "params": {
    "query": "AI regulation",
    "max_urls": 50
  },
  "workspace_id": "ws_xyz789"
}
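
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| graph | string | No* | - | Inline YAML graph |
| template | string | No* | - | Template name |
| params | object | No | {} | Graph parameters |
| workspace_id | string | No | null | Workspace for artifacts |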

*Either graph or template is required.

Response Schema

{
  "job_id": "job_stu901",
  "status": "COMPLETED",
  "result": {
    "graph_name": "research_pipeline",
    "stages_completed": 5,
    "stages_total": 5,
    "duration_seconds": 245,
    "outputs": {
      "urls": [...],
      "synthesis": "Research findings..."
    },
    "artifacts_url": "https://files.nominate.ai/..."
  }
}

Submitting Jobs

REST API

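# Each job type has its own endpoint; send the request body as JSON.

# Crawl
curl -X POST https://intel.nominate.ai/api/v1/jobs/crawl -H "Content-Type: application/json" -d '...'

# Lazarus
curl -X POST https://intel.nominate.ai/api/v1/jobs/lazarus -H "Content-Type: application/json" -d '...'

# Vectl
curl -X POST https://intel.nominate.ai/api/v1/jobs/vectl -H "Content-Type: application/json" -d '...'

# Screenshot
curl -X POST https://intel.nominate.ai/api/v1/jobs/screenshot -H "Content-Type: application/json" -d '...'

# Transcript
curl -X POST https://intel.nominate.ai/api/v1/jobs/transcript -H "Content-Type: application/json" -d '...'

# Browser
curl -X POST https://intel.nominate.ai/api/v1/jobs/browser -H "Content-Type: application/json" -d '...'

# Graph
curl -X POST https://intel.nominate.ai/api/v1/jobs/graph -H "Content-Type: application/json" -d '...'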
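
All seven endpoints share the same URL pattern, so the same POST can be assembled with only the Python standard library. The sketch below builds the request without sending it; `build_job_request` is a hypothetical helper, and any authentication the service may require is not covered here because this page does not describe it.

```python
import json
import urllib.request

BASE_URL = "https://intel.nominate.ai/api/v1/jobs"
JOB_TYPES = {"crawl", "lazarus", "vectl", "screenshot", "transcript", "browser", "graph"}


def build_job_request(job_type: str, payload: dict) -> urllib.request.Request:
    """Assemble a JSON POST request for the given job type's endpoint."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type!r}")
    return urllib.request.Request(
        url=f"{BASE_URL}/{job_type}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_job_request("crawl", {"query": "AI regulation trends"})
# To submit for real, pass the request to urllib.request.urlopen (needs network access).
```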

Python Client

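import asyncio

from cbintel.client import JobsClient


async def main():
    client = JobsClient()

    # Submit with a specific job type; submit() is awaited, so it runs inside a coroutine
    job = await client.submit("crawl", {"query": "..."})
    job = await client.submit("graph", {"template": "...", "params": {...}})


asyncio.run(main())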