Job Types¶
cbintel supports seven job types, each handled by a specialized worker.
Overview¶
| Job Type | Worker | Input | Output |
|---|---|---|---|
| `crawl` | CrawlWorker | Query, config | URLs, chunks, synthesis |
| `lazarus` | LazarusWorker | Domain/URL, date range | Snapshots, content |
| `vectl` | VectlWorker | Text or query | Embeddings or matches |
| `screenshot` | ScreenshotWorker | URLs | Images |
| `transcript` | TranscriptWorker | Video ID | Transcript text |
| `browser` | BrowserWorker | URL, actions | Automation results |
| `graph` | GraphWorker | Graph YAML, params | Graph execution result |
crawl¶
AI-powered web crawling with iterative batch processing.
Request Schema¶
{
  "query": "AI regulation trends",
  "max_urls": 50,
  "max_depth": 3,
  "geo": "us:ca",
  "ai_model": "claude-3-5-sonnet-20241022",
  "min_score": 6.0,
  "search_provider": "duckduckgo"
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | Yes | - | Research query |
| `max_urls` | int | No | 50 | Maximum URLs to process |
| `max_depth` | int | No | 3 | Maximum batch depth |
| `geo` | string | No | null | Geographic routing |
| `ai_model` | string | No | claude-3-5-sonnet | AI model for evaluation |
| `min_score` | float | No | 6.0 | Minimum relevance score |
| `search_provider` | string | No | duckduckgo | Search engine |
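The field table above can be mirrored client-side so that a request only needs to name the values it changes. The following is a minimal sketch, not part of cbintel: `CRAWL_DEFAULTS` and `build_crawl_request` are hypothetical names, the defaults are copied from the table, and dropping `null` fields from the payload is an assumption about what the server accepts.

```python
from typing import Any

# Defaults copied from the crawl field table; the helper itself is hypothetical.
CRAWL_DEFAULTS = {
    "max_urls": 50,
    "max_depth": 3,
    "geo": None,
    "ai_model": "claude-3-5-sonnet",
    "min_score": 6.0,
    "search_provider": "duckduckgo",
}


def build_crawl_request(query: str, **overrides: Any) -> dict:
    """Return a crawl payload: the required query plus defaults and overrides."""
    if not query or not query.strip():
        raise ValueError("query is required")
    unknown = set(overrides) - set(CRAWL_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown crawl fields: {sorted(unknown)}")
    merged = {"query": query, **CRAWL_DEFAULTS, **overrides}
    # Assumption: fields whose value is null can simply be left out.
    return {key: value for key, value in merged.items() if value is not None}
```

Calling `build_crawl_request("AI regulation trends", geo="us:ca", max_urls=30)` yields a payload with the two overrides applied and every other field at its documented default.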
Response Schema¶
{
  "job_id": "job_abc123",
  "status": "COMPLETED",
  "result": {
    "total_urls": 42,
    "urls_processed": 42,
    "chunks_generated": 156,
    "embeddings_stored": true,
    "synthesis": "AI regulation is evolving rapidly...",
    "report_url": "https://files.nominate.ai/..."
  }
}
Example¶
curl -X POST https://intel.nominate.ai/api/v1/jobs/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the latest AI safety regulations?",
    "geo": "us:ca",
    "max_urls": 30
  }'
lazarus¶
Historical archive retrieval from Internet Archive and Common Crawl.
Request Schema¶
{
  "domain": "example.com",
  "url": "https://example.com/page",
  "from_date": "2020-01-01",
  "to_date": "2024-01-01",
  "sources": ["wayback", "commoncrawl"],
  "limit": 100
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `domain` | string | No* | - | Domain to discover URLs |
| `url` | string | No* | - | Specific URL to retrieve |
| `from_date` | date | No | null | Start date |
| `to_date` | date | No | null | End date |
| `sources` | string[] | No | ["wayback"] | Archive sources |
| `limit` | int | No | 100 | Max results |
*Either domain or url is required.
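The "either domain or url" rule is easy to check before a job is submitted. The sketch below is illustrative only: `build_lazarus_request` is a hypothetical helper, the `YYYY-MM-DD` date format is taken from the request example, and the check that `from_date` does not fall after `to_date` is an added assumption rather than documented server behavior.

```python
from datetime import date
from typing import Optional, Sequence


def build_lazarus_request(
    domain: Optional[str] = None,
    url: Optional[str] = None,
    from_date: Optional[date] = None,
    to_date: Optional[date] = None,
    sources: Sequence[str] = ("wayback",),
    limit: int = 100,
) -> dict:
    """Build a lazarus payload, enforcing the domain-or-url requirement."""
    if not domain and not url:
        raise ValueError("either domain or url is required")
    # Assumption: an inverted date range is a client-side mistake.
    if from_date and to_date and from_date > to_date:
        raise ValueError("from_date must not be later than to_date")

    payload: dict = {"sources": list(sources), "limit": limit}
    if domain:
        payload["domain"] = domain
    if url:
        payload["url"] = url
    if from_date:
        payload["from_date"] = from_date.isoformat()
    if to_date:
        payload["to_date"] = to_date.isoformat()
    return payload
```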
Response Schema¶
{
  "job_id": "job_def456",
  "status": "COMPLETED",
  "result": {
    "urls_discovered": 1523,
    "snapshots_retrieved": 87,
    "date_range": {
      "earliest": "2015-03-21",
      "latest": "2024-01-10"
    },
    "output_url": "https://files.nominate.ai/..."
  }
}
vectl¶
Vector embedding generation and semantic search.
Embed Request¶
{
  "operation": "embed",
  "texts": ["First document", "Second document"],
  "model": "nomic-embed-text",
  "store": "my-index"
}
Search Request¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `operation` | string | Yes | - | "embed" or "search" |
| `texts` | string[] | For embed | - | Texts to embed |
| `query` | string | For search | - | Search query |
| `store` | string | No | default | Vector store name |
| `top_k` | int | No | 10 | Number of results |
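Because `texts` and `query` are each required for only one operation, a small builder can keep the two request shapes apart. This is a sketch under stated assumptions: `build_vectl_request` is a hypothetical name, and sending `top_k` only with search requests is a guess based on its description, not confirmed behavior.

```python
from typing import Optional, Sequence


def build_vectl_request(
    operation: str,
    texts: Optional[Sequence[str]] = None,
    query: Optional[str] = None,
    store: str = "default",
    top_k: int = 10,
) -> dict:
    """Build an embed or search payload from the documented vectl fields."""
    if operation == "embed":
        if not texts:
            raise ValueError("texts is required for embed")
        return {"operation": "embed", "texts": list(texts), "store": store}
    if operation == "search":
        if not query:
            raise ValueError("query is required for search")
        # Assumption: top_k only matters when searching.
        return {"operation": "search", "query": query, "store": store, "top_k": top_k}
    raise ValueError('operation must be "embed" or "search"')
```

For example, `build_vectl_request("search", query="AI regulation", store="my-index", top_k=5)` produces a search payload against the store used in the embed example.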
Response Schema¶
{
  "job_id": "job_ghi789",
  "status": "COMPLETED",
  "result": {
    "operation": "search",
    "matches": [
      {"id": "doc1", "score": 0.92, "text": "..."},
      {"id": "doc2", "score": 0.87, "text": "..."}
    ]
  }
}
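A search response like the one above is typically post-processed by score. The following sketch assumes the response has already been parsed into a Python dict; `ranked_matches` is a hypothetical helper, and treating any status other than `COMPLETED` as an error is an assumption, since no other status values are documented here.

```python
def ranked_matches(response: dict, min_score: float = 0.0) -> list:
    """Return matches at or above min_score, highest score first."""
    # Assumption: only COMPLETED jobs carry a usable result.
    if response.get("status") != "COMPLETED":
        raise RuntimeError(f"job {response.get('job_id')} is not complete")
    matches = response["result"].get("matches", [])
    kept = [match for match in matches if match["score"] >= min_score]
    return sorted(kept, key=lambda match: match["score"], reverse=True)
```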
screenshot¶
Browser screenshot capture.
Request Schema¶
{
  "urls": ["https://example.com", "https://example.org"],
  "full_page": true,
  "format": "png",
  "viewport_width": 1920,
  "viewport_height": 1080,
  "geo": "us:ca"
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | string[] | Yes | - | URLs to capture |
| `full_page` | bool | No | true | Full page capture |
| `format` | string | No | "png" | Image format |
| `viewport_width` | int | No | 1920 | Viewport width |
| `viewport_height` | int | No | 1080 | Viewport height |
| `geo` | string | No | null | Geographic routing |
Response Schema¶
{
  "job_id": "job_jkl012",
  "status": "COMPLETED",
  "result": {
    "urls_processed": 2,
    "format": "png",
    "screenshots": [
      {
        "url": "https://example.com",
        "file_url": "https://files.nominate.ai/...",
        "width": 1920,
        "height": 3500
      }
    ]
  }
}
transcript¶
YouTube video transcript extraction.
Request Schema¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `video_id` | string | Yes | - | YouTube video ID |
| `language` | string | No | "en" | Transcript language |
| `include_timestamps` | bool | No | true | Include timestamps |
Response Schema¶
{
  "job_id": "job_mno345",
  "status": "COMPLETED",
  "result": {
    "video_id": "dQw4w9WgXcQ",
    "title": "Video Title",
    "duration_seconds": 212,
    "transcript": [
      {"start": 0.0, "text": "Never gonna give you up"},
      {"start": 3.5, "text": "Never gonna let you down"}
    ],
    "full_text": "Never gonna give you up..."
  }
}
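The `transcript` array pairs each text segment with a `start` value, which appears to be an offset in seconds. As a sketch of how a client might render it, the hypothetical helpers below turn the segments into `[MM:SS] text` lines; they assume the `result` object has already been parsed into a dict and that `start` is indeed measured in seconds.

```python
def format_timestamp(seconds: float) -> str:
    """Format an offset in seconds as MM:SS, dropping the fractional part."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_transcript(result: dict) -> str:
    """Render each transcript segment on its own timestamped line."""
    lines = [
        f"[{format_timestamp(segment['start'])}] {segment['text']}"
        for segment in result["transcript"]
    ]
    return "\n".join(lines)
```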
browser¶
Ferret browser automation.
Request Schema¶
{
  "url": "https://example.com",
  "actions": [
    {"type": "fill", "selector": "input[name='q']", "value": "test"},
    {"type": "click", "selector": "button[type='submit']"},
    {"type": "wait_for_element", "selector": ".results"},
    {"type": "extract_text", "selector": ".results"}
  ],
  "timeout": 30000
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | Yes | - | Starting URL |
| `actions` | object[] | Yes | - | Action sequence |
| `timeout` | int | No | 30000 | Timeout in ms |
| `geo` | string | No | null | Geographic routing |
Action Types¶
| Type | Parameters | Description |
|---|---|---|
| `navigate` | `url` | Navigate to URL |
| `click` | `selector` | Click element |
| `fill` | `selector`, `value` | Fill input |
| `select` | `selector`, `value` | Select option |
| `wait` | `ms` | Wait duration |
| `wait_for_element` | `selector` | Wait for element |
| `extract_text` | `selector` | Extract text |
| `screenshot` | - | Take screenshot |
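Since every action type has a fixed parameter list, an action sequence can be checked locally before it is sent. The sketch below encodes the Action Types table as a lookup; `ACTION_PARAMS` and `validate_actions` are hypothetical names, and treating every listed parameter as mandatory is an assumption.

```python
# Parameters per action type, transcribed from the Action Types table.
ACTION_PARAMS = {
    "navigate": ("url",),
    "click": ("selector",),
    "fill": ("selector", "value"),
    "select": ("selector", "value"),
    "wait": ("ms",),
    "wait_for_element": ("selector",),
    "extract_text": ("selector",),
    "screenshot": (),
}


def validate_actions(actions: list) -> list:
    """Return a list of human-readable problems; empty means the sequence looks valid."""
    errors = []
    for index, action in enumerate(actions):
        kind = action.get("type")
        if kind not in ACTION_PARAMS:
            errors.append(f"action {index}: unknown type {kind!r}")
            continue
        # Assumption: every parameter listed for a type is required.
        missing = [name for name in ACTION_PARAMS[kind] if name not in action]
        if missing:
            errors.append(f"action {index} ({kind}): missing {', '.join(missing)}")
    return errors
```

Running it on the four actions from the request example returns an empty list, while `{"type": "fill", "selector": "#q"}` is reported as missing `value`.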
Response Schema¶
{
  "job_id": "job_pqr678",
  "status": "COMPLETED",
  "result": {
    "success": true,
    "actions_executed": 4,
    "extracted_data": {
      "results": "Search results text..."
    },
    "final_url": "https://example.com/results"
  }
}
graph¶
Research graph execution.
Request Schema¶
{
  "graph": "name: research_pipeline\nstages:\n - name: discover\n ...",
  "template": "basic_research",
  "params": {
    "query": "AI regulation",
    "max_urls": 50
  },
  "workspace_id": "ws_xyz789"
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `graph` | string | No* | - | Inline YAML graph |
| `template` | string | No* | - | Template name |
| `params` | object | No | {} | Graph parameters |
| `workspace_id` | string | No | null | Workspace for artifacts |
*Either graph or template is required.
Response Schema¶
{
  "job_id": "job_stu901",
  "status": "COMPLETED",
  "result": {
    "graph_name": "research_pipeline",
    "stages_completed": 5,
    "stages_total": 5,
    "duration_seconds": 245,
    "outputs": {
      "urls": [...],
      "synthesis": "Research findings..."
    },
    "artifacts_url": "https://files.nominate.ai/..."
  }
}
Submitting Jobs¶
REST API¶
# Crawl
curl -X POST https://intel.nominate.ai/api/v1/jobs/crawl -d '...'
# Lazarus
curl -X POST https://intel.nominate.ai/api/v1/jobs/lazarus -d '...'
# Vectl
curl -X POST https://intel.nominate.ai/api/v1/jobs/vectl -d '...'
# Screenshot
curl -X POST https://intel.nominate.ai/api/v1/jobs/screenshot -d '...'
# Transcript
curl -X POST https://intel.nominate.ai/api/v1/jobs/transcript -d '...'
# Browser
curl -X POST https://intel.nominate.ai/api/v1/jobs/browser -d '...'
# Graph
curl -X POST https://intel.nominate.ai/api/v1/jobs/graph -d '...'
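All seven endpoints share the same URL pattern, so the curl calls above can be wrapped in one function. The following is a minimal sketch using only the Python standard library; `build_job_request` and `submit_job` are hypothetical helpers, the `Content-Type` header is taken from the crawl example, and the sketch assumes that no authentication is needed and that the endpoint answers with a JSON body, neither of which is stated on this page.

```python
import json
import urllib.request

BASE_URL = "https://intel.nominate.ai/api/v1/jobs"
JOB_TYPES = {"crawl", "lazarus", "vectl", "screenshot", "transcript", "browser", "graph"}


def build_job_request(job_type: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the given job type."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type!r}")
    return urllib.request.Request(
        f"{BASE_URL}/{job_type}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_job(job_type: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the response body as JSON."""
    # Assumption: the API replies with a JSON object.
    request = build_job_request(job_type, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload and URL easy to inspect or log without touching the network, for example `build_job_request("transcript", {"video_id": "dQw4w9WgXcQ"}).full_url`.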