Tor Gateway¶

The Tor Gateway provides anonymous web access through the Tor network. It enables both clearnet browsing via Tor exit nodes and dark web (.onion) content retrieval.

Overview¶

graph TB
    subgraph "cbintel"
        TOR_CLIENT[TorClient]
        TOR_ROUTER[TorRouter]
    end

    subgraph "Tor Gateway API"
        TOR_API[tor.nominate.ai]
        WORKER_POOL[Tor Worker Pool]
    end

    subgraph "Tor Network"
        ENTRY[Entry Nodes]
        RELAY[Relay Nodes]
        EXIT[Exit Nodes]
    end

    subgraph "Destinations"
        CLEARNET[Clearnet Sites]
        ONION[.onion Sites]
    end

    TOR_CLIENT --> TOR_API
    TOR_ROUTER --> TOR_CLIENT
    TOR_API --> WORKER_POOL
    WORKER_POOL --> ENTRY --> RELAY --> EXIT
    EXIT --> CLEARNET
    RELAY --> ONION

Module Structure¶

src/cbintel/tor/
├── __init__.py     # Public exports
├── models.py       # Pydantic request/response models
├── client.py       # TorClient async client
└── router.py       # TorRouter for session management

TorClient¶

Low-level async client for the Tor Gateway API.

Basic Usage¶

from cbintel.tor import TorClient

async with TorClient() as tor:
    # Simple fetch through Tor
    result = await tor.fetch("https://check.torproject.org")
    print(f"Via worker: {result.worker}")
    print(f"Tor confirmed: {'Congratulations' in result.body}")

    # Fetch .onion site
    result = await tor.fetch("http://darksite.onion/")
    if result.success:
        print(result.body)
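
The `.onion` hostnames in these examples, such as `darksite.onion`, are placeholders. Current (v3) onion addresses are 56 base32 characters followed by `.onion`, so a cheap syntactic check before calling `tor.fetch` can catch typos early. The sketch below is illustrative only: `is_v3_onion_url` is a hypothetical helper, not part of `cbintel`, and it checks the shape of the address, not its checksum or reachability.

```python
import re
from urllib.parse import urlsplit

# v3 onion addresses: 56 characters from the base32 alphabet (a-z, 2-7).
# Optional subdomain labels may precede the address.
_V3_ONION = re.compile(r"^(?:[a-z0-9-]+\.)*[a-z2-7]{56}\.onion$")


def is_v3_onion_url(url: str) -> bool:
    """Return True if the URL's host has the shape of a v3 onion address."""
    host = (urlsplit(url).hostname or "").lower()
    return bool(_V3_ONION.match(host))
```

A caller might skip or flag URLs for which this returns `False` instead of spending a slow Tor round trip on them.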

Tor-over-VPN¶

Route Tor Gateway API calls through a VPN so that the gateway sees the VPN's address rather than yours:

# Route through VPN SOCKS5 proxy
async with TorClient(upstream_proxy="socks5://vpn.local:1080") as tor:
    result = await tor.fetch("http://example.onion")
    # Traffic: You -> VPN -> tor.nominate.ai -> Tor -> .onion site

Or via environment variable:

export CBTOR_UPSTREAM_PROXY=socks5://vpn.local:1080

Health Monitoring¶

async with TorClient() as tor:
    # Check cluster health
    health = await tor.get_health()
    print(f"Healthy workers: {health.healthy_workers}/{health.total_workers}")
    print(f"Mode: {health.mode}")

    # Get worker status
    workers = await tor.get_workers()
    for worker in workers:
        print(f"{worker.id}: {worker.status}")

TorRouter¶

Session-aware routing with health monitoring and circuit rotation.

Sticky Sessions¶

Maintain the same Tor circuit across multiple requests:

from cbintel.tor import TorRouter

async with TorRouter() as router:
    async with router.session("darksite.onion") as session:
        page1 = await session.fetch("/page1")
        page2 = await session.fetch("/page2")  # Same worker
        page3 = await session.fetch("/page3")  # Same worker

Circuit Rotation¶

Force a new Tor circuit when you need a fresh identity:

async with TorRouter() as router:
    async with router.session("target.onion") as session:
        # Crawl some pages
        page1 = await session.fetch("/page1")
        page2 = await session.fetch("/page2")

        # Rotate to a new circuit (for clearnet targets, this also means a new exit IP)
        await session.rotate_circuit()

        # Continue with fresh circuit
        page3 = await session.fetch("/page3")  # Different worker

    # Rotate all active sessions
    count = await router.rotate_all_circuits()
    print(f"Rotated {count} circuits")

Router Status¶

async with TorRouter() as router:
    status = await router.get_status()
    print(f"Active sessions: {status['active_sessions']}")

Load Balancing Modes¶

Mode	Description
`round_robin`	Distribute requests evenly across workers
`sticky`	Maintain session affinity using `sticky_key`
`random`	Random worker selection
`least_connections`	Route to worker with fewest active requests
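
To make the four modes concrete, here is a toy selector that picks a worker name under each policy. It is a sketch of the general techniques, not the gateway's actual implementation: the `WorkerPicker` class, the SHA-256 mapping used for `sticky`, and the `active` counter are all assumptions made for illustration.

```python
import hashlib
import itertools
import random
from typing import Optional


class WorkerPicker:
    """Toy selector illustrating the four modes; not the gateway's code."""

    def __init__(self, workers: list) -> None:
        self.workers = list(workers)
        # In-flight request count per worker (maintained by the caller).
        self.active = {w: 0 for w in self.workers}
        self._cycle = itertools.cycle(self.workers)

    def pick(self, mode: str, sticky_key: Optional[str] = None) -> str:
        if mode == "round_robin":
            return next(self._cycle)
        if mode == "sticky":
            if sticky_key is None:
                raise ValueError("sticky mode needs a sticky_key")
            # Stable hash, so the same key always maps to the same worker.
            digest = hashlib.sha256(sticky_key.encode()).digest()
            index = int.from_bytes(digest[:8], "big") % len(self.workers)
            return self.workers[index]
        if mode == "random":
            return random.choice(self.workers)
        if mode == "least_connections":
            return min(self.workers, key=lambda w: self.active[w])
        raise ValueError(f"unknown mode: {mode}")
```

Note that a plain modulo mapping like this reshuffles sticky keys whenever the worker list changes; a real pool would likely need something more stable.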

Setting Mode¶

async with TorClient() as tor:
    await tor.set_mode("sticky")

Graph Operations¶

tor_fetch¶

Anonymous data acquisition in Research Graphs:

name: onion_crawler
stages:
  - name: discover
    parallel:
      - op: tor_fetch
        params:
          url: "http://darksite.onion/links"
        output: links_page

  - name: crawl
    parallel_foreach:
      input: links_page
      operations:
        - op: tor_fetch
          params:
            mode: sticky
            sticky_key: "darksite.onion"
          output: pages

tor_screenshot¶

Capture screenshots anonymously through a local Tor SOCKS5 proxy:

name: onion_screenshots
stages:
  - name: capture
    parallel:
      - op: tor_screenshot
        params:
          url: "http://darksite.onion"
          full_page: true
          timeout: 90000  # Higher timeout for Tor
        output: screenshot

Parameters:

Parameter	Default	Description
`url`	required	URL to capture (clearnet or .onion)
`tor_proxy`	`socks5://127.0.0.1:9050`	SOCKS5 proxy URL
`full_page`	`true`	Capture full page
`timeout`	`60000`	Browser timeout in ms
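
The way the defaults in this table combine with explicit `params` can be sketched as a simple merge. This is an illustration under stated assumptions: `screenshot_params` is a hypothetical helper, and the real operation may resolve or validate its parameters differently.

```python
# Defaults copied from the parameter table above.
DEFAULTS = {
    "tor_proxy": "socks5://127.0.0.1:9050",
    "full_page": True,
    "timeout": 60000,  # milliseconds
}


def screenshot_params(url: str, **overrides) -> dict:
    """Merge caller overrides onto the documented tor_screenshot defaults."""
    if not url:
        raise ValueError("url is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"url": url, **DEFAULTS, **overrides}
```

For the `onion_screenshots` graph above, only `timeout` would differ from the defaults.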

Combined VPN + Tor Pipeline¶

name: comprehensive_intel
stages:
  - name: clearnet
    parallel:
      - op: fetch
        params:
          url: "https://target.com"
          geo: "us:ca"
        output: clearnet_data

  - name: darkweb
    parallel:
      - op: tor_fetch
        params:
          url: "http://target.onion"
        output: onion_data

  - name: screenshots
    parallel:
      - op: tor_screenshot
        params:
          url: "http://target.onion"
        output: onion_screenshot

  - name: analyze
    sequential:
      - op: integrate
        input: [clearnet_data, onion_data]
        output: combined_intel
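
In a multi-stage graph like this one, a stage can only consume outputs that an earlier stage produced. The check below is a minimal sketch over the graph loaded as a Python dict. The `check_pipeline` helper is hypothetical, it only understands `parallel` and `sequential` stages, and it assumes that operations in a `sequential` stage may also read outputs from earlier operations in the same stage.

```python
def check_pipeline(pipeline: dict) -> list:
    """Return names used as `input` before anything has produced them."""
    produced = set()
    missing = []
    for stage in pipeline["stages"]:
        sequential = "sequential" in stage
        ops = stage.get("sequential") or stage.get("parallel") or []
        stage_outputs = set()
        for op in ops:
            inputs = op.get("input", [])
            if isinstance(inputs, str):
                inputs = [inputs]
            # Parallel ops see only earlier stages; sequential ops also see
            # outputs of earlier ops in the same stage.
            visible = produced | stage_outputs if sequential else produced
            missing.extend(name for name in inputs if name not in visible)
            if "output" in op:
                stage_outputs.add(op["output"])
        produced |= stage_outputs
    return missing
```

An empty result means every `input` resolves. For `comprehensive_intel`, both `clearnet_data` and `onion_data` are produced before the `analyze` stage reads them.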

Configuration¶

Environment Variables¶

Variable	Default	Description
`CBTOR_BASE_URL`	`https://tor.nominate.ai`	Tor Gateway API URL
`CBTOR_TIMEOUT`	`60.0`	Default request timeout (seconds)
`CBTOR_MODE`	`round_robin`	Default load balancing mode
`CBTOR_UPSTREAM_PROXY`	`None`	Upstream proxy for Tor-over-VPN
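
The variables and defaults in this table could be resolved roughly as follows. This is a sketch, not the library's loader: `TorSettings` and `load_settings` are hypothetical names, and treating an empty `CBTOR_UPSTREAM_PROXY` as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TorSettings:
    base_url: str
    timeout: float  # seconds
    mode: str
    upstream_proxy: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> TorSettings:
    """Read the CBTOR_* variables, falling back to the documented defaults."""
    return TorSettings(
        base_url=env.get("CBTOR_BASE_URL", "https://tor.nominate.ai"),
        timeout=float(env.get("CBTOR_TIMEOUT", "60.0")),
        mode=env.get("CBTOR_MODE", "round_robin"),
        # Assumption: an empty string means "no upstream proxy".
        upstream_proxy=env.get("CBTOR_UPSTREAM_PROXY") or None,
    )
```

Passing a plain dict as `env` makes the resolution easy to test without touching the process environment.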

Request/Response Models¶

TorFetchRequest¶

from pydantic import BaseModel

class TorFetchRequest(BaseModel):
    url: str
    method: str = "GET"
    headers: dict[str, str] = {}
    body: str | None = None
    timeout: float = 60.0
    mode: str = "round_robin"
    sticky_key: str | None = None

TorFetchResponse¶

class TorFetchResponse(BaseModel):
    success: bool
    status_code: int | None
    headers: dict[str, str]
    body: str
    worker: str
    circuit_id: str
    latency_ms: float
    error: str | None

TorHealthResponse¶

class TorHealthResponse(BaseModel):
    total_workers: int
    healthy_workers: int
    mode: str
    uptime_seconds: float

Best Practices¶

Timeouts¶

Tor is slower than clearnet. Adjust timeouts accordingly:

# Default 60s is usually sufficient
result = await tor.fetch(url, timeout=60.0)

# For .onion sites, consider longer
result = await tor.fetch(onion_url, timeout=120.0)
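
When a crawl mixes clearnet and `.onion` URLs, the timeout can be chosen per URL. A minimal sketch, assuming the 60 s and 120 s figures above; `pick_timeout` is a hypothetical helper, not part of `cbintel`.

```python
from urllib.parse import urlsplit


def pick_timeout(url: str, clearnet: float = 60.0, onion: float = 120.0) -> float:
    """Return the longer timeout for .onion hosts, the default otherwise."""
    host = urlsplit(url).hostname or ""
    return onion if host.endswith(".onion") else clearnet
```

It would be used as `await tor.fetch(url, timeout=pick_timeout(url))`.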

Sticky Sessions¶

Use a sticky session for multi-page crawls that depend on authentication state, so that every request goes through the same worker and circuit:

async with router.session("forum.onion") as session:
    # Login
    await session.fetch("/login", method="POST", body=credentials)

    # Subsequent requests reuse the same circuit, so the login session carries over
    await session.fetch("/dashboard")
    await session.fetch("/profile")

Circuit Rotation¶

Rotate circuits between batches to reduce the chance of being rate limited:

for batch in batches:
    for url in batch:
        await session.fetch(url)

    # New identity for next batch
    await session.rotate_circuit()
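
The loop above assumes a `batches` iterable that the snippet does not define. One way to build it from a flat list of URLs is sketched below; `in_batches` is a hypothetical helper and the batch size is up to the caller.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def in_batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

With this in place, `for batch in in_batches(urls, 10):` fetches ten pages per circuit before rotating.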

Rate Limiting¶

Add delays between requests:

import asyncio

for url in urls:
    result = await tor.fetch(url)
    await asyncio.sleep(2)  # 2 second delay
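
A fixed two-second pause produces evenly spaced requests. A common variation is to randomize each pause around the base value. The sketch below shows one way to do that; `jittered_delay` and its default `jitter` fraction are assumptions, not a `cbintel` feature.

```python
import random
from typing import Optional


def jittered_delay(
    base: float = 2.0,
    jitter: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a delay drawn uniformly from base * (1 - jitter) to base * (1 + jitter)."""
    if not 0 <= jitter <= 1:
        raise ValueError("jitter must be between 0 and 1")
    rng = rng or random.Random()
    return base * (1 + rng.uniform(-jitter, jitter))
```

In the loop above, `await asyncio.sleep(jittered_delay())` would replace the fixed `asyncio.sleep(2)`.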

Error Handling¶

from cbintel.tor import TorClient, TorError, TorTimeoutError

async with TorClient() as tor:
    try:
        result = await tor.fetch("http://example.onion")
    except TorTimeoutError:
        print("Request timed out - try increasing timeout")
    except TorError as e:
        print(f"Tor error: {e}")
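
Timeouts over Tor are often transient, so retrying with a growing delay is a reasonable pattern. The sketch below is self-contained and therefore defines local stand-ins for `TorError` and `TorTimeoutError`. It assumes the timeout error subclasses the general error, which the `except` ordering above suggests but does not confirm. `fetch_with_retry` is a hypothetical helper.

```python
import asyncio


class TorError(Exception):
    """Stand-in for cbintel.tor.TorError so the sketch runs on its own."""


class TorTimeoutError(TorError):
    """Stand-in for cbintel.tor.TorTimeoutError (subclassing is assumed)."""


async def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=asyncio.sleep):
    """Call `fetch(url)`, retrying timeouts with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await fetch(url)
        except TorTimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller see the timeout
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await sleep(base_delay * 2 ** attempt)
```

With the real client this would be called as `await fetch_with_retry(tor.fetch, "http://example.onion")`, importing the exception classes from `cbintel.tor` instead of defining them. Other `TorError` subclasses propagate immediately.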

Requirements¶

`tor_screenshot` requires a local Tor SOCKS5 proxy:

# Install Tor
sudo apt install tor

# Start Tor service
sudo systemctl start tor

# Verify the SOCKS5 proxy (--socks5-hostname also resolves DNS through Tor)
curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org

Security Notes¶

Tor Gateway API currently has no authentication
Don't log or store .onion URLs in cleartext
Rotate circuits between sensitive operations
Consider Tor-over-VPN for additional anonymity
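
One way to follow the logging advice is to mask `.onion` hostnames before a message reaches the log. The sketch below replaces each hostname with a short salted hash, so related entries can still be correlated. `redact_onions` is a hypothetical helper, it masks only the hostname and leaves paths and query strings intact, and the salt must be kept secret for the hash to resist guessing.

```python
import hashlib
import re

# One or more dot-separated labels followed by the .onion suffix.
_ONION_HOST = re.compile(r"\b(?:[a-z0-9-]+\.)+onion\b", re.IGNORECASE)


def redact_onions(text: str, salt: bytes = b"") -> str:
    """Replace every .onion hostname in `text` with a salted hash tag."""

    def _mask(match):
        host = match.group(0).lower().encode()
        digest = hashlib.sha256(salt + host).hexdigest()[:12]
        return f"[onion:{digest}]"

    return _ONION_HOST.sub(_mask, text)
```

It could be applied inside a `logging.Filter` or directly at call sites, for example `logger.info(redact_onions(f"fetched {url}"))`.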