
Tor Gateway

The Tor Gateway provides anonymous web access through the Tor network. It enables both clearnet browsing via Tor exit nodes and dark web (.onion) content retrieval.

Overview

graph TB
    subgraph "cbintel"
        TOR_CLIENT[TorClient]
        TOR_ROUTER[TorRouter]
    end

    subgraph "Tor Gateway API"
        TOR_API[tor.nominate.ai]
        WORKER_POOL[Tor Worker Pool]
    end

    subgraph "Tor Network"
        ENTRY[Entry Nodes]
        RELAY[Relay Nodes]
        EXIT[Exit Nodes]
    end

    subgraph "Destinations"
        CLEARNET[Clearnet Sites]
        ONION[.onion Sites]
    end

    TOR_CLIENT --> TOR_API
    TOR_ROUTER --> TOR_CLIENT
    TOR_API --> WORKER_POOL
    WORKER_POOL --> ENTRY --> RELAY --> EXIT
    EXIT --> CLEARNET
    RELAY --> ONION

Module Structure

src/cbintel/tor/
├── __init__.py     # Public exports
├── models.py       # Pydantic request/response models
├── client.py       # TorClient async client
└── router.py       # TorRouter for session management

TorClient

Low-level async client for the Tor Gateway API.

Basic Usage

from cbintel.tor import TorClient

async with TorClient() as tor:
    # Simple fetch through Tor
    result = await tor.fetch("https://check.torproject.org")
    print(f"Via worker: {result.worker}")
    print(f"Tor confirmed: {'Congratulations' in result.body}")

    # Fetch .onion site
    result = await tor.fetch("http://darksite.onion/")
    if result.success:
        print(result.body)

Tor-over-VPN

Route Tor Gateway API calls through a VPN for additional anonymity:

# Route through VPN SOCKS5 proxy
async with TorClient(upstream_proxy="socks5://vpn.local:1080") as tor:
    result = await tor.fetch("http://example.onion")
    # Traffic: You -> VPN -> tor.nominate.ai -> Tor -> .onion site

Or via environment variable:

export CBTOR_UPSTREAM_PROXY=socks5://vpn.local:1080

Health Monitoring

async with TorClient() as tor:
    # Check cluster health
    health = await tor.get_health()
    print(f"Healthy workers: {health.healthy_workers}/{health.total_workers}")
    print(f"Mode: {health.mode}")

    # Get worker status
    workers = await tor.get_workers()
    for worker in workers:
        print(f"{worker.id}: {worker.status}")

TorRouter

Session-aware routing with health monitoring and circuit rotation.

Sticky Sessions

Maintain the same Tor circuit across multiple requests:

from cbintel.tor import TorRouter

async with TorRouter() as router:
    async with router.session("darksite.onion") as session:
        page1 = await session.fetch("/page1")
        page2 = await session.fetch("/page2")  # Same worker
        page3 = await session.fetch("/page3")  # Same worker

Circuit Rotation

Force a new Tor circuit when you need a fresh identity:

async with TorRouter() as router:
    async with router.session("target.onion") as session:
        # Crawl some pages
        page1 = await session.fetch("/page1")
        page2 = await session.fetch("/page2")

        # Rotate to a new circuit (fresh identity; .onion traffic has no exit IP)
        await session.rotate_circuit()

        # Continue with fresh circuit
        page3 = await session.fetch("/page3")  # Different worker

    # Rotate all active sessions
    count = await router.rotate_all_circuits()
    print(f"Rotated {count} circuits")

Router Status

async with TorRouter() as router:
    status = await router.get_status()
    print(f"Active sessions: {status['active_sessions']}")

Load Balancing Modes

Mode               Description
round_robin        Distribute requests evenly across workers
sticky             Maintain session affinity using sticky_key
random             Random worker selection
least_connections  Route to worker with fewest active requests
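
The four modes above can be illustrated with a small in-memory model. This is only a sketch of the selection logic as the table describes it; `WorkerPool` and its fields are hypothetical names, not part of cbintel or the gateway, and the real gateway may choose workers differently.

```python
import itertools
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkerPool:
    """Hypothetical model of a worker pool; not the gateway's actual code."""
    workers: List[str]
    active: Dict[str, int] = field(default_factory=dict)  # worker -> in-flight requests
    sticky: Dict[str, str] = field(default_factory=dict)  # sticky_key -> worker

    def __post_init__(self) -> None:
        self._cycle = itertools.cycle(self.workers)
        for worker in self.workers:
            self.active.setdefault(worker, 0)

    def pick(self, mode: str, sticky_key: Optional[str] = None) -> str:
        if mode == "round_robin":
            return next(self._cycle)
        if mode == "random":
            return random.choice(self.workers)
        if mode == "least_connections":
            return min(self.workers, key=lambda w: self.active[w])
        if mode == "sticky":
            if sticky_key is None:
                raise ValueError("sticky mode needs a sticky_key")
            # The first request for a key claims a worker; later ones reuse it.
            if sticky_key not in self.sticky:
                self.sticky[sticky_key] = next(self._cycle)
            return self.sticky[sticky_key]
        raise ValueError(f"unknown mode: {mode}")
```

In this model, `sticky` is the only mode that needs extra state per key, which is why it is the one to use when several requests must land on the same worker.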

Setting Mode

async with TorClient() as tor:
    await tor.set_mode("sticky")

Graph Operations

tor_fetch

Anonymous data acquisition in Research Graphs:

name: onion_crawler
stages:
  - name: discover
    parallel:
      - op: tor_fetch
        params:
          url: "http://darksite.onion/links"
        output: links_page

  - name: crawl
    parallel_foreach:
      input: links_page
      operations:
        - op: tor_fetch
          params:
            mode: sticky
            sticky_key: "darksite.onion"
          output: pages

tor_screenshot

Capture screenshots anonymously through a local Tor SOCKS5 proxy:

name: onion_screenshots
stages:
  - name: capture
    parallel:
      - op: tor_screenshot
        params:
          url: "http://darksite.onion"
          full_page: true
          timeout: 90000  # Higher timeout for Tor
        output: screenshot

Parameters:

Parameter  Default                  Description
url        required                 URL to capture (clearnet or .onion)
tor_proxy  socks5://127.0.0.1:9050  SOCKS5 proxy URL
full_page  true                     Capture full page
timeout    60000                    Browser timeout in ms

Combined VPN + Tor Pipeline

name: comprehensive_intel
stages:
  - name: clearnet
    parallel:
      - op: fetch
        params:
          url: "https://target.com"
          geo: "us:ca"
        output: clearnet_data

  - name: darkweb
    parallel:
      - op: tor_fetch
        params:
          url: "http://target.onion"
        output: onion_data

  - name: screenshots
    parallel:
      - op: tor_screenshot
        params:
          url: "http://target.onion"
        output: onion_screenshot

  - name: analyze
    sequential:
      - op: integrate
        input: [clearnet_data, onion_data]
        output: combined_intel

Configuration

Environment Variables

Variable              Default                  Description
CBTOR_BASE_URL        https://tor.nominate.ai  Tor Gateway API URL
CBTOR_TIMEOUT         60.0                     Default request timeout (seconds)
CBTOR_MODE            round_robin              Default load balancing mode
CBTOR_UPSTREAM_PROXY  None                     Upstream proxy for Tor-over-VPN
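
As a rough illustration of how these variables and their defaults fit together, the sketch below reads them into a settings object. `TorSettings` and `load_settings` are hypothetical names used only here; how cbintel actually parses its environment is not shown in this page.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TorSettings:
    """Hypothetical container for the CBTOR_* values."""
    base_url: str
    timeout: float
    mode: str
    upstream_proxy: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> TorSettings:
    """Read the CBTOR_* variables, falling back to the documented defaults."""
    return TorSettings(
        base_url=env.get("CBTOR_BASE_URL", "https://tor.nominate.ai"),
        timeout=float(env.get("CBTOR_TIMEOUT", "60.0")),
        mode=env.get("CBTOR_MODE", "round_robin"),
        # An unset or empty variable is treated as "no upstream proxy".
        upstream_proxy=env.get("CBTOR_UPSTREAM_PROXY") or None,
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.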

Request/Response Models

TorFetchRequest

class TorFetchRequest(BaseModel):
    url: str
    method: str = "GET"
    headers: dict[str, str] = {}
    body: str | None = None
    timeout: float = 60.0
    mode: str = "round_robin"
    sticky_key: str | None = None

TorFetchResponse

class TorFetchResponse(BaseModel):
    success: bool
    status_code: int | None
    headers: dict[str, str]
    body: str
    worker: str
    circuit_id: str
    latency_ms: float
    error: str | None

TorHealthResponse

class TorHealthResponse(BaseModel):
    total_workers: int
    healthy_workers: int
    mode: str
    uptime_seconds: float

Best Practices

Timeouts

Tor is slower than clearnet because each request crosses several relays. Set timeouts accordingly:

# Default 60s is usually sufficient
result = await tor.fetch(url, timeout=60.0)

# For .onion sites, consider longer
result = await tor.fetch(onion_url, timeout=120.0)
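
One way to apply the two timeouts above without hard-coding them at each call site is to derive the value from the URL. The helper below is a minimal sketch; `pick_timeout` is a hypothetical name, and the 60 s and 120 s values are simply the ones used in the example above.

```python
from urllib.parse import urlsplit


def pick_timeout(url: str, clearnet: float = 60.0, onion: float = 120.0) -> float:
    """Return the longer timeout when the URL's host ends in .onion."""
    host = urlsplit(url).hostname or ""
    return onion if host.endswith(".onion") else clearnet
```

The result can then be passed as the `timeout` argument, for example `await tor.fetch(url, timeout=pick_timeout(url))`.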

Sticky Sessions

Use sticky sessions for multi-page crawls that need to maintain authentication state:

async with router.session("forum.onion") as session:
    # Login
    await session.fetch("/login", method="POST", body=credentials)

    # Subsequent requests use same circuit (same session cookies)
    await session.fetch("/dashboard")
    await session.fetch("/profile")

Circuit Rotation

Rotate the circuit between batches to avoid rate limiting:

for batch in batches:
    for url in batch:
        await session.fetch(url)

    # New identity for next batch
    await session.rotate_circuit()
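
The loop above assumes the URLs are already grouped into batches. A minimal sketch of the whole pattern follows; `batched` and `crawl_in_batches` are hypothetical helpers, and `session` is assumed to be any object with awaitable `fetch(url)` and `rotate_circuit()` methods, as in the TorRouter examples.

```python
import asyncio
from typing import Any, Iterator, List, Sequence


def batched(urls: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


async def crawl_in_batches(session: Any, urls: Sequence[str], size: int) -> List[Any]:
    """Fetch every URL, rotating the circuit between batches."""
    results: List[Any] = []
    batches = list(batched(urls, size))
    for index, batch in enumerate(batches):
        for url in batch:
            results.append(await session.fetch(url))
        # Skip the rotation after the final batch; nothing follows it.
        if index < len(batches) - 1:
            await session.rotate_circuit()
    return results
```

Skipping the last rotation is a small design choice: it avoids building a circuit that no request will use.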

Rate Limiting

Add delays between requests:

import asyncio

for url in urls:
    result = await tor.fetch(url)
    await asyncio.sleep(2)  # 2 second delay
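
A fixed 2-second delay produces perfectly regular traffic. If that regularity is a concern, a common variation is to add random jitter to each pause. The helper below is a sketch under that assumption; `jittered_delay` and its parameters are hypothetical and not part of cbintel.

```python
import random
from typing import Optional


def jittered_delay(
    base: float = 2.0,
    spread: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Return `base` seconds scaled by a random factor in [1 - spread, 1 + spread]."""
    source = rng if rng is not None else random
    return base * (1.0 + source.uniform(-spread, spread))
```

In the loop above, `await asyncio.sleep(jittered_delay())` would then replace the constant `asyncio.sleep(2)`.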

Error Handling

from cbintel.tor import TorClient, TorError, TorTimeoutError

async with TorClient() as tor:
    try:
        result = await tor.fetch("http://example.onion")
    except TorTimeoutError:
        print("Request timed out - try increasing timeout")
    except TorError as e:
        print(f"Tor error: {e}")
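
Timeouts through Tor are often transient, so a retry with exponential backoff is a natural extension of the handler above. The sketch below is self-contained: the two exception classes are stand-ins for the cbintel ones (it assumes `TorTimeoutError` subclasses `TorError`, which the except ordering above suggests but does not confirm), and `fetch_with_retry` is a hypothetical helper.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TorError(Exception):
    """Stand-in for cbintel.tor.TorError so the sketch runs on its own."""


class TorTimeoutError(TorError):
    """Stand-in for cbintel.tor.TorTimeoutError (assumed subclass)."""


async def fetch_with_retry(
    fetch: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call `fetch` until it succeeds, retrying only on timeouts."""
    for attempt in range(attempts):
        try:
            return await fetch()
        except TorTimeoutError:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last timeout.
            # Wait base_delay, then 2x, 4x, ... before the next try.
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the real client this might be called as `await fetch_with_retry(lambda: tor.fetch(url))`. Other `TorError` subclasses propagate immediately, since retrying them is not obviously safe.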

Requirements

  • tor_screenshot requires a local Tor SOCKS5 proxy:
    # Install Tor
    sudo apt install tor

    # Start Tor service
    sudo systemctl start tor

    # Verify the SOCKS5 proxy (--socks5-hostname also resolves DNS through Tor)
    curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org

Security Notes

  • The Tor Gateway API currently has no authentication
  • Don't log or store .onion URLs in cleartext
  • Rotate circuits between sensitive operations
  • Consider Tor-over-VPN for additional anonymity
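
For the point about not logging .onion URLs in cleartext, one possible approach is to replace each .onion hostname with a salted hash tag before the message reaches a log. The sketch below uses only the standard library; `redact_onion` and the `onion:` tag format are hypothetical choices, not something cbintel provides.

```python
import hashlib
import re

# Matches hostnames ending in .onion, including any subdomain labels.
ONION_RE = re.compile(r"\b[a-z0-9][a-z0-9.-]*\.onion\b", re.IGNORECASE)


def redact_onion(text: str, salt: bytes) -> str:
    """Replace each .onion hostname with a short salted SHA-256 tag."""
    def _tag(match: "re.Match[str]") -> str:
        host = match.group(0).lower().encode()
        digest = hashlib.sha256(salt + host).hexdigest()
        return f"onion:{digest[:12]}"
    return ONION_RE.sub(_tag, text)
```

The same host and salt always map to the same tag, so log lines can still be correlated. This is pseudonymization rather than encryption: anyone who holds the salt can confirm a guessed hostname, and URL paths are left untouched, so treat it as a starting point only.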