Tor Gateway
The Tor Gateway provides anonymous web access through the Tor network. It enables both clearnet browsing via Tor exit nodes and dark web (.onion) content retrieval.
Overview
```mermaid
graph TB
    subgraph "cbintel"
        TOR_CLIENT[TorClient]
        TOR_ROUTER[TorRouter]
    end

    subgraph "Tor Gateway API"
        TOR_API[tor.nominate.ai]
        WORKER_POOL[Tor Worker Pool]
    end

    subgraph "Tor Network"
        ENTRY[Entry Nodes]
        RELAY[Relay Nodes]
        EXIT[Exit Nodes]
    end

    subgraph "Destinations"
        CLEARNET[Clearnet Sites]
        ONION[.onion Sites]
    end

    TOR_CLIENT --> TOR_API
    TOR_ROUTER --> TOR_CLIENT
    TOR_API --> WORKER_POOL
    WORKER_POOL --> ENTRY --> RELAY --> EXIT
    EXIT --> CLEARNET
    RELAY --> ONION
```
Module Structure
```
src/cbintel/tor/
├── __init__.py   # Public exports
├── models.py     # Pydantic request/response models
├── client.py     # TorClient async client
└── router.py     # TorRouter for session management
```
TorClient
Low-level async client for the Tor Gateway API.
Basic Usage
```python
import asyncio

from cbintel.tor import TorClient


async def main() -> None:
    async with TorClient() as tor:
        # Simple fetch through Tor
        result = await tor.fetch("https://check.torproject.org")
        print(f"Via worker: {result.worker}")
        print(f"Tor confirmed: {'Congratulations' in result.body}")

        # Fetch a .onion site
        result = await tor.fetch("http://darksite.onion/")
        if result.success:
            print(result.body)


asyncio.run(main())
```
The remaining snippets omit the `asyncio.run()` wrapper and assume they run inside an async function.
Tor-over-VPN
Route Tor Gateway API calls through a VPN for additional anonymity:
```python
# Route through a VPN SOCKS5 proxy
async with TorClient(upstream_proxy="socks5://vpn.local:1080") as tor:
    result = await tor.fetch("http://example.onion")
    # Traffic: You -> VPN -> tor.nominate.ai -> Tor -> .onion site
```
Alternatively, set the upstream proxy through the `CBTOR_UPSTREAM_PROXY` environment variable (see Configuration below).
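Because a mistyped proxy URL would silently change the traffic path, it can help to validate it before constructing the client. The sketch below is illustrative only: `parse_upstream_proxy` is a hypothetical helper, and restricting schemes to `socks5`/`socks5h` is an assumption (the examples here only show `socks5://`).

```python
from urllib.parse import urlsplit

# Assumption: only SOCKS5 upstream proxies are supported
ALLOWED_SCHEMES = {"socks5", "socks5h"}


def parse_upstream_proxy(url: str) -> tuple[str, str, int]:
    """Split a proxy URL such as socks5://vpn.local:1080 into (scheme, host, port)."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError(f"proxy URL needs an explicit host and port: {url!r}")
    return parts.scheme, parts.hostname, parts.port


print(parse_upstream_proxy("socks5://vpn.local:1080"))  # ('socks5', 'vpn.local', 1080)
```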
Health Monitoring
```python
async with TorClient() as tor:
    # Check cluster health
    health = await tor.get_health()
    print(f"Healthy workers: {health.healthy_workers}/{health.total_workers}")
    print(f"Mode: {health.mode}")

    # Get worker status
    workers = await tor.get_workers()
    for worker in workers:
        print(f"{worker.id}: {worker.status}")
```
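The health counts can drive a simple readiness check before starting a crawl. This is a sketch under stated assumptions: `HealthSnapshot` is a local stand-in carrying the same fields as `TorHealthResponse`, and the 50% threshold in `is_degraded` is an arbitrary choice, not a gateway setting.

```python
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    # Local stand-in with the same fields as TorHealthResponse
    total_workers: int
    healthy_workers: int
    mode: str
    uptime_seconds: float


def is_degraded(health: HealthSnapshot, min_ratio: float = 0.5) -> bool:
    """Treat the pool as degraded when fewer than min_ratio of its workers are healthy."""
    if health.total_workers == 0:
        return True  # no workers at all
    return health.healthy_workers / health.total_workers < min_ratio


snapshot = HealthSnapshot(total_workers=8, healthy_workers=3, mode="round_robin", uptime_seconds=3600.0)
print(is_degraded(snapshot))  # True, since 3/8 = 0.375 is below 0.5
```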
TorRouter
Session-aware routing with health monitoring and circuit rotation.
Sticky Sessions
Maintain the same Tor circuit across multiple requests:
```python
from cbintel.tor import TorRouter

async with TorRouter() as router:
    async with router.session("darksite.onion") as session:
        page1 = await session.fetch("/page1")
        page2 = await session.fetch("/page2")  # Same worker
        page3 = await session.fetch("/page3")  # Same worker
```
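One common way to implement this kind of affinity is to hash the sticky key onto the worker list, so a given key always lands on the same worker while the pool is unchanged. The gateway's actual mechanism is not described here; `pick_sticky_worker` and the worker names below are hypothetical.

```python
import hashlib


def pick_sticky_worker(sticky_key: str, workers: list[str]) -> str:
    """Map a sticky key to one worker, deterministically for a fixed worker list."""
    if not workers:
        raise ValueError("no workers available")
    # sha256 rather than hash(): Python's str hash is randomized per process
    digest = hashlib.sha256(sticky_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(workers)
    return workers[index]


workers = ["tor-1", "tor-2", "tor-3"]
print(pick_sticky_worker("darksite.onion", workers))
```

Plain modulo remaps most keys whenever the pool size changes; consistent hashing is the usual remedy if workers come and go often.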
Circuit Rotation
Force a new Tor circuit when you need a fresh identity:
```python
async with TorRouter() as router:
    async with router.session("target.onion") as session:
        # Crawl some pages
        page1 = await session.fetch("/page1")
        page2 = await session.fetch("/page2")

        # Rotate to a new circuit (for clearnet targets, this also means a new exit IP)
        await session.rotate_circuit()

        # Continue with the fresh circuit
        page3 = await session.fetch("/page3")  # Different worker

    # Rotate all active sessions
    count = await router.rotate_all_circuits()
    print(f"Rotated {count} circuits")
```
Router Status
```python
async with TorRouter() as router:
    status = await router.get_status()
    print(f"Active sessions: {status['active_sessions']}")
```
Load Balancing Modes

| Mode | Description |
|---|---|
| `round_robin` | Distribute requests evenly across workers |
| `sticky` | Maintain session affinity using `sticky_key` |
| `random` | Random worker selection |
| `least_connections` | Route to the worker with the fewest active requests |
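Setting Mode
The default mode comes from the `CBTOR_MODE` environment variable (default `round_robin`). Each request also carries its own `mode`, plus a `sticky_key` for `sticky` routing, as fields of `TorFetchRequest`.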
Graph Operations
tor_fetch
Anonymous data acquisition in Research Graphs:
```yaml
name: onion_crawler
stages:
  - name: discover
    parallel:
      - op: tor_fetch
        params:
          url: "http://darksite.onion/links"
        output: links_page

  - name: crawl
    parallel_foreach:
      input: links_page
      operations:
        - op: tor_fetch
          params:
            mode: sticky
            sticky_key: "darksite.onion"
      output: pages
```
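Conceptually, `parallel_foreach` fans one operation out over every item of its input and gathers the results. The sketch below models that with `asyncio.gather`; `fake_tor_fetch` stands in for the real `tor_fetch` operation, and the concurrency cap is an assumption (a cap keeps a crawl from hitting the pool with every request at once).

```python
import asyncio


async def fake_tor_fetch(url: str) -> str:
    # Stand-in for the real tor_fetch op: echoes the URL after a short wait
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def parallel_foreach(items: list[str], op, max_concurrency: int = 4) -> list[str]:
    """Run op over every item concurrently (bounded), returning results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item: str) -> str:
        async with semaphore:
            return await op(item)

    return list(await asyncio.gather(*(run_one(item) for item in items)))


links = ["http://darksite.onion/a", "http://darksite.onion/b"]
pages = asyncio.run(parallel_foreach(links, fake_tor_fetch))
print(pages)
```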
tor_screenshot
Capture screenshots anonymously through a local Tor SOCKS5 proxy:
```yaml
name: onion_screenshots
stages:
  - name: capture
    parallel:
      - op: tor_screenshot
        params:
          url: "http://darksite.onion"
          full_page: true
          timeout: 90000  # Higher timeout for Tor
        output: screenshot
```
Parameters:

| Parameter | Default | Description |
|---|---|---|
| `url` | required | URL to capture (clearnet or .onion) |
| `tor_proxy` | `socks5://127.0.0.1:9050` | SOCKS5 proxy URL |
| `full_page` | `true` | Capture the full page |
| `timeout` | `60000` | Browser timeout in ms |
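A sketch of how per-op `params` might combine with the defaults in the table. The default values come from the table; the helper `resolve_screenshot_params` is hypothetical, and rejecting unknown parameters is an assumption.

```python
SCREENSHOT_DEFAULTS = {
    "tor_proxy": "socks5://127.0.0.1:9050",
    "full_page": True,
    "timeout": 60000,  # milliseconds
}


def resolve_screenshot_params(params: dict) -> dict:
    """Merge op params over the documented defaults; url is mandatory."""
    if "url" not in params:
        raise ValueError("tor_screenshot requires 'url'")
    unknown = set(params) - set(SCREENSHOT_DEFAULTS) - {"url"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**SCREENSHOT_DEFAULTS, **params}


resolved = resolve_screenshot_params({"url": "http://darksite.onion", "timeout": 90000})
print(resolved["timeout"] / 1000, "seconds")  # 90.0 seconds
```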
Combined VPN + Tor Pipeline
```yaml
name: comprehensive_intel
stages:
  - name: clearnet
    parallel:
      - op: fetch
        params:
          url: "https://target.com"
          geo: "us:ca"
        output: clearnet_data

  - name: darkweb
    parallel:
      - op: tor_fetch
        params:
          url: "http://target.onion"
        output: onion_data

  - name: screenshots
    parallel:
      - op: tor_screenshot
        params:
          url: "http://target.onion"
        output: onion_screenshot

  - name: analyze
    sequential:
      - op: integrate
        input: [clearnet_data, onion_data]
        output: combined_intel
```
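Stages consume earlier outputs by name (`input: links_page`, `input: [clearnet_data, onion_data]`), so a typo in a name only surfaces at run time. The sketch below checks that every `input` refers to an output produced by an earlier stage. It walks a plain-dict copy of the pipeline; the rule that outputs become visible only to later stages is an assumption inferred from the examples, and `check_references` is a hypothetical helper.

```python
def iter_dicts(node):
    """Yield every dict nested anywhere inside node (lists and dicts)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_dicts(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_dicts(value)


def check_references(pipeline: dict) -> list[str]:
    """Return problems where a stage reads an output no earlier stage produced."""
    produced: set[str] = set()
    problems = []
    for stage in pipeline["stages"]:
        stage_outputs = set()
        for node in iter_dicts(stage):
            inputs = node.get("input", [])
            for name in [inputs] if isinstance(inputs, str) else inputs:
                if name not in produced:
                    problems.append(f"{stage['name']}: unknown input {name!r}")
            if isinstance(node.get("output"), str):
                stage_outputs.add(node["output"])
        produced |= stage_outputs  # visible to later stages only
    return problems


pipeline = {
    "name": "comprehensive_intel",
    "stages": [
        {"name": "clearnet", "parallel": [{"op": "fetch", "params": {"url": "https://target.com", "geo": "us:ca"}, "output": "clearnet_data"}]},
        {"name": "darkweb", "parallel": [{"op": "tor_fetch", "params": {"url": "http://target.onion"}, "output": "onion_data"}]},
        {"name": "screenshots", "parallel": [{"op": "tor_screenshot", "params": {"url": "http://target.onion"}, "output": "onion_screenshot"}]},
        {"name": "analyze", "sequential": [{"op": "integrate", "input": ["clearnet_data", "onion_data"], "output": "combined_intel"}]},
    ],
}
print(check_references(pipeline))  # []
```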
Configuration
Environment Variables

| Variable | Default | Description |
|---|---|---|
| `CBTOR_BASE_URL` | `https://tor.nominate.ai` | Tor Gateway API URL |
| `CBTOR_TIMEOUT` | `60.0` | Default request timeout (seconds) |
| `CBTOR_MODE` | `round_robin` | Default load balancing mode |
| `CBTOR_UPSTREAM_PROXY` | None | Upstream proxy for Tor-over-VPN |
Request/Response Models
TorFetchRequest
```python
from pydantic import BaseModel


class TorFetchRequest(BaseModel):
    url: str
    method: str = "GET"
    headers: dict[str, str] = {}
    body: str | None = None
    timeout: float = 60.0  # seconds
    mode: str = "round_robin"
    sticky_key: str | None = None
```
TorFetchResponse
```python
from pydantic import BaseModel


class TorFetchResponse(BaseModel):
    success: bool
    status_code: int | None
    headers: dict[str, str]
    body: str
    worker: str
    circuit_id: str
    latency_ms: float
    error: str | None
```
TorHealthResponse
```python
from pydantic import BaseModel


class TorHealthResponse(BaseModel):
    total_workers: int
    healthy_workers: int
    mode: str
    uptime_seconds: float
```
Best Practices
Timeouts
Requests over Tor are slower than on the clearnet, so adjust timeouts accordingly:
```python
# The default of 60 s is usually sufficient
result = await tor.fetch(url, timeout=60.0)

# For .onion sites, consider a longer timeout
result = await tor.fetch(onion_url, timeout=120.0)
```
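The choice can be made automatically from the URL's host. A minimal sketch using the 60 s and 120 s figures above; `timeout_for` is a hypothetical helper, and its result would be passed as `timeout=timeout_for(url)`.

```python
from urllib.parse import urlsplit

CLEARNET_TIMEOUT = 60.0  # seconds, the documented default
ONION_TIMEOUT = 120.0    # seconds, the longer value suggested for .onion sites


def timeout_for(url: str) -> float:
    """Return a longer timeout for .onion hosts than for clearnet hosts."""
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    return ONION_TIMEOUT if host.endswith(".onion") else CLEARNET_TIMEOUT


print(timeout_for("http://darksite.onion/"))  # 120.0
```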
Sticky Sessions
Sticky sessions are essential for multi-page crawls that must keep authentication state:
```python
async with router.session("forum.onion") as session:
    # Log in
    await session.fetch("/login", method="POST", body=credentials)

    # Subsequent requests use the same circuit (same session cookies)
    await session.fetch("/dashboard")
    await session.fetch("/profile")
```
Circuit Rotation
Rotate circuits between batches to avoid rate limiting:
```python
for batch in batches:
    for url in batch:
        await session.fetch(url)
    # New identity for the next batch
    await session.rotate_circuit()
```
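The loop above assumes `batches` already exists. A sketch that builds the batches and rotates only between them (not after the last one); `fetch` and `rotate` are injected callables standing in for `session.fetch` and `session.rotate_circuit`, and the batch size is arbitrary.

```python
import asyncio
from itertools import islice


def chunked(urls, size):
    """Split an iterable of URLs into lists of at most `size` items."""
    it = iter(urls)
    while batch := list(islice(it, size)):
        yield batch


async def crawl_in_batches(urls, fetch, rotate, batch_size=10):
    results = []
    batches = list(chunked(urls, batch_size))
    for i, batch in enumerate(batches):
        for url in batch:
            results.append(await fetch(url))
        if i < len(batches) - 1:
            await rotate()  # fresh circuit before the next batch
    return results


# Demo with stand-ins for session.fetch / session.rotate_circuit
events = []


async def fake_fetch(url):
    events.append(url)
    return url


async def fake_rotate():
    events.append("ROTATE")


asyncio.run(crawl_in_batches([f"/page{i}" for i in range(5)], fake_fetch, fake_rotate, batch_size=2))
print(events)
```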
Rate Limiting
Add delays between requests:
```python
import asyncio

for url in urls:
    result = await tor.fetch(url)
    await asyncio.sleep(2)  # 2-second delay
```
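A fixed sleep adds its full delay on top of slow Tor responses and also sleeps after the final request. One alternative, sketched here as a hypothetical `MinIntervalLimiter`, spaces out request *starts* instead: call `await limiter.wait()` before each `tor.fetch(url)`.

```python
import asyncio
import time


class MinIntervalLimiter:
    """Ensure at least `interval` seconds between the starts of consecutive calls."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._last_start: float | None = None
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self._lock:
            if self._last_start is not None:
                remaining = self.interval - (time.monotonic() - self._last_start)
                if remaining > 0:
                    await asyncio.sleep(remaining)
            self._last_start = time.monotonic()


async def main() -> list[float]:
    limiter = MinIntervalLimiter(0.05)  # short interval for the demo; 2.0 matches the loop above
    starts = []
    for _ in range(3):
        await limiter.wait()
        starts.append(time.monotonic())  # a real loop would call tor.fetch(url) here
    return starts


starts = asyncio.run(main())
```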
Error Handling
```python
from cbintel.tor import TorClient, TorError, TorTimeoutError

async with TorClient() as tor:
    try:
        result = await tor.fetch("http://example.onion")
    except TorTimeoutError:
        print("Request timed out - try increasing timeout")
    except TorError as e:
        print(f"Tor error: {e}")
```
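Timeouts over Tor are often transient, so a retry with backoff is a common companion to this handler. A sketch only: to stay self-contained, `TorTimeoutError` below is a local placeholder class rather than the one exported by `cbintel.tor`, and the attempt count and delays are arbitrary.

```python
import asyncio


class TorTimeoutError(Exception):
    """Local placeholder; in real code, import it from cbintel.tor."""


async def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0):
    """Call fetch(url), retrying on TorTimeoutError after base_delay, 2*base_delay, ..."""
    for attempt in range(attempts):
        try:
            return await fetch(url)
        except TorTimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout
            await asyncio.sleep(base_delay * 2 ** attempt)


# Demo: a fake fetch that times out twice, then succeeds
seen = []


async def flaky_fetch(url):
    seen.append(url)
    if len(seen) < 3:
        raise TorTimeoutError(url)
    return f"ok: {url}"


result = asyncio.run(fetch_with_retry(flaky_fetch, "http://example.onion", base_delay=0.01))
print(result)
```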
Requirements
- `tor_screenshot` requires a local Tor SOCKS5 proxy, by default at `socks5://127.0.0.1:9050` (override it with the `tor_proxy` parameter).
Security Notes
- Tor Gateway API currently has no authentication
- Don't log or store .onion URLs in cleartext
- Rotate circuits between sensitive operations
- Consider Tor-over-VPN for additional anonymity
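For the note on logging, one approach is to replace `.onion` URLs with a keyed hash of the host before they reach a log, so entries about the same site can still be correlated without revealing it. A sketch under assumptions: `redact_url` is hypothetical, the key shown is a placeholder that should come from a secret store, and the path and query are dropped because they can also identify the site.

```python
import hashlib
import hmac
from urllib.parse import urlsplit


def redact_url(url: str, key: bytes) -> str:
    """Replace a .onion URL with a keyed tag of its host; leave clearnet URLs unchanged."""
    host = urlsplit(url).hostname or ""
    if not host.endswith(".onion"):
        return url
    # HMAC rather than a bare hash, so tags cannot be matched by hashing known addresses
    tag = hmac.new(key, host.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return f"<onion:{tag}>"  # path and query are dropped as well


log_key = b"load-this-from-a-secret-store"  # placeholder key for the demo
print(redact_url("http://darksite.onion/login", log_key))
```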