TOR Load Balancer API - Implementation Plan
Overview
API layer that provides load-balanced access to TOR workers for both clearnet and .onion requests, with automatic failure recovery.
Architecture
┌─────────────────────────────────────────────────────────────┐
│                         API Gateway                         │
│           POST /fetch { url, mode?, sticky_key? }           │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                        Load Balancer                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Mode Select │  │ Health Mgr  │  │ Worker Pool         │  │
│  │ - sticky    │  │ - state     │  │ [w1, w2, w3, w4]    │  │
│  │ - round_rob │  │ - recovery  │  │                     │  │
│  │ - random    │  │ - circuit   │  │                     │  │
│  │ - least_con │  │   breaker   │  │                     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────┬───────────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┬─────────────┐
        ▼             ▼             ▼             ▼
   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
   │ Worker1 │   │ Worker2 │   │ Worker3 │   │ Worker4 │
   │ HEALTHY │   │ HEALTHY │   │ SUSPECT │   │ DEAD    │
   │ cons: 2 │   │ cons: 1 │   │ cons: 0 │   │ cons: 0 │
   └─────────┘   └─────────┘   └─────────┘   └─────────┘
Load Balancing Modes
| Mode | Description | Use Case |
|---|---|---|
| `round_robin` | Rotate through healthy workers | Default, even distribution |
| `sticky` | Hash-based affinity (by key or IP) | Session consistency |
| `random` | Random healthy worker selection | Simple, no state |
| `least_connections` | Pick worker with fewest active connections | High-throughput |
| `failover` | Primary with fallback chain | High availability |
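The modes in the table could be sketched roughly as follows. This is a minimal illustration, not the planned `modes.py`: the `Selector` class, the trimmed-down `Worker`, and the string state values are hypothetical, sticky selection here hashes only an explicit key (the IP fallback is left out), and `failover` is assumed to mean "first routable worker in configured order".

```python
import hashlib
import itertools
import random
from dataclasses import dataclass


@dataclass
class Worker:
    # Trimmed-down, hypothetical stand-in for the plan's Worker dataclass.
    id: str
    state: str = "healthy"
    active_connections: int = 0


class Selector:
    """Picks a worker from the routable subset of the pool."""

    def __init__(self, workers):
        self.workers = workers
        self._counter = itertools.count()  # round-robin position

    def routable(self):
        # HEALTHY and SUSPECT workers both receive traffic.
        return [w for w in self.workers if w.state in ("healthy", "suspect")]

    def select(self, mode="round_robin", sticky_key=None):
        pool = self.routable()
        if not pool:
            raise RuntimeError("no routable workers")
        if mode == "round_robin":
            return pool[next(self._counter) % len(pool)]
        if mode == "random":
            return random.choice(pool)
        if mode == "least_connections":
            return min(pool, key=lambda w: w.active_connections)
        if mode == "sticky":
            if sticky_key is None:
                raise ValueError("this sketch needs a sticky_key")
            digest = hashlib.sha256(sticky_key.encode()).digest()
            return pool[int.from_bytes(digest[:8], "big") % len(pool)]
        if mode == "failover":
            # Assumption: the primary is the first routable worker.
            return pool[0]
        raise ValueError(f"unknown mode: {mode}")
```

Hashing modulo the pool size remaps most keys whenever the routable set changes, which is one reason a hash ring may be preferable for sticky sessions.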
Worker Health State Machine
                  ┌──────────────────────────────────┐
                  │                                  │
                  ▼                                  │
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌───────┴─┐
│ HEALTHY │───▶│ SUSPECT │───▶│  DEAD   │───▶│RECOVERY │
└─────────┘    └─────────┘    └─────────┘    └─────────┘
     ▲              │              │              │
     │              │              │              │
     └──────────────┴──────────────┴──────────────┘
                      (on success)
States
| State | Description | Behavior |
|---|---|---|
| `HEALTHY` | Worker operational | Receives traffic |
| `SUSPECT` | Recent failure(s) | Receives traffic, monitored closely |
| `DEAD` | Failed health checks | No traffic, enters recovery |
| `RECOVERY` | Attempting reconnect | Backoff retry, no traffic |
Transitions
HEALTH_CONFIG = {
    "failure_threshold": 3,       # failures before SUSPECT -> DEAD
    "success_threshold": 2,       # successes to return to HEALTHY
    "suspect_window": 30,         # seconds in SUSPECT before DEAD
    "recovery_backoff": [1, 2, 4, 8, 16, 32, 60],  # seconds
    "health_check_interval": 10,  # seconds between checks
}
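One possible reading of these thresholds is sketched below. The `HealthTracker` class and its counters are hypothetical. The sketch assumes that the first failure moves a HEALTHY worker to SUSPECT and that the thresholds count consecutive results; the plan states neither. The time-based `suspect_window` rule is left out.

```python
from dataclasses import dataclass
from enum import Enum

# Only the two thresholds used by this sketch.
HEALTH_CONFIG = {"failure_threshold": 3, "success_threshold": 2}


class WorkerState(Enum):
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    DEAD = "dead"
    RECOVERY = "recovery"


@dataclass
class HealthTracker:
    state: WorkerState = WorkerState.HEALTHY
    consecutive_failures: int = 0
    consecutive_successes: int = 0

    def record_failure(self):
        self.consecutive_successes = 0
        self.consecutive_failures += 1
        if self.state is WorkerState.HEALTHY:
            # Assumption: a single failure is enough to mark a worker SUSPECT.
            self.state = WorkerState.SUSPECT
        if (self.state is WorkerState.SUSPECT
                and self.consecutive_failures >= HEALTH_CONFIG["failure_threshold"]):
            self.state = WorkerState.DEAD

    def record_success(self):
        self.consecutive_failures = 0
        self.consecutive_successes += 1
        if self.state is WorkerState.RECOVERY:
            # A successful reconnect returns the worker to the pool directly.
            self.state = WorkerState.HEALTHY
        elif (self.state is WorkerState.SUSPECT
                and self.consecutive_successes >= HEALTH_CONFIG["success_threshold"]):
            self.state = WorkerState.HEALTHY

    def start_recovery(self):
        # DEAD workers take no traffic and hand over to the recovery loop.
        if self.state is WorkerState.DEAD:
            self.state = WorkerState.RECOVERY
```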
Recovery State Machine
RECOVERY State:
  Entry:
    - backoff_index = 0
    - last_attempt = now()
  Loop:
    - Wait backoff[backoff_index] seconds
    - Attempt reconnect sequence:
        1. Ping worker (SSH reachable?)
        2. Check TOR SOCKS (port 9050 responding?)
        3. Test TOR circuit (check.torproject.org)
        4. Test .onion resolution (optional)
    - On SUCCESS:
        - Transition to HEALTHY
        - Reset backoff_index
    - On FAILURE:
        - backoff_index = min(backoff_index + 1, max_index)
        - If backoff_index == max_index:
            - Alert: "Worker requires manual intervention"
        - Retry
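The loop above might look like this in Python. It is a sketch under assumptions: `run_recovery`, `probe`, `alert`, and `max_attempts` are hypothetical names, `probe` stands in for the whole four-step reconnect sequence, and the alert fires once rather than on every failure at the last backoff step. The sleep function is injectable so the loop can be exercised without waiting.

```python
import time

RECOVERY_BACKOFF = [1, 2, 4, 8, 16, 32, 60]  # seconds, from HEALTH_CONFIG


def run_recovery(probe, sleep=time.sleep, alert=print, max_attempts=None):
    """Retry `probe` with stepped backoff until it returns True.

    Returns the number of attempts made, or None if max_attempts ran out.
    """
    max_index = len(RECOVERY_BACKOFF) - 1
    backoff_index = 0
    attempts = 0
    alerted = False
    while max_attempts is None or attempts < max_attempts:
        sleep(RECOVERY_BACKOFF[backoff_index])
        attempts += 1
        if probe():
            # The caller transitions the worker to HEALTHY and resets backoff.
            return attempts
        backoff_index = min(backoff_index + 1, max_index)
        if backoff_index == max_index and not alerted:
            alert("Worker requires manual intervention")
            alerted = True
    return None
```

With the default `max_attempts=None` the loop keeps retrying at the 60-second ceiling, matching the "Retry" step; the cap exists only so a caller can bound the loop.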
API Endpoints
POST /api/v1/fetch
Fetch a URL through the TOR worker pool.
{
  "url": "https://example.com" | "http://xxx.onion",
  "mode": "round_robin" | "sticky" | "random" | "least_connections",  // optional, defaults to round_robin
  "sticky_key": "user-123",            // optional, for sticky mode
  "timeout": 30,                       // optional, seconds
  "follow_redirects": true,            // optional
  "headers": { "User-Agent": "..." }   // optional
}
Response:
{
  "success": true,
  "worker": "worker-01",
  "status_code": 200,
  "headers": { ... },
  "body": "...",
  "timing": {
    "dns_ms": 50,
    "connect_ms": 200,
    "total_ms": 1500
  }
}
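A request body for this endpoint could be validated along the following lines before a worker is selected. The function name `parse_fetch_request` is hypothetical. Treating `timeout: 30` and `follow_redirects: true` as defaults is an assumption drawn from the example values, and the accepted modes are the four listed in the request schema.

```python
from urllib.parse import urlsplit

ALLOWED_MODES = {"round_robin", "sticky", "random", "least_connections"}


def parse_fetch_request(payload):
    """Validate a /api/v1/fetch body and fill in assumed defaults."""
    url = payload.get("url")
    if not isinstance(url, str):
        raise ValueError("url is required")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("url must be an absolute http(s) URL")
    mode = payload.get("mode", "round_robin")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return {
        "url": url,
        "mode": mode,
        "sticky_key": payload.get("sticky_key"),
        "timeout": payload.get("timeout", 30),  # assumed default
        "follow_redirects": payload.get("follow_redirects", True),  # assumed default
        "headers": payload.get("headers", {}),
    }
```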
GET /api/v1/health
Pool health status.
{
  "healthy_workers": 3,
  "total_workers": 4,
  "workers": {
    "worker-01": { "state": "HEALTHY", "connections": 2, "success_rate": 0.98 },
    "worker-02": { "state": "HEALTHY", "connections": 1, "success_rate": 0.95 },
    "worker-03": { "state": "SUSPECT", "connections": 0, "failures": 2 },
    "worker-04": { "state": "RECOVERY", "backoff_index": 3, "next_retry": "10s" }
  }
}
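A response of this shape could be assembled from per-worker counters roughly as below. The `pool_health` function and the dict-based worker records are hypothetical. The sketch assumes `healthy_workers` counts workers that still receive traffic (HEALTHY or SUSPECT), which is one reading consistent with the 3-of-4 figure in the example; the state-specific fields (`failures`, `backoff_index`, `next_retry`) are left out.

```python
def pool_health(workers):
    """Summarise worker records into the /api/v1/health response shape."""
    report = {}
    for w in workers:
        entry = {"state": w["state"], "connections": w["active_connections"]}
        if w["total_requests"]:
            ok = w["total_requests"] - w["total_failures"]
            entry["success_rate"] = round(ok / w["total_requests"], 2)
        report[w["id"]] = entry
    # Assumption: SUSPECT workers still take traffic, so they count here.
    routable = sum(1 for w in workers if w["state"] in ("HEALTHY", "SUSPECT"))
    return {
        "healthy_workers": routable,
        "total_workers": len(workers),
        "workers": report,
    }
```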
POST /api/v1/workers/{id}/drain
Gracefully remove worker from pool.
POST /api/v1/workers/{id}/enable
Force worker back into pool.
Implementation Phases
Phase 1: Core Load Balancer
- Worker pool manager with health states
- Round-robin and random selection
- Basic `/fetch` endpoint
- Route: clearnet → TransPort, .onion → SOCKS5
Phase 2: Health Management
- Health check loop (background task)
- State machine transitions
- Recovery with exponential backoff
- Tunnel reconnect integration
Phase 3: Advanced Features
- Sticky sessions (hash ring)
- Least-connections tracking
- Metrics and timing
- Admin endpoints (drain/enable)
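For the hash-ring item in this phase, a consistent-hash ring is one way to keep a sticky key on the same worker while other workers come and go. The sketch below is illustrative only: `HashRing`, `lookup`, and the `replicas` count are hypothetical, and SHA-256 is an arbitrary choice of hash.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # First 8 bytes of SHA-256 as an integer position on the ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, worker_ids, replicas=64):
        # Each worker gets `replicas` virtual points to even out the spread.
        self._ring = sorted(
            (_hash(f"{wid}#{i}"), wid) for wid in worker_ids for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, sticky_key, routable=None):
        """Return the first routable worker clockwise from the key's hash."""
        if not self._ring:
            raise RuntimeError("ring is empty")
        start = bisect.bisect_right(self._points, _hash(sticky_key))
        for offset in range(len(self._ring)):
            wid = self._ring[(start + offset) % len(self._ring)][1]
            if routable is None or wid in routable:
                return wid
        raise RuntimeError("no routable workers")
```

When a worker leaves the routable set, only the keys that mapped to it move; every other key keeps its worker.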
Phase 4: Resilience
- Circuit breaker pattern
- Request retry with fallback
- Alert integration
- Graceful degradation
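The circuit breaker item could start from something like the sketch below. The plan does not specify thresholds or timing, so `failure_threshold`, `reset_timeout`, and the class itself are hypothetical. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time


class CircuitBreaker:
    """Closed until N consecutive failures, then open until a cooldown passes."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def allow(self):
        if self._opened_at is None:
            return True
        # Open: let trial requests through once the cooldown has elapsed
        # (half-open). This sketch does not limit them to a single request.
        return self._clock() - self._opened_at >= self.reset_timeout

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Opening again on a half-open failure restarts the cooldown.
            self._opened_at = self._clock()
```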
Data Structures
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class WorkerState(Enum):
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    DEAD = "dead"
    RECOVERY = "recovery"

class LoadBalanceMode(Enum):
    ROUND_ROBIN = "round_robin"
    STICKY = "sticky"
    RANDOM = "random"
    LEAST_CONNECTIONS = "least_connections"
    FAILOVER = "failover"

@dataclass
class Worker:
    # Required fields come first: a dataclass rejects a field without a
    # default that follows one with a default.
    id: str                # "worker-01"
    ip: str                # "17.0.0.134"
    wg_interface: str      # "wg-tor01"
    wg_host_ip: str        # "10.201.1.1"
    socks_port: int = 9050
    state: WorkerState = WorkerState.HEALTHY
    active_connections: int = 0
    total_requests: int = 0
    total_failures: int = 0
    last_success: Optional[datetime] = None
    last_failure: Optional[datetime] = None
    # Recovery state
    backoff_index: int = 0
    recovery_attempts: int = 0
Worker Configuration
WORKERS = [
    Worker(id="worker-01", ip="17.0.0.134", wg_interface="wg-tor01", wg_host_ip="10.201.1.1"),
    Worker(id="worker-02", ip="17.0.0.202", wg_interface="wg-tor02", wg_host_ip="10.201.2.1"),
    Worker(id="worker-03", ip="17.0.0.201", wg_interface="wg-tor03", wg_host_ip="10.201.3.1"),
    Worker(id="worker-04", ip="17.0.0.120", wg_interface="wg-tor04", wg_host_ip="10.201.4.1"),
]
Request Flow
1. Request arrives: POST /fetch { url: "http://xxx.onion" }
2. Detect URL type:
   - .onion → use SOCKS5
   - clearnet → use TransPort (wg interface)
3. Select worker:
   - Filter: state in [HEALTHY, SUSPECT]
   - Apply mode algorithm (round_robin, etc.)
   - Increment active_connections
4. Execute request:
   - .onion: curl --socks5-hostname {worker.ip}:9050 {url}
   - clearnet: curl --interface {worker.wg_host_ip} {url}
5. Handle result:
   - Success: record_success(worker), decrement connections
   - Failure: record_failure(worker), maybe transition state
6. Return response to caller
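Steps 2 and 4 of this flow can be sketched as a pure function that builds the curl argument list. The names `is_onion` and `build_curl_command` are hypothetical, and `--silent` and `--max-time` are additions not shown in the flow; `--socks5-hostname` and `--interface` are the options the flow itself names.

```python
from urllib.parse import urlsplit


def is_onion(url: str) -> bool:
    """True when the URL's host is a .onion address."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".onion")


def build_curl_command(url, worker_ip, wg_host_ip, socks_port=9050, timeout=30):
    """Return the curl argv for one request, following step 4 of the flow."""
    cmd = ["curl", "--silent", "--max-time", str(timeout)]
    if is_onion(url):
        # --socks5-hostname lets the proxy resolve the name, which .onion needs.
        cmd += ["--socks5-hostname", f"{worker_ip}:{socks_port}"]
    else:
        # Clearnet traffic leaves through the worker's WireGuard address.
        cmd += ["--interface", wg_host_ip]
    return cmd + [url]
```

Returning a list rather than a shell string means it can be passed to `subprocess.run` without shell quoting of the caller-supplied URL.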
Files to Create
api/
├── load_balancer/
│   ├── __init__.py
│   ├── pool.py          # WorkerPool, Worker dataclass
│   ├── health.py        # HealthManager, state machine
│   ├── modes.py         # Selection algorithms
│   ├── recovery.py      # Recovery loop, backoff
│   └── fetch.py         # Request execution
├── routers/
│   └── fetch.py         # /api/v1/fetch endpoint
└── main.py              # Add new router