TOR Load Balancer API - Implementation Plan

Overview

An API layer that provides load-balanced access to a pool of TOR workers for both clearnet and .onion requests, and that automatically detects and recovers from worker failures.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      API Gateway                             │
│  POST /fetch { url, mode?, sticky_key? }                    │
└─────────────────────┬───────────────────────────────────────┘
┌─────────────────────▼───────────────────────────────────────┐
│                   Load Balancer                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Mode Select │  │ Health Mgr  │  │ Worker Pool         │  │
│  │ - sticky    │  │ - state     │  │ [w1, w2, w3, w4]   │  │
│  │ - round_rob │  │ - recovery  │  │                     │  │
│  │ - random    │  │ - circuit   │  │                     │  │
│  │ - least_con │  │   breaker   │  │                     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────┬───────────────────────────────────────┘
        ┌─────────────┼─────────────┬─────────────┐
        ▼             ▼             ▼             ▼
   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
   │ Worker1 │  │ Worker2 │  │ Worker3 │  │ Worker4 │
   │ HEALTHY │  │ HEALTHY │  │ SUSPECT │  │  DEAD   │
   │ cons: 2 │  │ cons: 1 │  │ cons: 0 │  │ cons: 0 │
   └─────────┘  └─────────┘  └─────────┘  └─────────┘

Load Balancing Modes

Mode               Description                                  Use Case
-----------------  -------------------------------------------  --------------------------
round_robin        Rotate through healthy workers               Default, even distribution
sticky             Hash-based affinity (by key or IP)           Session consistency
random             Random healthy worker selection              Simple, no state
least_connections  Pick worker with fewest active connections   High-throughput
failover           Primary with fallback chain                  High availability
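
The modes in the table can be sketched as small selector functions. This is only an illustration: it assumes workers are plain dictionaries with hypothetical `id` and `active` keys, that the caller has already filtered the list down to workers eligible for traffic, and that the list is kept in priority order for failover. The sticky variant uses a simple hash modulo, not the hash ring planned for Phase 3.

```python
import hashlib
import itertools
import random


def make_round_robin():
    """Return a selector that rotates through the workers it is given."""
    counter = itertools.count()

    def select(healthy):
        return healthy[next(counter) % len(healthy)]

    return select


def select_random(healthy, rng=random):
    """Pick any eligible worker; keeps no state between calls."""
    return rng.choice(healthy)


def select_least_connections(healthy):
    """Pick the worker with the fewest active connections."""
    return min(healthy, key=lambda w: w["active"])


def select_sticky(healthy, sticky_key):
    """Map a key (or client IP) to the same worker while the list is unchanged."""
    digest = hashlib.sha256(sticky_key.encode("utf-8")).digest()
    return healthy[int.from_bytes(digest[:8], "big") % len(healthy)]


def select_failover(healthy):
    """Primary first: the list is assumed to be in priority order."""
    return healthy[0]
```

Note that the modulo-based sticky mapping reshuffles most keys whenever the eligible list changes size, which is one reason a hash ring may be preferable later.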

Worker Health State Machine

                    ┌──────────────────────────────────┐
                    │                                  │
                    ▼                                  │
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌───────┴─┐
│ HEALTHY │───▶│ SUSPECT │───▶│  DEAD   │───▶│RECOVERY │
└─────────┘    └─────────┘    └─────────┘    └─────────┘
     ▲              │              │              │
     │              │              │              │
     └──────────────┴──────────────┴──────────────┘
                 (on success)

States

State     Description           Behavior
--------  --------------------  -----------------------------------
HEALTHY   Worker operational    Receives traffic
SUSPECT   Recent failure(s)     Receives traffic, monitored closely
DEAD      Failed health checks  No traffic, enters recovery
RECOVERY  Attempting reconnect  Backoff retry, no traffic

Transitions

HEALTH_CONFIG = {
    "failure_threshold": 3,        # failures before SUSPECT -> DEAD
    "success_threshold": 2,        # successes to return to HEALTHY
    "suspect_window": 30,          # seconds in SUSPECT before DEAD
    "recovery_backoff": [1, 2, 4, 8, 16, 32, 60],  # seconds
    "health_check_interval": 10,   # seconds between checks
}
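
One possible reading of these thresholds, as a minimal sketch: the first failure moves a HEALTHY worker to SUSPECT, a SUSPECT worker becomes DEAD once it reaches `failure_threshold` consecutive failures, and `success_threshold` consecutive successes return it to HEALTHY. The `HealthTracker` class and its counter names are hypothetical, and the time-based `suspect_window` rule is deliberately left out to keep the example short.

```python
from dataclasses import dataclass

# Subset of HEALTH_CONFIG used by this sketch.
HEALTH_CONFIG = {"failure_threshold": 3, "success_threshold": 2}


@dataclass
class HealthTracker:
    state: str = "HEALTHY"
    consecutive_failures: int = 0
    consecutive_successes: int = 0

    def record_failure(self) -> str:
        self.consecutive_successes = 0
        self.consecutive_failures += 1
        if self.state == "HEALTHY":
            self.state = "SUSPECT"
        elif (self.state == "SUSPECT"
              and self.consecutive_failures >= HEALTH_CONFIG["failure_threshold"]):
            self.state = "DEAD"
        return self.state

    def record_success(self) -> str:
        self.consecutive_failures = 0
        self.consecutive_successes += 1
        if (self.state == "SUSPECT"
                and self.consecutive_successes >= HEALTH_CONFIG["success_threshold"]):
            self.state = "HEALTHY"
        return self.state
```

Counting consecutive rather than total failures is an assumption; a sliding window of recent results would be an equally reasonable choice.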

Recovery State Machine

RECOVERY State:

  Entry:
    - backoff_index = 0
    - last_attempt = now()

  Loop:
    - Wait backoff[backoff_index] seconds
    - Attempt reconnect sequence:
        1. Ping worker (SSH reachable?)
        2. Check TOR SOCKS (port 9050 responding?)
        3. Test TOR circuit (check.torproject.org)
        4. Test .onion resolution (optional)
    - On SUCCESS:
        - Transition to HEALTHY
        - Reset backoff_index
    - On FAILURE:
        - backoff_index = min(backoff_index + 1, max_index)
        - If backoff_index == max_index:
            - Alert: "Worker requires manual intervention"
        - Retry
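
The loop above might look roughly like this in Python. It is a sketch under assumptions: `run_recovery` and its parameters are hypothetical names, each reconnect step (ping, SOCKS check, circuit test) is represented by a callable returning True or False, and `sleep` and `alert` are injectable so the loop can be exercised without real waiting. `max_attempts` exists only to bound the loop in tests.

```python
import time

RECOVERY_BACKOFF = [1, 2, 4, 8, 16, 32, 60]  # seconds, from HEALTH_CONFIG


def run_recovery(checks, backoff=RECOVERY_BACKOFF, sleep=time.sleep,
                 alert=print, max_attempts=None):
    """Retry the reconnect sequence with backoff.

    Returns the number of attempts made once every check passes,
    or None if max_attempts is exhausted first.
    """
    max_index = len(backoff) - 1
    backoff_index = 0
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        sleep(backoff[backoff_index])
        attempts += 1
        # all() stops at the first failing step, like the ordered sequence above.
        if all(check() for check in checks):
            return attempts  # caller transitions the worker to HEALTHY
        backoff_index = min(backoff_index + 1, max_index)
        if backoff_index == max_index:
            alert("Worker requires manual intervention")
    return None
```

As written, the alert fires on every failed attempt once the backoff is capped; a real implementation would probably rate-limit it.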

API Endpoints

POST /api/v1/fetch

Fetch URL through TOR worker pool.

{
  "url": "https://example.com" | "http://xxx.onion",
  "mode": "round_robin" | "sticky" | "random" | "least_connections" | "failover",
  "sticky_key": "user-123",           // optional, for sticky mode
  "timeout": 30,                       // optional, seconds
  "follow_redirects": true,            // optional
  "headers": { "User-Agent": "..." }   // optional
}

Response:

{
  "success": true,
  "worker": "worker-01",
  "status_code": 200,
  "headers": { ... },
  "body": "...",
  "timing": {
    "dns_ms": 50,
    "connect_ms": 200,
    "total_ms": 1500
  }
}
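
A request body like the one above could be validated before a worker is selected. The following is a hedged sketch, not the planned endpoint code: `parse_fetch_request` is a hypothetical helper, the set of modes is taken from the modes table, and the defaults (`round_robin`, 30 seconds, follow redirects) are assumptions based on the example values.

```python
VALID_MODES = {"round_robin", "sticky", "random", "least_connections", "failover"}


def parse_fetch_request(body, client_ip=None):
    """Validate a fetch request body and fill in assumed defaults."""
    url = body.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("url must be an http(s) URL")

    mode = body.get("mode", "round_robin")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    # Sticky affinity is by key or, failing that, by client IP.
    sticky_key = body.get("sticky_key") or client_ip
    if mode == "sticky" and sticky_key is None:
        raise ValueError("sticky mode needs a sticky_key or a client IP")

    timeout = body.get("timeout", 30)
    if not isinstance(timeout, (int, float)) or timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")

    return {
        "url": url,
        "mode": mode,
        "sticky_key": sticky_key,
        "timeout": timeout,
        "follow_redirects": bool(body.get("follow_redirects", True)),
        "headers": dict(body.get("headers", {})),
    }
```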

GET /api/v1/health

Pool health status.

{
  "healthy_workers": 3,
  "total_workers": 4,
  "workers": {
    "worker-01": { "state": "HEALTHY", "connections": 2, "success_rate": 0.98 },
    "worker-02": { "state": "HEALTHY", "connections": 1, "success_rate": 0.95 },
    "worker-03": { "state": "SUSPECT", "connections": 0, "failures": 2 },
    "worker-04": { "state": "RECOVERY", "backoff_index": 3, "next_retry": "10s" }
  }
}

POST /api/v1/workers/{id}/drain

Gracefully remove worker from pool.

POST /api/v1/workers/{id}/enable

Force worker back into pool.

Implementation Phases

Phase 1: Core Load Balancer

  • Worker pool manager with health states
  • Round-robin and random selection
  • Basic /fetch endpoint
  • Route: clearnet → TransPort, .onion → SOCKS5

Phase 2: Health Management

  • Health check loop (background task)
  • State machine transitions
  • Recovery with exponential backoff
  • Tunnel reconnect integration

Phase 3: Advanced Features

  • Sticky sessions (hash ring)
  • Least-connections tracking
  • Metrics and timing
  • Admin endpoints (drain/enable)
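
For the sticky-session item, a hash ring could be sketched as follows. This is an illustration under assumptions: the `HashRing` class, the `replicas` count, and the choice of SHA-256 are all hypothetical. The lookup walks clockwise from the key's position to the first worker that is currently eligible, so keys only move when their own worker drops out.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, worker_ids, replicas=64):
        # Each worker gets several virtual points to even out the distribution.
        self._ring = sorted(
            (_hash(f"{wid}#{i}"), wid) for wid in worker_ids for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key, healthy_ids):
        """Return the first eligible worker clockwise from the key, or None."""
        start = bisect.bisect(self._points, _hash(key))
        for offset in range(len(self._ring)):
            _, wid = self._ring[(start + offset) % len(self._ring)]
            if wid in healthy_ids:
                return wid
        return None
```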

Phase 4: Resilience

  • Circuit breaker pattern
  • Request retry with fallback
  • Alert integration
  • Graceful degradation
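
The circuit breaker item could take a shape like the sketch below. It assumes a per-worker breaker with hypothetical `failure_threshold` and `reset_timeout` parameters, and it uses an injectable clock so the behavior can be checked without waiting. After the timeout the breaker lets requests through again as a trial; a further failure reopens it.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def allow_request(self):
        """Closed: always allow. Open: allow only after reset_timeout has passed."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.reset_timeout

    def record_success(self):
        """A success closes the circuit and clears the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        """Open (or reopen) the circuit once the threshold is reached."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```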

Data Structures

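from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

# Enums are defined first because Worker uses WorkerState as a default value.
class WorkerState(Enum):
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    DEAD = "dead"
    RECOVERY = "recovery"

class LoadBalanceMode(Enum):
    ROUND_ROBIN = "round_robin"
    STICKY = "sticky"
    RANDOM = "random"
    LEAST_CONNECTIONS = "least_connections"
    FAILOVER = "failover"

@dataclass
class Worker:
    # Required fields must precede fields with defaults in a dataclass.
    id: str                          # "worker-01"
    ip: str                          # "17.0.0.134"
    wg_interface: str                # "wg-tor01"
    wg_host_ip: str                  # "10.201.1.1"
    socks_port: int = 9050

    # Health and traffic counters
    state: WorkerState = WorkerState.HEALTHY
    active_connections: int = 0
    total_requests: int = 0
    total_failures: int = 0
    last_success: Optional[datetime] = None
    last_failure: Optional[datetime] = None

    # Recovery state
    backoff_index: int = 0
    recovery_attempts: int = 0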

Worker Configuration

WORKERS = [
    Worker(id="worker-01", ip="17.0.0.134", wg_interface="wg-tor01", wg_host_ip="10.201.1.1"),
    Worker(id="worker-02", ip="17.0.0.202", wg_interface="wg-tor02", wg_host_ip="10.201.2.1"),
    Worker(id="worker-03", ip="17.0.0.201", wg_interface="wg-tor03", wg_host_ip="10.201.3.1"),
    Worker(id="worker-04", ip="17.0.0.120", wg_interface="wg-tor04", wg_host_ip="10.201.4.1"),
]

Request Flow

1. Request arrives: POST /fetch { url: "http://xxx.onion" }

2. Detect URL type:
   - .onion → use SOCKS5
   - clearnet → use TransPort (wg interface)

3. Select worker:
   - Filter: state in [HEALTHY, SUSPECT]
   - Apply mode algorithm (round_robin, etc.)
   - Increment active_connections

4. Execute request:
   - .onion: curl --socks5-hostname {worker.ip}:9050 {url}
   - clearnet: curl --interface {worker.wg_host_ip} {url}

5. Handle result:
   - Success: record_success(worker), decrement connections
   - Failure: record_failure(worker), maybe transition state

6. Return response to caller
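
Steps 2 and 4 of the flow can be sketched as a URL check plus a curl argument builder. The helper names `is_onion` and `build_curl_command` are hypothetical, and `--silent` and `--max-time` are additions for the example; only `--socks5-hostname` and `--interface` come from the flow above. The command is returned as an argument list so that it could be passed to `subprocess.run` without shell quoting issues.

```python
from urllib.parse import urlsplit


def is_onion(url):
    """True when the URL's host (not its path) ends in .onion."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".onion")


def build_curl_command(url, worker_ip, wg_host_ip, socks_port=9050, timeout=30):
    """Build the curl argv for one request routed through one worker."""
    cmd = ["curl", "--silent", "--max-time", str(timeout)]
    if is_onion(url):
        # .onion: resolve and connect through the worker's TOR SOCKS5 port.
        cmd += ["--socks5-hostname", f"{worker_ip}:{socks_port}"]
    else:
        # clearnet: send from the WireGuard host address (TransPort path).
        cmd += ["--interface", wg_host_ip]
    return cmd + [url]
```

Checking the parsed hostname rather than the raw string avoids misrouting clearnet URLs whose path happens to end in `.onion`.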

Files to Create

api/
├── load_balancer/
│   ├── __init__.py
│   ├── pool.py           # WorkerPool, Worker dataclass
│   ├── health.py         # HealthManager, state machine
│   ├── modes.py          # Selection algorithms
│   ├── recovery.py       # Recovery loop, backoff
│   └── fetch.py          # Request execution
├── routers/
│   └── fetch.py          # /api/v1/fetch endpoint
└── main.py               # Add new router