TOR Load Balancer API - Implementation Plan

Overview

An API layer that provides load-balanced access to TOR workers for both clearnet and .onion requests, with automatic failure recovery.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      API Gateway                             │
│  POST /fetch { url, mode?, sticky_key? }                    │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                   Load Balancer                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Mode Select │  │ Health Mgr  │  │ Worker Pool         │  │
│  │ - sticky    │  │ - state     │  │ [w1, w2, w3, w4]    │  │
│  │ - round_rob │  │ - recovery  │  │                     │  │
│  │ - random    │  │ - circuit   │  │                     │  │
│  │ - least_con │  │   breaker   │  │                     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────┬───────────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┬─────────────┐
        ▼             ▼             ▼             ▼
   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
   │ Worker1 │  │ Worker2 │  │ Worker3 │  │ Worker4 │
   │ HEALTHY │  │ HEALTHY │  │ SUSPECT │  │  DEAD   │
   │ cons: 2 │  │ cons: 1 │  │ cons: 0 │  │ cons: 0 │
   └─────────┘  └─────────┘  └─────────┘  └─────────┘

Load Balancing Modes

| Mode | Description | Use Case |
|------|-------------|----------|
| `round_robin` | Rotate through healthy workers | Default, even distribution |
| `sticky` | Hash-based affinity (by key or IP) | Session consistency |
| `random` | Random healthy worker selection | Simple, no state |
| `least_connections` | Pick the worker with the fewest active connections | High-throughput |
| `failover` | Primary with fallback chain | High availability |
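
The modes above can be sketched as small selection functions over the list of eligible workers. This is a minimal illustration under stated assumptions, not the planned implementation: `PoolWorker`, `Selector`, and the method names are hypothetical, sticky affinity uses a plain SHA-256 hash with a modulo (Phase 3 of this plan calls for a hash ring instead), and failover assumes the list is already in priority order.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass
class PoolWorker:
    """Hypothetical stand-in for a pool entry; only the fields selection needs."""
    id: str
    active_connections: int = 0


class Selector:
    """Illustrative selection algorithms over an already-filtered worker list."""

    def __init__(self) -> None:
        self._rr_counter = 0  # position of the next round-robin pick

    def round_robin(self, workers: list[PoolWorker]) -> PoolWorker:
        worker = workers[self._rr_counter % len(workers)]
        self._rr_counter += 1
        return worker

    def sticky(self, workers: list[PoolWorker], key: str) -> PoolWorker:
        # The same key maps to the same worker as long as the list of
        # eligible workers does not change.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return workers[int.from_bytes(digest[:8], "big") % len(workers)]

    def pick_random(self, workers: list[PoolWorker]) -> PoolWorker:
        return random.choice(workers)

    def least_connections(self, workers: list[PoolWorker]) -> PoolWorker:
        return min(workers, key=lambda w: w.active_connections)

    def failover(self, workers: list[PoolWorker]) -> PoolWorker:
        # Assumption: the list is ordered primary first, fallbacks after.
        return workers[0]
```

The modulo-based sticky pick remaps most keys whenever a worker leaves the eligible list, which is the usual reason to prefer a hash ring for that mode.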

Worker Health State Machine

                    ┌──────────────────────────────────┐
                    │                                  │
                    ▼                                  │
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌───────┴─┐
│ HEALTHY │───▶│ SUSPECT │───▶│  DEAD   │───▶│RECOVERY │
└─────────┘    └─────────┘    └─────────┘    └─────────┘
     ▲              │              │              │
     │              │              │              │
     └──────────────┴──────────────┴──────────────┘
                 (on success)

States

| State | Description | Behavior |
|-------|-------------|----------|
| `HEALTHY` | Worker operational | Receives traffic |
| `SUSPECT` | Recent failure(s) | Receives traffic, monitored closely |
| `DEAD` | Failed health checks | No traffic, enters recovery |
| `RECOVERY` | Attempting reconnect | Backoff retry, no traffic |

Transitions

HEALTH_CONFIG = {
    "failure_threshold": 3,        # failures before SUSPECT -> DEAD
    "success_threshold": 2,        # successes to return to HEALTHY
    "suspect_window": 30,          # seconds in SUSPECT before DEAD
    "recovery_backoff": [1, 2, 4, 8, 16, 32, 60],  # seconds
    "health_check_interval": 10,   # seconds between checks
}
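
One way these thresholds could drive the HEALTHY, SUSPECT, and DEAD transitions is sketched below. The config does not say what moves a worker from HEALTHY to SUSPECT or how failures are counted, so this sketch assumes that the first failure marks a worker SUSPECT, that failures are counted since the worker was last HEALTHY, and that `suspect_window` is enforced by a periodic tick. `HealthTracker` and its method names are hypothetical, and the DEAD to RECOVERY step is left out.

```python
from dataclasses import dataclass
from enum import Enum

HEALTH_CONFIG = {
    "failure_threshold": 3,   # failures before SUSPECT -> DEAD
    "success_threshold": 2,   # successes to return to HEALTHY
    "suspect_window": 30,     # seconds in SUSPECT before DEAD
}


class State(Enum):
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    DEAD = "dead"


@dataclass
class HealthTracker:
    state: State = State.HEALTHY
    failures: int = 0           # failures since the worker was last HEALTHY
    successes: int = 0          # consecutive successes while SUSPECT
    suspect_since: float = 0.0  # timestamp of entering SUSPECT

    def on_failure(self, now: float) -> State:
        self.successes = 0
        self.failures += 1
        if self.state is State.HEALTHY:
            # Assumption: a single failure is enough to become SUSPECT.
            self.state = State.SUSPECT
            self.suspect_since = now
        if (self.state is State.SUSPECT
                and self.failures >= HEALTH_CONFIG["failure_threshold"]):
            self.state = State.DEAD
        return self.state

    def on_success(self) -> State:
        if self.state is State.SUSPECT:
            self.successes += 1
            if self.successes >= HEALTH_CONFIG["success_threshold"]:
                self.state = State.HEALTHY
                self.failures = 0
                self.successes = 0
        elif self.state is State.HEALTHY:
            self.failures = 0
        return self.state

    def on_tick(self, now: float) -> State:
        # Called by the periodic health check; expires a lingering SUSPECT.
        if (self.state is State.SUSPECT
                and now - self.suspect_since >= HEALTH_CONFIG["suspect_window"]):
            self.state = State.DEAD
        return self.state
```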

Recovery State Machine

RECOVERY State:

  Entry:
    - backoff_index = 0
    - last_attempt = now()

  Loop:
    - Wait backoff[backoff_index] seconds
    - Attempt reconnect sequence:
        1. Ping worker (SSH reachable?)
        2. Check TOR SOCKS (port 9050 responding?)
        3. Test TOR circuit (check.torproject.org)
        4. Test .onion resolution (optional)
    - On SUCCESS:
        - Transition to HEALTHY
        - Reset backoff_index
    - On FAILURE:
        - backoff_index = min(backoff_index + 1, max_index)
        - If backoff_index == max_index:
            - Alert: "Worker requires manual intervention"
        - Retry
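
The loop above could look roughly like the following sketch. The reconnect checks are passed in as callables because the plan does not specify how each probe is implemented. `run_recovery`, the injectable `sleep` and `alert` hooks, the `max_attempts` cap, and alerting only once on reaching the last backoff step are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, Sequence

RECOVERY_BACKOFF = [1, 2, 4, 8, 16, 32, 60]  # seconds, as in HEALTH_CONFIG


def run_recovery(
    checks: Sequence[Callable[[], bool]],
    backoff: Sequence[float] = RECOVERY_BACKOFF,
    sleep: Callable[[float], None] = time.sleep,
    alert: Callable[[str], None] = print,
    max_attempts: Optional[int] = None,
) -> bool:
    """Retry the reconnect checks with backoff; return True once all pass."""
    backoff_index = 0
    max_index = len(backoff) - 1
    attempts = 0
    alerted = False
    while max_attempts is None or attempts < max_attempts:
        sleep(backoff[backoff_index])
        attempts += 1
        # Checks run in order and stop at the first one that fails.
        if all(check() for check in checks):
            return True  # caller transitions the worker to HEALTHY
        backoff_index = min(backoff_index + 1, max_index)
        if backoff_index == max_index and not alerted:
            alert("Worker requires manual intervention")
            alerted = True
    return False
```

Injecting `sleep` keeps the loop testable without real delays; a production version would more likely run as a background task per worker.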

API Endpoints

`POST /api/v1/fetch`

Fetch a URL through the TOR worker pool.

{
  "url": "https://example.com" | "http://xxx.onion",
  "mode": "round_robin" | "sticky" | "random" | "least_connections" | "failover",
  "sticky_key": "user-123",           // optional, for sticky mode
  "timeout": 30,                       // optional, seconds
  "follow_redirects": true,            // optional
  "headers": { "User-Agent": "..." }   // optional
}

Response:

{
  "success": true,
  "worker": "worker-01",
  "status_code": 200,
  "headers": { ... },
  "body": "...",
  "timing": {
    "dns_ms": 50,
    "connect_ms": 200,
    "total_ms": 1500
  }
}
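
A request body of this shape could be validated before a worker is selected, for example as sketched below. The default values are assumptions (the plan only marks fields as optional and names `round_robin` as the default mode), as is accepting `failover` per request, and `FetchRequest` and `parse_fetch_request` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlsplit

# The five modes from the Load Balancing Modes table.
ALLOWED_MODES = {"round_robin", "sticky", "random", "least_connections", "failover"}


@dataclass
class FetchRequest:
    url: str
    mode: str = "round_robin"
    sticky_key: Optional[str] = None
    timeout: int = 30                 # seconds; default is an assumption
    follow_redirects: bool = True     # default is an assumption
    headers: dict = field(default_factory=dict)


def parse_fetch_request(payload: dict) -> FetchRequest:
    """Build a FetchRequest from a decoded JSON body or raise ValueError."""
    try:
        req = FetchRequest(**payload)
    except TypeError as exc:  # missing "url" or an unknown field
        raise ValueError(f"invalid request body: {exc}") from exc
    parts = urlsplit(req.url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("url must be an http(s) URL with a host")
    if req.mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {req.mode}")
    if req.timeout <= 0:
        raise ValueError("timeout must be positive")
    return req
```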

`GET /api/v1/health`

Report pool health status.

{
  "healthy_workers": 3,
  "total_workers": 4,
  "workers": {
    "worker-01": { "state": "HEALTHY", "connections": 2, "success_rate": 0.98 },
    "worker-02": { "state": "HEALTHY", "connections": 1, "success_rate": 0.95 },
    "worker-03": { "state": "SUSPECT", "connections": 0, "failures": 2 },
    "worker-04": { "state": "RECOVERY", "backoff_index": 3, "next_retry": "8s" }
  }
}
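
A response like this could be assembled from the per-worker counters, as in the sketch below. It assumes `success_rate` is derived from total requests and total failures and is omitted for workers with no traffic yet. `WorkerStats` and `health_summary` are hypothetical names, and the per-state extras such as `failures` and `backoff_index` are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class WorkerStats:
    """Hypothetical subset of the worker fields needed for the report."""
    id: str
    state: str
    active_connections: int = 0
    total_requests: int = 0
    total_failures: int = 0


def health_summary(workers: list[WorkerStats]) -> dict:
    report = {}
    for w in workers:
        entry = {"state": w.state, "connections": w.active_connections}
        if w.total_requests:
            # Assumption: success rate = 1 - failures / requests.
            entry["success_rate"] = round(1 - w.total_failures / w.total_requests, 2)
        report[w.id] = entry
    return {
        "healthy_workers": sum(1 for w in workers if w.state == "HEALTHY"),
        "total_workers": len(workers),
        "workers": report,
    }
```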

`POST /api/v1/workers/{id}/drain`

Gracefully remove a worker from the pool.

`POST /api/v1/workers/{id}/enable`

Force a worker back into the pool.

Implementation Phases

Phase 1: Core Load Balancer

- Worker pool manager with health states
- Round-robin and random selection
- Basic /fetch endpoint
- Route: clearnet → TransPort, .onion → SOCKS5

Phase 2: Health Management

- Health check loop (background task)
- State machine transitions
- Recovery with exponential backoff
- Tunnel reconnect integration

Phase 3: Advanced Features

- Sticky sessions (hash ring)
- Least-connections tracking
- Metrics and timing
- Admin endpoints (drain/enable)
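
For the sticky-session item, a hash ring could be sketched as follows. This is a generic consistent-hashing illustration rather than a committed design: `HashRing`, the replica count of 64, and the use of SHA-256 are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes (replicas) per worker."""

    def __init__(self, worker_ids: list[str], replicas: int = 64) -> None:
        # Each worker owns several points so keys spread more evenly.
        self._points = sorted(
            (_hash(f"{wid}#{i}"), wid) for wid in worker_ids for i in range(replicas)
        )
        self._positions = [pos for pos, _ in self._points]

    def lookup(self, key: str) -> str:
        """Return the worker owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("hash ring is empty")
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._points)
        return self._points[idx][1]
```

When a worker is removed from the ring, only the keys that mapped to it move to other workers; keys owned by the remaining workers keep their assignment, which is what preserves session affinity during failures.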

Phase 4: Resilience

- Circuit breaker pattern
- Request retry with fallback
- Alert integration
- Graceful degradation
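
The circuit breaker item could take a shape like the sketch below: after a number of consecutive failures the breaker opens and rejects requests, then permits trial requests once a timeout has passed. `CircuitBreaker`, its thresholds, and the injectable clock are assumptions for illustration; the plan does not yet define how the breaker interacts with the worker health states.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal closed / open / half-open breaker for one worker."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    def allow(self) -> bool:
        """Return True if a request may be sent to the worker."""
        if self._opened_at is None:
            return True
        # Half-open: permit trial requests once the timeout has elapsed.
        return self._clock() - self._opened_at >= self._reset_timeout

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None  # close the breaker

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._failure_threshold:
            # Open the breaker, or restart the timeout after a failed trial.
            self._opened_at = self._clock()
```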

Data Structures

from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class WorkerState(Enum):
    HEALTHY = "healthy"
    SUSPECT = "suspect"
    DEAD = "dead"
    RECOVERY = "recovery"


class LoadBalanceMode(Enum):
    ROUND_ROBIN = "round_robin"
    STICKY = "sticky"
    RANDOM = "random"
    LEAST_CONNECTIONS = "least_connections"
    FAILOVER = "failover"


@dataclass
class Worker:
    # Required fields come first: a dataclass field without a default
    # cannot follow a field that has one.
    id: str                          # "worker-01"
    ip: str                          # "17.0.0.134"
    wg_interface: str                # "wg-tor01"
    wg_host_ip: str                  # "10.201.1.1"
    socks_port: int = 9050

    state: WorkerState = WorkerState.HEALTHY
    active_connections: int = 0
    total_requests: int = 0
    total_failures: int = 0
    last_success: Optional[datetime] = None
    last_failure: Optional[datetime] = None

    # Recovery state
    backoff_index: int = 0
    recovery_attempts: int = 0

Worker Configuration

WORKERS = [
    Worker(id="worker-01", ip="17.0.0.134", wg_interface="wg-tor01", wg_host_ip="10.201.1.1"),
    Worker(id="worker-02", ip="17.0.0.202", wg_interface="wg-tor02", wg_host_ip="10.201.2.1"),
    Worker(id="worker-03", ip="17.0.0.201", wg_interface="wg-tor03", wg_host_ip="10.201.3.1"),
    Worker(id="worker-04", ip="17.0.0.120", wg_interface="wg-tor04", wg_host_ip="10.201.4.1"),
]

Request Flow

1. Request arrives: POST /fetch { url: "http://xxx.onion" }

2. Detect URL type:
   - .onion → use SOCKS5
   - clearnet → use TransPort (wg interface)

3. Select worker:
   - Filter: state in [HEALTHY, SUSPECT]
   - Apply mode algorithm (round_robin, etc.)
   - Increment active_connections

4. Execute request:
   - .onion: curl --socks5-hostname {worker.ip}:9050 {url}
   - clearnet: curl --interface {worker.wg_host_ip} {url}

5. Handle result:
   - Success: record_success(worker), decrement connections
   - Failure: record_failure(worker), maybe transition state

6. Return response to caller
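
Steps 2 to 5 of this flow could be sketched as below. The command lines follow the curl invocations shown in step 4; the `--silent` and `--max-time` flags, the `FlowWorker` type, and the injected `run` callable (standing in for whatever executes the command) are assumptions added for the example.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlsplit


@dataclass
class FlowWorker:
    """Hypothetical subset of the Worker fields used by the request flow."""
    ip: str
    wg_host_ip: str
    socks_port: int = 9050
    active_connections: int = 0


def is_onion(url: str) -> bool:
    """Step 2: treat any host ending in .onion as a hidden service."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".onion")


def build_curl_command(url: str, worker: FlowWorker, timeout: int = 30) -> list[str]:
    """Step 4: .onion goes through SOCKS5, clearnet through the wg interface."""
    base = ["curl", "--silent", "--max-time", str(timeout)]
    if is_onion(url):
        return base + ["--socks5-hostname", f"{worker.ip}:{worker.socks_port}", url]
    return base + ["--interface", worker.wg_host_ip, url]


def execute(url: str, worker: FlowWorker, run: Callable[[list[str]], str]) -> str:
    """Steps 3 and 5: count the connection for the duration of the request."""
    worker.active_connections += 1
    try:
        return run(build_curl_command(url, worker))
    finally:
        # Decrement on success and on failure alike.
        worker.active_connections -= 1
```

Passing the command as an argument list rather than a shell string avoids quoting problems with caller-supplied URLs.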

Files to Create

api/
├── load_balancer/
│   ├── __init__.py
│   ├── pool.py           # WorkerPool, Worker dataclass
│   ├── health.py         # HealthManager, state machine
│   ├── modes.py          # Selection algorithms
│   ├── recovery.py       # Recovery loop, backoff
│   └── fetch.py          # Request execution
├── routers/
│   └── fetch.py          # /api/v1/fetch endpoint
└── main.py               # Add new router