Geographic Routing¶
The GeoRouter provides geographic routing for HTTP requests. Given a geo filter, it selects a matching VPN bank, the Tor gateway, or a direct connection.
Overview¶
graph TB
subgraph "Application"
HTTP[HTTP Request]
GEO_PARAM[geo parameter]
end
subgraph "GeoRouter"
PARSE[Parse Geo Filter]
SELECT[Select Route]
CACHE[Route Cache]
end
subgraph "Routes"
VPN[VPN Banks<br/>Geographic Proxies]
TOR[Tor Gateway<br/>Anonymous Access]
DIRECT[Direct<br/>No Proxy]
end
HTTP --> GEO_PARAM --> PARSE
PARSE --> SELECT
SELECT --> CACHE
CACHE --> VPN & TOR & DIRECT
GeoRouter¶
Basic Usage¶
from cbintel.geo import GeoRouter
router = GeoRouter()
# Get proxy for geographic region
proxy = await router.get_proxy("us:ca")
print(f"Using proxy: {proxy}") # http://17.0.0.1:8890
# Use in HTTP request
from cbintel.net import HTTPClient
async with HTTPClient() as client:
    response = await client.get("https://example.com", proxy=proxy)
Route Types¶
# VPN route (geographic)
proxy = await router.get_proxy("us:ca")
# Tor route (anonymous)
proxy = await router.get_proxy("tor")
# Direct route (no proxy)
proxy = await router.get_proxy(None) # Returns None
Geographic Filters¶
Filter Format¶
| Component | Description | Examples |
|---|---|---|
| `country` | ISO country code | us, de, uk, fr |
| `state` | US state code (optional) | ca, ny, tx |
| `type` | Special routing (optional) | tor |
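The filter components can be illustrated with a small parser. This is a minimal sketch, not GeoRouter's actual parsing code: `GeoFilter` and `parse_geo_filter` are hypothetical names, and the colon-separated order `country[:state][:type]` is inferred from the example filters in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeoFilter:
    """Hypothetical parsed form of a geo filter string."""
    country: Optional[str] = None
    state: Optional[str] = None
    via_tor: bool = False


def parse_geo_filter(value: Optional[str]) -> Optional[GeoFilter]:
    """Split 'country[:state][:tor]' into components.

    Returns None for an empty filter, which corresponds to a direct route.
    """
    if not value:
        return None
    parts = [p.strip().lower() for p in value.split(":") if p.strip()]
    if not parts:
        return None
    # Assumption: the special routing type, if present, is the last component.
    via_tor = parts[-1] == "tor"
    if via_tor:
        parts = parts[:-1]
    if len(parts) > 2:
        raise ValueError(f"too many components in geo filter: {value!r}")
    country = parts[0] if parts else None
    state = parts[1] if len(parts) > 1 else None
    return GeoFilter(country=country, state=state, via_tor=via_tor)
```

With this sketch, `parse_geo_filter("us:ca:tor")` yields a country of `us`, a state of `ca`, and `via_tor=True`, while a bare `tor` filter has no country or state.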
Common Filters¶
| Filter | Description | Route Type |
|---|---|---|
| `us` | United States | VPN |
| `us:ca` | California | VPN |
| `us:ny` | New York | VPN |
| `de` | Germany | VPN |
| `uk` | United Kingdom | VPN |
| `tor` | Tor network | Tor |
| `us:ca:tor` | California via Tor | VPN + Tor |
Integration Patterns¶
In HTTP Client¶
from cbintel.net import HTTPClient
from cbintel.geo import GeoRouter
router = GeoRouter()
async with HTTPClient() as client:
    # Route through California
    proxy = await router.get_proxy("us:ca")
    response = await client.get("https://example.com", proxy=proxy)
In Graph Operations¶
stages:
  - name: fetch_multi_geo
    parallel:
      - op: fetch
        params:
          url: "https://example.com"
          geo: "us:ca"
        output: ca_content
      - op: fetch
        params:
          url: "https://example.com"
          geo: "us:ny"
        output: ny_content
      - op: fetch
        params:
          url: "https://example.com"
          geo: "de"
        output: de_content
In Job Submission¶
curl -X POST https://intel.nominate.ai/api/v1/jobs/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "query": "local news trends",
    "geo": "us:ca"
  }'
Bank Selection¶
GeoRouter automatically selects VPN banks based on the geo filter.
Selection Logic¶
flowchart TD
START[Geo Filter] --> CHECK_TOR{Is 'tor'?}
CHECK_TOR -->|Yes| TOR[Return Tor Proxy]
CHECK_TOR -->|No| CHECK_BANK{Bank exists?}
CHECK_BANK -->|Yes| CHECK_HEALTH{Bank healthy?}
CHECK_HEALTH -->|Yes| RETURN[Return Bank Proxy]
CHECK_HEALTH -->|No| CREATE[Create Bank]
CHECK_BANK -->|No| CREATE
CREATE --> ASSIGN[Assign Workers]
ASSIGN --> START_VPN[Start VPNs]
START_VPN --> RETURN
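The flow above can be sketched as an in-memory model. This is an illustration under stated assumptions, not GeoRouter's implementation: `Bank`, `SelectionSketch`, the `.invalid` endpoints, the two-workers-per-bank figure, and the rule that a bank is healthy while at least one worker is up are all hypothetical. Combined filters such as `us:ca:tor` are left out, and workers of a replaced bank are not reclaimed.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Placeholder endpoint for illustration only; real values are deployment-specific.
TOR_PROXY = "socks5://tor.invalid:9050"


@dataclass
class Bank:
    geo: str
    endpoint: str
    workers_up: int
    workers_total: int

    @property
    def healthy(self) -> bool:
        # Assumption: a bank counts as healthy while at least one worker is up.
        return self.workers_up > 0


class SelectionSketch:
    """In-memory stand-in for the bank selection flow."""

    def __init__(self, free_workers: int, workers_per_bank: int = 2) -> None:
        self.banks: Dict[str, Bank] = {}
        self.free_workers = free_workers
        self.workers_per_bank = workers_per_bank
        self._next_port = 8890

    def select(self, geo: Optional[str]) -> Optional[str]:
        if geo is None:
            return None  # direct route, no proxy
        if geo == "tor":
            return TOR_PROXY
        bank = self.banks.get(geo)
        if bank is not None and bank.healthy:
            return bank.endpoint
        # Missing or unhealthy bank: create a fresh one.
        return self._create_bank(geo).endpoint

    def _create_bank(self, geo: str) -> Bank:
        if self.free_workers < self.workers_per_bank:
            raise RuntimeError(f"no free workers for bank {geo!r}")
        # "Assign Workers" and "Start VPNs" collapse into bookkeeping here.
        self.free_workers -= self.workers_per_bank
        bank = Bank(
            geo=geo,
            endpoint=f"http://proxy.invalid:{self._next_port}",
            workers_up=self.workers_per_bank,
            workers_total=self.workers_per_bank,
        )
        self._next_port += 1
        self.banks[geo] = bank
        return bank
```

Repeated calls to `select("us:ca")` return the same endpoint until the bank's `workers_up` drops to zero, at which point the sketch creates a replacement bank.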
Bank Creation¶
When a geo filter is requested but no matching bank exists:
# Request California proxy
proxy = await router.get_proxy("us:ca")
# GeoRouter checks for existing "us:ca" bank
# If none exists, creates one with available workers
# Returns bank endpoint: http://17.0.0.1:8890
Bank Caching¶
Banks are cached and reused:
# First request - may create bank
proxy1 = await router.get_proxy("us:ca")
# Second request - uses cached bank
proxy2 = await router.get_proxy("us:ca")
# Same endpoint
assert proxy1 == proxy2
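One way to picture the caching behavior is a time-to-live cache keyed by geo filter. The sketch below is an assumption about how such a cache could work, not the router's internals: `RouteCache` is a hypothetical name, and the 300-second default simply mirrors the `cache_ttl=300` option shown under Router Options.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RouteCache:
    """Hypothetical TTL cache mapping a geo filter to a proxy endpoint."""

    def __init__(
        self,
        ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, geo: str) -> Optional[str]:
        entry = self._entries.get(geo)
        if entry is None:
            return None
        expires_at, proxy = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it so the caller resolves the route again.
            del self._entries[geo]
            return None
        return proxy

    def put(self, geo: str, proxy: str) -> None:
        self._entries[geo] = (self.clock() + self.ttl, proxy)
```

A caller would check `get()` first and only fall through to bank selection on a miss, storing the result with `put()`.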
Fallback Strategies¶
GeoRouter has fallback behavior for two failure cases: an unhealthy bank and a shortage of free workers.
Bank Unhealthy¶
# If preferred bank is unhealthy
proxy = await router.get_proxy("us:ca", fallback=True)
# Router will:
#   1. Check if bank is healthy (workers up)
#   2. If unhealthy, try to recover
#   3. If recovery fails, fall back to:
#      - Another bank with similar geo
#      - Direct connection (if allowed)
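The fallback order in step 3 can be sketched as a function that lists candidates. This is only an illustration: the source does not define what "similar geo" means, so the sketch assumes it means "same country", and `fallback_candidates` is a hypothetical helper, not part of the GeoRouter API.

```python
from typing import Iterable, List, Optional


def fallback_candidates(
    geo: str,
    healthy_banks: Iterable[str],
    allow_direct: bool = False,
) -> List[Optional[str]]:
    """Return fallback routes to try, in order, for an unhealthy bank.

    Assumption: a bank is "similar" when it shares the requested country.
    None at the end of the list stands for a direct connection.
    """
    country = geo.split(":")[0]
    candidates: List[Optional[str]] = [
        bank
        for bank in sorted(healthy_banks)
        if bank != geo and bank.split(":")[0] == country
    ]
    if allow_direct:
        candidates.append(None)
    return candidates
```

For a failed `us:ca` bank with healthy banks `us`, `us:ny`, and `de`, the sketch tries `us` and then `us:ny`, and only considers a direct connection when `allow_direct` is set.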
No Workers Available¶
# If all workers are assigned to other banks
try:
proxy = await router.get_proxy("us:ca")
except NoWorkersAvailable:
# No workers free for new bank
# Options:
# 1. Wait for workers to free up
# 2. Use existing bank with different geo
# 3. Use direct connection
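Option 1, waiting for workers to free up, could be implemented as a retry loop with exponential backoff. The sketch below is self-contained and makes several assumptions: `get_proxy_with_retry` is a hypothetical helper, the `NoWorkersAvailable` class is a local stand-in for the real exception, and the attempt count and delays are arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class NoWorkersAvailable(Exception):
    """Local stand-in for the exception raised when no workers are free."""


async def get_proxy_with_retry(
    get_proxy: Callable[[str], Awaitable[Optional[str]]],
    geo: str,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> Optional[str]:
    """Call get_proxy(geo), backing off while no workers are free."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await get_proxy(geo)
        except NoWorkersAvailable:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller pick another option
            # Delay doubles each round: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    return None  # unreachable; keeps the return type explicit
```

In real use the first argument would be something like `router.get_proxy`. If the final attempt still fails, the exception propagates so the caller can switch to a different geo or a direct connection.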
Health Checking¶
GeoRouter monitors bank health:
from cbintel.geo import GeoRouter
router = GeoRouter()
# Check if geo route is available
if await router.is_available("us:ca"):
    proxy = await router.get_proxy("us:ca")
else:
    print("California route unavailable")
# Get route status
status = await router.get_status("us:ca")
print(f"Workers up: {status['workers_up']}/{status['workers_total']}")
Configuration¶
Environment Variables¶
# Default geo filter (if none specified)
GEOROUTER_DEFAULT_GEO=us
# Fallback behavior
GEOROUTER_FALLBACK_ENABLED=true
GEOROUTER_FALLBACK_TO_DIRECT=false
# Health check interval
GEOROUTER_HEALTH_CHECK_INTERVAL=30
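For illustration, these variables could be read into a settings dictionary as follows. This is a sketch, not how GeoRouter itself loads configuration: `load_georouter_settings` is a hypothetical helper, the accepted boolean spellings are an assumption, and the defaults simply reuse the example values above.

```python
import os
from typing import Any, Dict, Mapping


def _as_bool(value: str) -> bool:
    # Assumption: common truthy spellings are accepted, case-insensitively.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_georouter_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read the GEOROUTER_* variables into typed settings."""
    return {
        "default_geo": env.get("GEOROUTER_DEFAULT_GEO", "us"),
        "fallback_enabled": _as_bool(env.get("GEOROUTER_FALLBACK_ENABLED", "true")),
        "fallback_to_direct": _as_bool(env.get("GEOROUTER_FALLBACK_TO_DIRECT", "false")),
        "health_check_interval": int(env.get("GEOROUTER_HEALTH_CHECK_INTERVAL", "30")),
    }
```

Passing an explicit mapping instead of `os.environ` keeps the function easy to test.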
Router Options¶
router = GeoRouter(
    default_geo="us",          # Default if none specified
    fallback_enabled=True,     # Try fallbacks on failure
    fallback_to_direct=False,  # Don't allow direct as fallback
    cache_ttl=300,             # Cache routes for 5 minutes
)
Use Cases¶
Geographic Content Research¶
name: geographic_news_comparison
stages:
  - name: fetch_regional
    parallel:
      # Fetch same URL from different regions
      - op: fetch
        params:
          url: "https://news.example.com"
          geo: "us:ca"
        output: west_coast
      - op: fetch
        params:
          url: "https://news.example.com"
          geo: "us:ny"
        output: east_coast
      - op: fetch
        params:
          url: "https://news.example.com"
          geo: "uk"
        output: uk
  - name: compare
    sequential:
      - op: compare
        input: [west_coast, east_coast, uk]
        params:
          aspects: ["headlines", "coverage", "tone"]
        output: comparison
Privacy-Focused Research¶
name: anonymous_research
stages:
  - name: fetch_anonymous
    parallel:
      - op: tor_fetch
        params:
          url: "https://target.com"
        output: tor_content
      - op: fetch
        params:
          url: "https://target.com"
          geo: "ch"  # Switzerland
        output: vpn_content
Multi-Region Verification¶
from cbintel.geo import GeoRouter
from cbintel.net import HTTPClient
router = GeoRouter()
regions = ["us:ca", "us:ny", "uk", "de", "jp"]
async with HTTPClient() as client:
    results = {}
    for region in regions:
        proxy = await router.get_proxy(region)
        response = await client.get("https://api.example.com/data", proxy=proxy)
        results[region] = response.json()

# Compare results across regions
for region, data in results.items():
    print(f"{region}: {data['version']}")
Best Practices¶
- Reuse routes: Let GeoRouter cache and reuse banks
- Handle failures: Always have fallback strategies
- Check availability: Verify route before critical operations
- Monitor health: Watch for degraded banks
- Limit concurrency: Don't overwhelm single banks
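The concurrency advice can be sketched with one `asyncio.Semaphore` per geo filter. This is an illustrative pattern, not a GeoRouter feature: `BankLimiter` is a hypothetical helper, and the limit of four concurrent requests per bank is an arbitrary assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, TypeVar

T = TypeVar("T")


class BankLimiter:
    """Hypothetical helper that caps concurrent requests per geo filter."""

    def __init__(self, max_per_bank: int = 4) -> None:
        self._max = max_per_bank
        self._semaphores: Dict[str, asyncio.Semaphore] = {}

    async def run(self, geo: str, task: Callable[[], Awaitable[T]]) -> T:
        # Create the semaphore lazily, inside the running event loop.
        sem = self._semaphores.get(geo)
        if sem is None:
            sem = self._semaphores[geo] = asyncio.Semaphore(self._max)
        async with sem:
            return await task()
```

Each request is wrapped as `await limiter.run("us:ca", fetch_one)`, where `fetch_one` is a zero-argument coroutine function; requests beyond the limit wait until a slot frees up, while other geo filters are unaffected.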
Troubleshooting¶
Route Not Available¶
- Check if workers are available: `GET /api/v1/workers/`
- Check existing banks: `GET /api/v1/banks/`
- Verify that a profile exists for the filter
Slow Performance¶
- Check bank health
- Consider worker load
- Try a different geographic region
Inconsistent Results¶
- Verify that all workers use the same VPN provider
- Check for Tor circuit rotation
- Consider IP-based caching at the destination