Rate limiting is not a bug — it's a contract. The API enforces per-key request rates to ensure fair throughput across all users. Hitting a 429 is expected in high-concurrency pipelines; handling it correctly is what separates a robust pipeline from a fragile one. This guide covers the full response cycle: parsing Retry-After, exponential backoff with jitter, proactive client-side throttling, and the right way to integrate these into asyncio worker pools.
TL;DR — the complete retry recipe
The minimum correct handling for a 429 response:
- Read the Retry-After header (seconds)
- Add jitter: sleep for Retry-After + random(0, Retry-After * 0.5), as sketched just after this list
- Retry the same request with the same Idempotency-Key
- If there is no Retry-After header, use exponential backoff: 2^attempt seconds, capped at 64s
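A minimal sketch of steps 1 and 2, assuming retry_after has already been parsed from the header as a float:

import random

retry_after = 30.0  # value parsed from the Retry-After header
sleep_s = retry_after + random.uniform(0, retry_after * 0.5)  # jittered wait in seconds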
ChangeThisFile rate limits by API key. Defaults: 60 requests/minute for Hobby, 300/min for Startup, 1,000/min for Scale. Running 10 concurrent workers at ~2s per request produces ~5 requests/second (≈300/min), which saturates the Startup limit and is five times the Hobby limit. Running 50 workers produces ~25 req/s (≈1,500/min), beyond even the Scale limit. Size your concurrency to your plan.
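A quick back-of-the-envelope check (a sketch; the limits are the plan defaults listed above, and avg_request_seconds is whatever your conversions actually take):

PLAN_LIMITS_PER_MIN = {'hobby': 60, 'startup': 300, 'scale': 1000}

def requests_per_minute(workers: int, avg_request_seconds: float) -> float:
    """Steady-state request rate produced by a worker pool."""
    return workers / avg_request_seconds * 60

print(requests_per_minute(10, 2.0))  # 300.0 -> at the Startup cap
print(requests_per_minute(50, 2.0))  # 1500.0 -> over the Scale cap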
What bad retry code looks like
The three failure modes in naive retry implementations:
# BAD: ignores Retry-After, fixed sleep
if resp.status_code == 429:
    time.sleep(1)  # Too short — you'll 429 again immediately
    continue

# BAD: exponential backoff without cap
if resp.status_code == 429:
    time.sleep(2 ** attempt)  # attempt=10: 1024s sleep
    continue

# BAD: no jitter with multiple workers
if resp.status_code == 429:
    time.sleep(60)  # All workers sleep and wake simultaneously → thundering herd
    continue
The thundering herd problem: when 10 workers all hit rate limits at the same time, they all sleep for the same duration and all retry at the same instant — immediately triggering another rate limit event. Jitter spreads their retry times and breaks the synchronization.
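A quick way to see the effect: with a fixed 60s sleep, every worker retries at the same instant, while full jitter spreads the same ten workers across the window (a toy illustration, not API code):

import random

fixed = [60.0] * 10                                           # all workers retry at t = 60s
jittered = sorted(random.uniform(0, 60) for _ in range(10))   # retries spread over the window
print([round(t, 1) for t in jittered])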
Correct retry implementation
import asyncio
import random
import time
from typing import Optional

MAX_RETRIES = 5
MAX_BACKOFF = 64.0  # Cap at 64s to avoid multi-minute pauses

def parse_retry_after(header_value: Optional[str]) -> float:
    """Parse Retry-After header. Returns seconds to wait."""
    if not header_value:
        return 60.0  # Default fallback
    try:
        # Numeric seconds format
        return float(header_value)
    except ValueError:
        # HTTP-date format (rare)
        from email.utils import parsedate_to_datetime
        from datetime import datetime, timezone
        retry_time = parsedate_to_datetime(header_value)
        now = datetime.now(timezone.utc)
        return max(0.0, (retry_time - now).total_seconds())
def backoff_with_jitter(attempt: int, retry_after: float) -> float:
    """Full jitter: uniform random between 0 and the suggested wait time."""
    # Base: max of Retry-After header and exponential backoff
    base = max(retry_after, min(2 ** attempt, MAX_BACKOFF))
    # Full jitter: random in [0, base] — spreads worker retries over the window
    return random.uniform(0, base)
async def convert_with_retry(
    client,
    file_bytes: bytes,
    target: str,
    api_key: str,
    idempotency_key: str,
) -> bytes:
    for attempt in range(MAX_RETRIES):
        resp = await client.post(
            'https://changethisfile.com/v1/convert',
            headers={
                'Authorization': f'Bearer {api_key}',
                'Idempotency-Key': idempotency_key,
            },
            content=file_bytes,
            params={'target': target},
            timeout=120,
        )
        if resp.status_code == 429:
            retry_after = parse_retry_after(resp.headers.get('Retry-After'))
            sleep_time = backoff_with_jitter(attempt, retry_after)
            print(f'Rate limited (attempt {attempt+1}), sleeping {sleep_time:.1f}s')
            await asyncio.sleep(sleep_time)
            continue
        if resp.status_code >= 500:
            sleep_time = min(2 ** attempt, MAX_BACKOFF)
            await asyncio.sleep(sleep_time)
            continue
        if resp.status_code == 422:
            raise ValueError(f'Unprocessable: {resp.text}')
        resp.raise_for_status()
        return resp.content
    raise RuntimeError(f'Max retries ({MAX_RETRIES}) exceeded')
Proactive throttling: client-side token bucket
Reactive retry (respond to 429) is correct but wasteful — you've already consumed a request slot and gotten nothing back. Proactive throttling (never exceed your rate limit) is more efficient at scale.
import asyncio
import time
from typing import Optional

class TokenBucket:
    """Token bucket for proactive rate limiting."""

    def __init__(self, rate: float, burst: Optional[int] = None):
        """
        rate: tokens per second (requests per second)
        burst: max tokens to accumulate (defaults to rate * 2)
        """
        self.rate = rate
        self.burst = burst or int(rate * 2)
        self._tokens = float(self.burst)
        self._last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self, tokens: float = 1.0):
        async with self._lock:
            now = time.monotonic()
            elapsed = now - self._last_refill
            # Refill tokens based on elapsed time
            self._tokens = min(
                self.burst,
                self._tokens + elapsed * self.rate
            )
            self._last_refill = now
            if self._tokens >= tokens:
                self._tokens -= tokens
                return  # Token available, proceed immediately
            # Not enough tokens — calculate wait time
            wait = (tokens - self._tokens) / self.rate
        # Sleep outside the lock so other workers aren't blocked while we wait
        await asyncio.sleep(wait)
        await self.acquire(tokens)  # Recursive — recheck after sleeping

# Usage: sized for the Startup plan (300/min = 5 req/s); 4.0 leaves headroom
# Hobby: 60/min → ~0.9 req/s. Scale: 1,000/min → ~15 req/s with 10% headroom
throttle = TokenBucket(rate=4.0, burst=10)

async def convert_throttled(client, file_bytes, target, api_key, idem_key):
    await throttle.acquire()
    return await convert_with_retry(client, file_bytes, target, api_key, idem_key)
Tune rate to slightly below your plan's per-minute limit divided by 60. Leave 10% headroom: Hobby (60/min) → 0.9 req/s. Startup (300/min) → 4.5 req/s. This eliminates most 429 responses while maximizing throughput.
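In code, that tuning rule is just the plan limit divided by 60 with a headroom factor (a sketch; throttle_rate is a hypothetical helper, plan limits as listed above):

def throttle_rate(plan_limit_per_min: int, headroom: float = 0.9) -> float:
    """Requests/second to feed TokenBucket, staying under the plan cap."""
    return plan_limit_per_min / 60 * headroom

throttle = TokenBucket(rate=throttle_rate(300))  # Startup: 300/min -> 4.5 req/s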
Shell: retry with curl
#!/bin/bash
# retry_convert.sh — curl with Retry-After-aware retry
convert_file() {
  local file="$1" target="$2" out="$3"
  local max_attempts=5
  local headers
  headers=$(mktemp)

  for attempt in $(seq 1 $max_attempts); do
    # -D dumps response headers to a temp file so Retry-After can be read
    # from the 429 itself, without spending a second request to fetch it
    http_code=$(curl -s -w "%{http_code}" \
      -X POST https://changethisfile.com/v1/convert \
      -H "Authorization: Bearer $CTF_API_KEY" \
      -F "file=@$file" -F "target=$target" \
      -D "$headers" \
      -o "$out")

    if [ "$http_code" = "200" ]; then
      echo "Converted: $file"
      rm -f "$headers"
      return 0
    elif [ "$http_code" = "429" ]; then
      retry_after=$(grep -i '^Retry-After:' "$headers" | awk '{print $2}' | tr -d '\r')
      retry_after=${retry_after:-60}
      jitter=$((RANDOM % retry_after))
      echo "Rate limited. Sleeping $((retry_after + jitter))s (attempt $attempt)"
      sleep $((retry_after + jitter))
    else
      echo "Error $http_code on attempt $attempt"
      sleep $((2 ** attempt))
    fi
  done

  rm -f "$headers"
  echo "FAILED: $file after $max_attempts attempts" >&2
  return 1
}
# Batch with gnu parallel, 5 concurrent
export -f convert_file
export CTF_API_KEY
ls *.pdf | parallel -j 5 convert_file {} jpg {.}.jpg
Observing your rate limit headroom
The API returns rate limit state in response headers on every request:
resp = await client.post(...)

# Rate limit headers
limit = resp.headers.get('X-RateLimit-Limit')          # Requests allowed per minute
remaining = resp.headers.get('X-RateLimit-Remaining')  # Remaining in this window
reset = resp.headers.get('X-RateLimit-Reset')          # Unix timestamp when window resets

if remaining and int(remaining) < 10:
    print(f'WARNING: {remaining} requests remaining in rate limit window')
Track the headroom across your workers. If remaining consistently drops to 0, you need to either reduce concurrency, upgrade your plan, or implement tighter client-side throttling. A remaining=0 followed by a 429 is expected and handled. A sustained pattern of 429s means your concurrency exceeds your plan's limit.
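One way to act on that signal is a small per-worker hedge: a sketch that assumes the header names shown above and simply pauses until the window resets when headroom runs out.

import asyncio
import time

async def wait_if_exhausted(resp) -> None:
    """Pause until the window resets when X-RateLimit-Remaining hits zero."""
    remaining = resp.headers.get('X-RateLimit-Remaining')
    reset = resp.headers.get('X-RateLimit-Reset')
    if remaining is not None and int(remaining) == 0 and reset:
        wait = max(0.0, float(reset) - time.time())  # reset is a Unix timestamp
        await asyncio.sleep(wait)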
Rate limit handling done correctly means 429s are rare, recoverable events rather than pipeline-killing failures. Token bucket throttling plus jittered retry covers the full range from light concurrent usage to near-limit throughput. The free tier is a good place to stress-test your retry logic before moving to a paid plan.