Rate limiting is not a bug — it's a contract. The API enforces per-key request rates to ensure fair throughput across all users. Hitting a 429 is expected in high-concurrency pipelines; handling it correctly is what separates a robust pipeline from a fragile one. This guide covers the full response cycle: parsing Retry-After, exponential backoff with jitter, proactive client-side throttling, and the right way to integrate these into asyncio worker pools.

TL;DR — the complete retry recipe

The minimum correct handling for a 429 response:

  1. Read Retry-After header (seconds)
  2. Add jitter: sleep for Retry-After + random(0, Retry-After * 0.5)
  3. Retry the same request with the same Idempotency-Key
  4. If no Retry-After header, use exponential backoff: 2^attempt seconds, capped at 64s

ChangeThisFile rate limits by API key. Defaults: 60 requests/minute for Hobby, 300/min for Startup, 1,000/min for Scale. Ten concurrent workers each finishing a request every 2s sustain ~5 requests/second (300/min): that saturates the Startup limit and is five times the Hobby limit. Fifty workers sustain ~25 req/s (1,500/min), more than even the Scale plan allows. Size your concurrency to your plan.
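
The sizing arithmetic is easy to script. A small illustrative helper (the function names are my own, not part of any SDK):

```python
def sustained_rps(workers: int, avg_request_seconds: float) -> float:
    """Steady-state request rate for a pool of always-busy workers."""
    return workers / avg_request_seconds


def max_workers(plan_limit_per_min: int, avg_request_seconds: float,
                headroom: float = 0.9) -> int:
    """Largest worker count that stays under the plan limit, with 10% headroom."""
    limit_rps = plan_limit_per_min * headroom / 60.0
    return max(1, int(limit_rps * avg_request_seconds))


print(sustained_rps(10, 2.0))   # 5.0 req/s = 300/min
print(max_workers(1000, 2.0))   # 30 workers fit under the Scale limit
```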

What bad retry code looks like

The three failure modes in naive retry implementations:

# BAD: ignores Retry-After, fixed sleep
if resp.status_code == 429:
    time.sleep(1)  # Too short — you'll 429 again immediately
    continue

# BAD: exponential backoff without cap
if resp.status_code == 429:
    time.sleep(2 ** attempt)  # attempt=10: 1024s sleep
    continue

# BAD: no jitter with multiple workers
if resp.status_code == 429:
    time.sleep(60)  # All workers sleep and wake simultaneously → thundering herd
    continue

The thundering herd problem: when 10 workers all hit rate limits at the same time, they all sleep for the same duration and all retry at the same instant — immediately triggering another rate limit event. Jitter spreads their retry times and breaks the synchronization.
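
The effect is easy to demonstrate. A toy simulation (illustrative only; no requests are made):

```python
import random


def retry_times(n_workers: int, base_sleep: float, jitter: bool) -> list[float]:
    """Wake-up times for n workers that were all rate limited at t=0."""
    if not jitter:
        return [base_sleep] * n_workers  # every worker retries at the same instant
    # Additive jitter: each worker wakes somewhere in [base, 1.5 * base]
    return sorted(base_sleep + random.uniform(0, base_sleep * 0.5)
                  for _ in range(n_workers))


print(retry_times(3, 60.0, jitter=False))  # [60.0, 60.0, 60.0]: thundering herd
print(retry_times(3, 60.0, jitter=True))   # spread across [60, 90], no burst
```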

Correct retry implementation

import asyncio
import random
from typing import Optional

MAX_RETRIES = 5
MAX_BACKOFF = 64.0  # Cap at 64s to avoid multi-minute pauses


def parse_retry_after(header_value: Optional[str]) -> Optional[float]:
    """Parse Retry-After header. Returns seconds to wait, or None if absent."""
    if not header_value:
        return None
    try:
        # Numeric seconds format
        return float(header_value)
    except ValueError:
        # HTTP-date format (rare)
        from email.utils import parsedate_to_datetime
        from datetime import datetime, timezone
        retry_time = parsedate_to_datetime(header_value)
        now = datetime.now(timezone.utc)
        return max(0.0, (retry_time - now).total_seconds())


def backoff_with_jitter(attempt: int, retry_after: Optional[float]) -> float:
    """Sleep time per the TL;DR recipe: honor Retry-After with additive jitter;
    fall back to capped exponential backoff with full jitter."""
    if retry_after is not None:
        # Never sleep less than the server asked for; jitter in
        # [0, Retry-After * 0.5] de-synchronizes workers limited at the same instant
        return retry_after + random.uniform(0, retry_after * 0.5)
    # No header: exponential backoff capped at MAX_BACKOFF, full jitter in [0, base]
    base = min(2 ** attempt, MAX_BACKOFF)
    return random.uniform(0, base)


async def convert_with_retry(
    client,
    file_bytes: bytes,
    target: str,
    api_key: str,
    idempotency_key: str,
) -> bytes:
    for attempt in range(MAX_RETRIES):
        resp = await client.post(
            'https://changethisfile.com/v1/convert',
            headers={
                'Authorization': f'Bearer {api_key}',
                'Idempotency-Key': idempotency_key,
            },
            content=file_bytes,
            params={'target': target},
            timeout=120,
        )

        if resp.status_code == 429:
            retry_after = parse_retry_after(resp.headers.get('Retry-After'))
            sleep_time = backoff_with_jitter(attempt, retry_after)
            print(f'Rate limited (attempt {attempt+1}), sleeping {sleep_time:.1f}s')
            await asyncio.sleep(sleep_time)
            continue

        if resp.status_code >= 500:
            sleep_time = min(2 ** attempt, MAX_BACKOFF)
            await asyncio.sleep(sleep_time)
            continue

        if resp.status_code == 422:
            raise ValueError(f'Unprocessable: {resp.text}')

        resp.raise_for_status()
        return resp.content

    raise RuntimeError(f'Max retries ({MAX_RETRIES}) exceeded')

Proactive throttling: client-side token bucket

Reactive retry (respond to 429) is correct but wasteful — you've already consumed a request slot and gotten nothing back. Proactive throttling (never exceed your rate limit) is more efficient at scale.

import asyncio
import time
from typing import Optional


class TokenBucket:
    """Token bucket for proactive client-side rate limiting."""

    def __init__(self, rate: float, burst: Optional[int] = None):
        """
        rate: tokens per second (requests per second)
        burst: max tokens to accumulate (defaults to rate * 2)
        """
        self.rate = rate
        self.burst = burst or int(rate * 2)
        self._tokens = float(self.burst)
        self._last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self, tokens: float = 1.0):
        async with self._lock:
            now = time.monotonic()
            elapsed = now - self._last_refill
            # Refill tokens based on elapsed time
            self._tokens = min(
                self.burst,
                self._tokens + elapsed * self.rate
            )
            self._last_refill = now

            if self._tokens >= tokens:
                self._tokens -= tokens
                return  # Token available, proceed immediately

            # Not enough tokens — calculate wait time
            wait = (tokens - self._tokens) / self.rate

        await asyncio.sleep(wait)
        await self.acquire(tokens)  # Recursive — recheck after sleeping


# Usage: Startup plan (300/min = 5 req/s); rate=4.0 leaves ~20% headroom
# Hobby (60/min) → rate=0.9; Scale (1,000/min ≈ 16.7 req/s) → rate=15.0
throttle = TokenBucket(rate=4.0, burst=10)

async def convert_throttled(client, file_bytes, target, api_key, idem_key):
    await throttle.acquire()
    return await convert_with_retry(client, file_bytes, target, api_key, idem_key)

Tune rate to slightly below your plan's per-minute limit divided by 60. Leave 10% headroom: Hobby (60/min) → 0.9 req/s. Startup (300/min) → 4.5 req/s. This eliminates most 429 responses while maximizing throughput.
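
That rule as code (plan limits from the defaults above; the helper itself is illustrative):

```python
PLAN_LIMITS_PER_MIN = {'hobby': 60, 'startup': 300, 'scale': 1000}


def bucket_rate(plan: str, headroom: float = 0.9) -> float:
    """Token-bucket rate in req/s: plan limit / 60, with 10% headroom."""
    return PLAN_LIMITS_PER_MIN[plan] * headroom / 60.0


print(bucket_rate('hobby'))    # 0.9
print(bucket_rate('startup'))  # 4.5
print(bucket_rate('scale'))    # 15.0
```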

Shell: retry with curl

#!/bin/bash
# retry_convert.sh — curl with Retry-After-aware retry

convert_file() {
    local file="$1" target="$2" out="$3"
    local max_attempts=5

    for attempt in $(seq 1 $max_attempts); do
        headers=$(mktemp)
        http_code=$(curl -s -o "$out" -D "$headers" -w "%{http_code}" \
            -X POST https://changethisfile.com/v1/convert \
            -H "Authorization: Bearer $CTF_API_KEY" \
            -F "file=@$file" -F "target=$target")

        if [ "$http_code" = "200" ]; then
            rm -f "$headers"
            echo "Converted: $file"
            return 0
        elif [ "$http_code" = "429" ]; then
            # Read Retry-After from the 429 response itself (no extra request)
            retry_after=$(grep -i '^retry-after:' "$headers" | tr -d '\r' | awk '{print $2}')
            # Fall back to 60s if the header is missing or an HTTP-date
            case "$retry_after" in ''|*[!0-9]*) retry_after=60 ;; esac
            jitter=$((RANDOM % (retry_after / 2 + 1)))
            echo "Rate limited. Sleeping $((retry_after + jitter))s (attempt $attempt)"
            sleep $((retry_after + jitter))
        else
            echo "Error $http_code on attempt $attempt"
            sleep $((2 ** attempt))
        fi
        rm -f "$headers"
    done
    done
    echo "FAILED: $file after $max_attempts attempts" >&2
    return 1
}

# Batch with gnu parallel, 5 concurrent
export -f convert_file
export CTF_API_KEY
ls *.pdf | parallel -j 5 convert_file {} jpg {.}.jpg

Observing your rate limit headroom

The API returns rate limit state in response headers on every request:

resp = await client.post(...)

# Rate limit headers
limit = resp.headers.get('X-RateLimit-Limit')      # Requests allowed per minute
remaining = resp.headers.get('X-RateLimit-Remaining') # Remaining in this window
reset = resp.headers.get('X-RateLimit-Reset')       # Unix timestamp when window resets

if remaining and int(remaining) < 10:
    print(f'WARNING: {remaining} requests remaining in rate limit window')

Track the headroom across your workers. If remaining consistently drops to 0, you need to either reduce concurrency, upgrade your plan, or implement tighter client-side throttling. A remaining=0 followed by a 429 is expected and handled. A sustained pattern of 429s means your concurrency exceeds your plan's limit.
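
One illustrative way to condense those headers into a single number your workers can act on (a hypothetical helper; the header names are the ones shown above):

```python
from typing import Optional


def headroom_fraction(headers: dict) -> Optional[float]:
    """Fraction of the current window still available, or None if headers absent."""
    limit = headers.get('X-RateLimit-Limit')
    remaining = headers.get('X-RateLimit-Remaining')
    if limit is None or remaining is None:
        return None
    return int(remaining) / int(limit)


print(headroom_fraction({'X-RateLimit-Limit': '300',
                         'X-RateLimit-Remaining': '30'}))  # 0.1

# e.g. halve the token-bucket rate when headroom dips below 10%:
# if (h := headroom_fraction(resp.headers)) is not None and h < 0.1:
#     throttle.rate *= 0.5
```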

Rate limit handling done correctly means 429s are rare recoverable events, not pipeline-killing failures. Token bucket throttling plus jittered retry covers the full range from light concurrent usage to near-limit throughput. Free tier is a good place to stress-test your retry logic before moving to a paid plan.