The sync /v1/convert endpoint works well up to 20MB and 30s. Beyond that — large videos, 100-page PDFs, bulky PowerPoint files — you need the async jobs API. It decouples submission from completion, supports files up to 500MB, and plays well with serverless environments that have short request timeouts. This guide covers the two consumption patterns: polling and webhooks.
TL;DR — when to use /v1/jobs
Use the async jobs endpoint when:
- File is over 20MB (images, documents) or 100MB (video/audio)
- Expected conversion time exceeds 30s (long video transcode, 50+ page PDF)
- Your runtime has a request timeout (API Gateway in front of Lambda at 30s, Cloudflare Workers at 30s, most load balancers at 60s)
- You want webhook push instead of polling
Stick with /v1/convert for everything else — it's simpler and faster (no polling overhead).
Jobs API limits: 500MB max file size. Jobs expire after 24 hours. Download URLs are signed and expire after 1 hour — download promptly or re-poll to get a fresh URL.
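If you route files programmatically, the thresholds above reduce to a few lines. A minimal sketch; the extension set is illustrative, not the API's official list, and conversion-time estimates are still on you:

from pathlib import Path

AV_EXTS = {'.mp4', '.mov', '.avi', '.webm', '.mkv', '.mp3', '.wav', '.flac'}

def choose_endpoint(path: Path) -> str:
    """Route by the size limits above: 100MB for video/audio, 20MB otherwise."""
    size_mb = path.stat().st_size / 1e6
    limit = 100 if path.suffix.lower() in AV_EXTS else 20
    return '/v1/jobs' if size_mb > limit else '/v1/convert'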
Sync endpoint failure modes with large files
The sync endpoint hits three walls with large files:
- Timeout: A 200MB MP4 → WebM conversion takes 90-180s. The API's max sync timeout is 120s. Lambda's max is 15 min — but API Gateway in front of Lambda cuts it to 30s. You get a 504 gateway timeout with no output.
- Memory pressure: Streaming a 200MB file through your application, holding the response in memory, and piping it to S3 — all at the same time — requires ~600MB RAM peak. On a 1GB Lambda function, this leaves little headroom.
- No visibility: A slow sync request is a black box. You don't know if it's 10% done or 90% done. If you kill it, you waste the conversion.
The async jobs API solves all three: you submit the file, disconnect, and poll (or receive a webhook) when ready.
Polling pattern: submit → poll → download
import asyncio
import os
from pathlib import Path

import httpx

API_KEY = os.environ['CTF_API_KEY']
JOBS_URL = 'https://changethisfile.com/v1/jobs'
HEADERS = {'Authorization': f'Bearer {API_KEY}'}


async def submit_job(client: httpx.AsyncClient, file_path: Path, target: str) -> str:
    """Submit a conversion job. Returns job_id."""
    # read_bytes() holds the whole file in memory; for very large files,
    # pass an open file object as content to stream the upload instead.
    content = file_path.read_bytes()
    resp = await client.post(
        JOBS_URL,
        headers=HEADERS,
        content=content,
        params={'target': target, 'filename': file_path.name},
        timeout=60,  # upload timeout only
    )
    resp.raise_for_status()
    return resp.json()['job_id']


async def poll_job(
    client: httpx.AsyncClient,
    job_id: str,
    poll_interval: float = 5.0,
    max_wait: float = 600.0,
) -> dict:
    """Poll until the job completes. Returns the final status dict."""
    waited = 0.0
    while waited < max_wait:
        resp = await client.get(
            f'{JOBS_URL}/{job_id}',
            headers=HEADERS,
            timeout=10,
        )
        resp.raise_for_status()
        status = resp.json()
        if status['state'] == 'done':
            return status
        elif status['state'] == 'failed':
            raise RuntimeError(f"Job {job_id} failed: {status.get('error', 'unknown')}")
        elif status['state'] in ('queued', 'processing'):
            # Optional: log progress
            pct = status.get('progress_pct', '?')
            print(f'Job {job_id}: {status["state"]} ({pct}%)')
        await asyncio.sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError(f'Job {job_id} did not complete within {max_wait}s')


async def download_result(client: httpx.AsyncClient, status: dict, out_path: Path):
    """Download the completed job output, streaming to disk."""
    dl_url = status['download_url']  # Pre-signed, expires in 1 hour
    # Stream to disk so a 200MB result never sits in memory
    async with client.stream('GET', dl_url, timeout=300) as resp:
        resp.raise_for_status()
        with out_path.open('wb') as f:
            async for chunk in resp.aiter_bytes():
                f.write(chunk)


async def convert_large_file(file_path: Path, target: str) -> Path:
    out_path = file_path.with_suffix(f'.{target}')
    async with httpx.AsyncClient() as client:
        job_id = await submit_job(client, file_path, target)
        print(f'Submitted job {job_id}')
        status = await poll_job(client, job_id)
        await download_result(client, status, out_path)
    print(f'Saved to {out_path}')
    return out_path


if __name__ == '__main__':
    import sys

    path = Path(sys.argv[1])
    target_fmt = sys.argv[2]
    asyncio.run(convert_large_file(path, target_fmt))
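The fixed 5-second interval is fine for short jobs; for long transcodes, exponential backoff cuts request volume. A drop-in variant of poll_job, sketched against the same endpoint assumptions as above:

async def poll_job_backoff(
    client: httpx.AsyncClient,
    job_id: str,
    max_wait: float = 600.0,
) -> dict:
    """Like poll_job, but with exponential backoff capped at 30s."""
    interval, waited = 2.0, 0.0
    while waited < max_wait:
        resp = await client.get(f'{JOBS_URL}/{job_id}', headers=HEADERS, timeout=10)
        resp.raise_for_status()
        status = resp.json()
        if status['state'] == 'done':
            return status
        if status['state'] == 'failed':
            raise RuntimeError(f"Job {job_id} failed: {status.get('error', 'unknown')}")
        await asyncio.sleep(interval)
        waited += interval
        interval = min(interval * 2, 30.0)  # 2s, 4s, 8s, ... capped at 30s
    raise TimeoutError(f'Job {job_id} did not complete within {max_wait}s')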
For batches of large files, submit all jobs first, then poll them concurrently:
async def batch_large_files(files: list[Path], target: str):
    async with httpx.AsyncClient() as client:
        # Submit all jobs concurrently
        job_ids = await asyncio.gather(
            *[submit_job(client, f, target) for f in files]
        )
        print(f'Submitted {len(job_ids)} jobs')
        # Poll all jobs concurrently
        statuses = await asyncio.gather(
            *[poll_job(client, jid) for jid in job_ids]
        )
        # Download all results concurrently
        await asyncio.gather(
            *[
                download_result(client, status, f.with_suffix(f'.{target}'))
                for f, status in zip(files, statuses)
            ]
        )
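Note that asyncio.gather fails fast: one failed job raises and abandons the rest. Pass return_exceptions=True to gather if you want the surviving jobs to finish. Usage, with the helpers above in scope:

asyncio.run(batch_large_files([Path('talk.mov'), Path('demo.avi')], 'webm'))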
Webhook pattern: no polling
If you control an HTTP server, register a webhook_url when submitting the job. The API POSTs the result to your URL when conversion completes — no polling required.
import json  # needed for webhook_metadata below

resp = await client.post(
    JOBS_URL,
    headers=HEADERS,
    content=file_bytes,
    params={
        'target': 'mp4',
        'filename': 'source.avi',
        'webhook_url': 'https://your-server.com/webhooks/ctf',
        # Echoed back verbatim in the webhook payload
        'webhook_metadata': json.dumps({'user_id': user_id, 'file_id': file_id}),
    },
)
The webhook POST body:
{
  "job_id": "job_abc123",
  "state": "done",
  "download_url": "https://...",
  "metadata": {"user_id": "u_123", "file_id": "f_456"},
  "duration_ms": 12450
}
Validate the webhook signature (see the webhook guide) before processing. Download the file from download_url within 1 hour — signed URLs expire. See the webhook-based async conversion pattern guide for the full receiver implementation.
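The exact header name and signing scheme are defined in the webhook guide; assuming the common pattern of HMAC-SHA256 over the raw request body, with a hypothetical X-CTF-Signature header, verification looks roughly like:

import hashlib
import hmac

def verify_webhook(raw_body: bytes, signature_header: str, secret: str) -> bool:
    # Header name and scheme are assumptions; confirm both in the webhook guide
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)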
Pattern for serverless environments
Lambda, Cloudflare Workers, and similar runtimes have hard request timeouts (15 min Lambda, 30s CF Workers). The sync endpoint isn't viable for large files in these environments. The correct pattern:
- Receive the upload request from the user
- Save the file to S3/R2
- Submit a job to /v1/jobs with a webhook_url pointing at your handler function
- Return a 202 Accepted + job_id to the user
- When the webhook fires, download the result from the signed URL and save back to storage
- Update your database and notify the user (email, push, poll endpoint)
// Cloudflare Worker — submit endpoint
export default {
  async fetch(request, env) {
    const formData = await request.formData();
    const file = formData.get('file');
    const target = formData.get('target');

    // Build the multipart body for the job submission
    const fd = new FormData();
    fd.append('file', file);
    fd.append('target', target);
    fd.append('webhook_url', `${env.WORKER_URL}/webhook/ctf`);

    // Submit async job
    const resp = await fetch('https://changethisfile.com/v1/jobs', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${env.CTF_API_KEY}`,
      },
      body: fd,
    });
    const { job_id } = await resp.json();

    // Store job_id → user mapping in KV
    await env.JOBS_KV.put(job_id, JSON.stringify({ userId: 'u_123', target }));

    return Response.json({ job_id, status: 'processing' }, { status: 202 });
  },
};
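Steps 5 and 6 live in the webhook handler. Sketched here in Python for consistency with the rest of this guide; save_to_storage, mark_complete, and mark_failed are hypothetical stand-ins for your storage and database layers:

import httpx

async def handle_webhook(payload: dict) -> None:
    """Runs when the conversion webhook fires (payload shape shown earlier)."""
    if payload['state'] != 'done':
        mark_failed(payload['job_id'], payload.get('error'))  # hypothetical DB call
        return
    async with httpx.AsyncClient() as client:
        # Signed URL expires in 1 hour; download immediately
        resp = await client.get(payload['download_url'], timeout=300)
        resp.raise_for_status()
        save_to_storage(payload['metadata']['file_id'], resp.content)  # hypothetical
    mark_complete(payload['job_id'])  # hypothetical: update DB, notify the user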
Error handling and timeouts
Jobs can fail for three reasons: an invalid file (4xx, rejected up front), a conversion engine error (5xx-style job failure), or an engine timeout (the conversion ran past the engine's time budget). Each surfaces as a failed state with an error code. Handle each:
status = resp.json()
if status['state'] == 'failed':
    error = status.get('error', {})
    code = error.get('code', 'unknown')
    if code == 'file_invalid':
        # Bad file — skip and log
        log_failure(job_id, 'corrupted_or_unsupported')
    elif code == 'engine_error':
        # Conversion engine crashed — retry once
        new_job = await submit_job(client, file_path, target)
    elif code == 'timeout':
        # File took too long (very large video); log, or retry a lower-quality target
        log_failure(job_id, 'too_large_for_engine')
For jobs that simply expire (no result within 24h), your polling loop will raise TimeoutError long before that, but only if it's still running: if your poller crashes mid-wait, nothing is watching the job at all. Add the job_id to a persistent store before polling so you can resume monitoring after restarts.
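A minimal version of that persistence with sqlite3; the schema is an assumption, and any durable store works:

import sqlite3

db = sqlite3.connect('jobs.db')
db.execute('CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, state TEXT)')

def record_job(job_id: str) -> None:
    # Persist before the first poll so a crashed poller can resume
    db.execute("INSERT OR IGNORE INTO jobs VALUES (?, 'pending')", (job_id,))
    db.commit()

def pending_jobs() -> list[str]:
    # On startup, resume polling everything that never reached a terminal state
    return [row[0] for row in db.execute("SELECT job_id FROM jobs WHERE state = 'pending'")]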
The async jobs API removes the hard walls that the sync endpoint hits: timeouts, memory pressure, and serverless function duration limits. For files that routinely exceed 20MB or 30s conversion time, the polling or webhook patterns here are the correct architecture. Free API keys include access to the jobs endpoint.