Idempotency is the property that running an operation multiple times produces the same result as running it once. In file conversion pipelines, it answers the question: "if my process crashes and I restart from scratch, will I pay for the same conversions twice?" With the right idempotency strategy, the answer is no. This guide covers the Idempotency-Key header, how to generate keys that don't collide, and how to layer application-level checkpointing on top for complete crash safety.
TL;DR — what Idempotency-Key actually does
When you include an `Idempotency-Key: <your-key>` header in a `POST /v1/convert` request:
- The API processes the request normally on first call, caches the full response body + headers keyed on your value
- Any subsequent request with the same key (within 24 hours) returns the cached response immediately — same bytes, same headers, no re-conversion
- The cached replay does NOT consume a conversion from your monthly quota
This means: a retry after a network failure, a process crash that replays from a queue, or an accidental duplicate submission — all return the same result, free.
The 24-hour TTL means idempotency keys are not a permanent deduplication store. For long-running pipelines (multi-day batches), combine with application-level checkpointing.
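The retry pattern that follows from this: compute the key once, before the first attempt, and reuse it verbatim on every retry. A minimal sketch — the `retry_with_stable_key` helper and its backoff numbers are my own illustration, not part of any client library:

```python
import hashlib
import time

def retry_with_stable_key(send, key, attempts=3):
    """Retry `send(key)`, reusing ONE idempotency key across all attempts.

    Because the key is computed once, a retry after a timeout replays the
    cached response instead of triggering (and billing) a second conversion.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return send(key)
        except ConnectionError as exc:  # stand-in for your HTTP client's transport error
            last_exc = exc
            time.sleep(0.1 * 2 ** attempt)  # short exponential backoff
    raise last_exc

# Derived once from stable inputs; never regenerated per attempt
key = hashlib.sha256(b'photo.heic|jpg|123456|1700000000').hexdigest()[:64]
```

The anti-pattern this guards against is generating the key inside the retry loop, which would turn every retry into a fresh, billable request.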
Generating deterministic idempotency keys
The key must be deterministic: the same input file + target format must always produce the same key, so re-running a pipeline naturally deduplicates without consulting an external store.
What to include in the key material:
- File identity: file path (for local files), S3 key + version ID, or content hash
- File modification signal: mtime_ns or file size — so re-uploading a changed file generates a new key
- Target format: the same source file converted to jpg vs png should be two different idempotency keys
```python
import hashlib
from pathlib import Path

def make_idempotency_key(file_path: Path, target_format: str) -> str:
    """Deterministic key that changes if the file changes."""
    stat = file_path.stat()
    # Include: resolved path, target format, file size, modification time
    payload = '|'.join([
        str(file_path.resolve()),
        target_format,
        str(stat.st_size),
        str(stat.st_mtime_ns),
    ])
    # Truncate to 64 chars — API max is 128, keep it manageable
    return hashlib.sha256(payload.encode()).hexdigest()[:64]

# Examples
print(make_idempotency_key(Path('photo.heic'), 'jpg'))
# -> 'a3f8c1d2e4b7f9a0...'
print(make_idempotency_key(Path('photo.heic'), 'png'))
# -> '7d2a4c8f1e3b9d0e...' (different — different target)
```
For S3-sourced files, use the ETag as the identity signal. For single-part uploads the ETag is the MD5 of the content; for multipart uploads it is not an MD5, but it still changes whenever the object changes, which is all the key needs:
```python
import hashlib

import boto3

def s3_idempotency_key(bucket: str, key: str, target: str) -> str:
    s3 = boto3.client('s3')
    head = s3.head_object(Bucket=bucket, Key=key)
    etag = head['ETag'].strip('"')  # S3 returns the ETag wrapped in quotes
    payload = f"{bucket}/{key}|{etag}|{target}"
    return hashlib.sha256(payload.encode()).hexdigest()[:64]
```
Using the header in Python, curl, and JavaScript
```python
# Python (httpx) — inside an `async with httpx.AsyncClient() as client:` block
resp = await client.post(
    'https://changethisfile.com/v1/convert',
    headers={
        'Authorization': f'Bearer {API_KEY}',
        'Idempotency-Key': make_idempotency_key(file_path, 'jpg'),
    },
    content=file_path.read_bytes(),
    params={'target': 'jpg'},
)
```
```bash
# curl (GNU stat shown; on macOS use `stat -f%z` / `stat -f%m`)
IDEM_KEY=$(echo -n "${FILE_PATH}|jpg|$(stat -c%s "$FILE_PATH")|$(stat -c%Y "$FILE_PATH")" | sha256sum | cut -c1-64)
curl -X POST https://changethisfile.com/v1/convert \
  -H "Authorization: Bearer $CTF_API_KEY" \
  -H "Idempotency-Key: $IDEM_KEY" \
  -F "file=@photo.heic" -F "target=jpg" \
  -o photo.jpg
```
```javascript
// JavaScript (browser)
async function getIdempotencyKey(file, targetFormat) {
  const text = `${file.name}|${targetFormat}|${file.size}|${file.lastModified}`;
  const buf = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(text));
  return Array.from(new Uint8Array(buf))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('')
    .slice(0, 64);
}

const formData = new FormData();
formData.append('file', file);
formData.append('target', 'jpg');

const resp = await fetch('https://changethisfile.com/v1/convert', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    'Idempotency-Key': await getIdempotencyKey(file, 'jpg'),
  },
  body: formData,
});
```
Layering API idempotency with application checkpointing
API-level idempotency (the header) protects against duplicate charges within 24 hours. Application-level checkpointing protects against wasted work across pipeline runs, even after the 24-hour TTL expires.
The two-layer approach:
```python
import json
from pathlib import Path

import httpx

CHECKPOINT_FILE = Path('.conversion_checkpoint.json')

def load_done() -> set:
    if CHECKPOINT_FILE.exists():
        return set(json.loads(CHECKPOINT_FILE.read_text()))
    return set()

def save_done(done: set):
    CHECKPOINT_FILE.write_text(json.dumps(sorted(done)))

async def convert_with_checkpoint(files, target):
    done = load_done()
    try:
        async with httpx.AsyncClient() as client:
            for path in files:
                out_key = f"{path.stem}.{target}"  # deterministic output ID
                if out_key in done:
                    continue  # Layer 2: skip already-converted
                resp = await client.post(
                    'https://changethisfile.com/v1/convert',
                    headers={
                        'Authorization': f'Bearer {API_KEY}',
                        'Idempotency-Key': make_idempotency_key(path, target),  # Layer 1: safe retry
                    },
                    content=path.read_bytes(),
                    params={'target': target},
                )
                resp.raise_for_status()
                path.with_suffix(f'.{target}').write_bytes(resp.content)
                done.add(out_key)
    finally:
        save_done(done)  # Save progress even on crash/Ctrl+C
```
This means: if your process crashes, on restart it skips files in the checkpoint (Layer 2, free). If the checkpoint is lost but within 24h, the API idempotency key prevents double-billing (Layer 1, also free). Beyond 24h without a checkpoint, you re-convert — but that's a rare edge case for well-designed pipelines.
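One hardening step worth adding to `save_done` above (my suggestion, not something the API requires): write the checkpoint atomically via a temp file plus rename, so a crash in the middle of the write can't leave a half-written JSON file that breaks the next restart:

```python
import json
import os
from pathlib import Path

def save_done_atomic(done: set, checkpoint: Path):
    """Drop-in replacement for save_done: write a sibling temp file, then rename.

    os.replace is atomic on POSIX (and on Windows for same-volume paths),
    so a reader always sees either the old checkpoint or the new one,
    never a truncated file.
    """
    tmp = checkpoint.with_suffix('.tmp')
    tmp.write_text(json.dumps(sorted(done)))
    os.replace(tmp, checkpoint)
```

This matters precisely because the checkpoint is written in a `finally` block: a Ctrl+C that lands mid-write would otherwise corrupt the very file the next run depends on.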
Key collision risks and how to avoid them
Three patterns that cause unexpected key collisions:
- Using filename only: report.pdf in two different directories maps to the same key. Always include the full resolved path or a namespace prefix.
- Using content hash without target: converting the same PNG to both JPG and WebP needs two different keys. Always include the target format.
- Random keys that don't persist: generating a new UUID per request defeats the purpose. The key must be stable across process restarts. Store it alongside your file metadata if you can't derive it deterministically.
```python
import hashlib

# BAD: collides across directories
def bad_key(path):
    return hashlib.sha256(path.name.encode()).hexdigest()[:32]

# GOOD: unique per resolved path + target
def good_key(path, target):
    payload = f"{path.resolve()}|{target}|{path.stat().st_size}"
    return hashlib.sha256(payload.encode()).hexdigest()[:64]
```
Detecting idempotency cache hits
The API returns an X-Idempotency-Replayed: true header on cached responses. Log this in your pipeline to track how many conversions are genuinely new vs. replayed:
```python
resp = await client.post(...)
if resp.headers.get('X-Idempotency-Replayed') == 'true':
    metrics['replayed'] += 1
else:
    metrics['new_conversion'] += 1
```
A high replayed rate (>20%) suggests your checkpoint logic has a bug — you're re-submitting files you've already processed. Investigate why the checkpoint isn't being written or read correctly.
Idempotency is one of those properties that seems like an optimization until you have a pipeline crash at 3am and realize you just billed 5,000 duplicate conversions. The Idempotency-Key header costs nothing to add and gives you free retries within 24 hours. Combine it with a checkpoint file and your pipeline is crash-safe at any scale. Free API key to get started.