Idempotency is the property that running an operation multiple times produces the same result as running it once. In file conversion pipelines, it answers the question: "if my process crashes and I restart from scratch, will I pay for the same conversions twice?" With the right idempotency strategy, the answer is no. This guide covers the Idempotency-Key header, how to generate keys that don't collide, and how to layer application-level checkpointing on top for complete crash safety.

TL;DR — what Idempotency-Key actually does

When you include an Idempotency-Key header in a POST /v1/convert request:

  1. On the first call, the API processes the request normally and caches the full response body and headers, keyed on your value
  2. Any subsequent request with the same key (within 24 hours) returns the cached response immediately — same bytes, same headers, no re-conversion
  3. The cached replay does NOT consume a conversion from your monthly quota

This means: a retry after a network failure, a process crash that replays from a queue, or an accidental duplicate submission — all return the same result, free.

The 24-hour TTL means idempotency keys are not a permanent deduplication store. For long-running pipelines (multi-day batches), combine with application-level checkpointing.
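
To make the three rules above concrete, here is a toy in-memory model of the replay behavior. It is only an illustration of the semantics described in this section, not how the service is implemented; the class name ToyIdempotencyCache and its handle method are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

TTL_SECONDS = 24 * 60 * 60  # the 24-hour window described above


@dataclass
class ToyIdempotencyCache:
    """Hypothetical in-memory model of the replay semantics (not the real service)."""
    quota_used: int = 0
    _store: dict = field(default_factory=dict)  # key -> (stored_at, response)

    def handle(self, key: str, convert: Callable[[], bytes],
               now: Optional[float] = None):
        """Return (response, replayed) for a request carrying `key`."""
        now = time.time() if now is None else now
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < TTL_SECONDS:
            return hit[1], True      # replay: no conversion runs, no quota used
        response = convert()         # first call, or the key has expired
        self.quota_used += 1
        self._store[key] = (now, response)
        return response, False
```

Calling handle twice with the same key inside the window runs the conversion once; calling it again after the window has passed runs it a second time, which is why long pipelines need their own checkpoint.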

Generating deterministic idempotency keys

The key must be deterministic: the same input file + target format must always produce the same key, so re-running a pipeline naturally deduplicates without consulting an external store.

What to include in the key material:

  • File identity: file path (for local files), S3 key + version ID, or content hash
  • File modification signal: mtime_ns or file size — so re-uploading a changed file generates a new key
  • Target format: the same source file converted to jpg vs png should be two different idempotency keys

import hashlib
from pathlib import Path

def make_idempotency_key(file_path: Path, target_format: str) -> str:
    """Deterministic key that changes if the file changes."""
    stat = file_path.stat()
    # Include: resolved path, target, file size, modification time
    payload = '|'.join([
        str(file_path.resolve()),
        target_format,
        str(stat.st_size),
        str(stat.st_mtime_ns),
    ])
    # A SHA-256 hex digest is already 64 chars, so [:64] is only a guard;
    # the API max is 128
    return hashlib.sha256(payload.encode()).hexdigest()[:64]

# Examples
print(make_idempotency_key(Path('photo.heic'), 'jpg'))
# -> 'a3f8c1d2e4b7f9a0...'

print(make_idempotency_key(Path('photo.heic'), 'png'))
# -> '7d2a4c8f1e3b9d0e...'  (different — different target)
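
The key material list above also names a content hash as a possible file identity. A minimal sketch of that variant follows, assuming you can afford to read the whole file before each request; content_hash_key is a hypothetical helper name, not part of any API.

```python
import hashlib
from pathlib import Path


def content_hash_key(file_path: Path, target_format: str,
                     chunk_size: int = 1 << 20) -> str:
    """Key derived from the file's bytes plus the target format.

    Unlike the path + mtime key, this one is unaffected by moving or
    touching the file, and changes only when the content changes.
    """
    digest = hashlib.sha256()
    with file_path.open('rb') as fh:
        # Stream in chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    payload = f"{digest.hexdigest()}|{target_format}"
    return hashlib.sha256(payload.encode()).hexdigest()
```

The trade-off is one full read of the file per key, and two identical files at different paths share a key, which is usually what you want for deduplication.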

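For S3-sourced files, use the ETag as the identity signal. It is the MD5 of the content only for single-part uploads without SSE-KMS or SSE-C encryption; multipart ETags have a different form. Either way, treat it as an opaque token that changes when the object is overwritten with different content:

import hashlib

import boto3

def s3_idempotency_key(bucket: str, key: str, target: str) -> str:
    s3 = boto3.client('s3')
    head = s3.head_object(Bucket=bucket, Key=key)
    etag = head['ETag'].strip('"')  # ETag arrives quoted; not always an MD5
    payload = f"{bucket}/{key}|{etag}|{target}"
    return hashlib.sha256(payload.encode()).hexdigest()[:64]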
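
The key material list earlier also mentions "S3 key + version ID". As a sketch, the function below builds the key from a head_object-style response dict and prefers VersionId when the bucket is versioned, falling back to the ETag otherwise. The helper name s3_key_from_head is hypothetical, and treating a literal 'null' version as absent is an assumption about buckets with suspended versioning.

```python
import hashlib


def s3_key_from_head(bucket: str, key: str, head: dict, target: str) -> str:
    """Build an idempotency key from a head_object-style response dict."""
    version = head.get('VersionId')
    if version and version != 'null':   # assumption: 'null' means unversioned
        identity = f"v:{version}"
    else:
        identity = f"e:{head['ETag'].strip(chr(34))}"  # chr(34) is the quote char
    payload = f"{bucket}/{key}|{identity}|{target}"
    return hashlib.sha256(payload.encode()).hexdigest()
```

Taking the response dict as a parameter keeps the function free of network calls, so it can be exercised with plain dictionaries in a unit test.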

Using the header in Python, curl, and JavaScript

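# Python (httpx)
import httpx

async def convert_one(client: httpx.AsyncClient, file_path, target='jpg'):
    resp = await client.post(
        'https://changethisfile.com/v1/convert',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Idempotency-Key': make_idempotency_key(file_path, target),
        },
        content=file_path.read_bytes(),
        params={'target': target},
    )
    resp.raise_for_status()
    return resp

# curl (GNU coreutils: stat -c and sha256sum; on macOS/BSD use stat -f and shasum -a 256)
# Note: this key is not byte-compatible with make_idempotency_key above
# (unresolved path, mtime in seconds), so stick to one generator per pipeline.
FILE_PATH=photo.heic
IDEM_KEY=$(printf '%s' "${FILE_PATH}|jpg|$(stat -c%s "$FILE_PATH")|$(stat -c%Y "$FILE_PATH")" | sha256sum | cut -c1-64)

curl -X POST https://changethisfile.com/v1/convert \
  -H "Authorization: Bearer $CTF_API_KEY" \
  -H "Idempotency-Key: $IDEM_KEY" \
  -F "file=@${FILE_PATH}" -F "target=jpg" \
  -o photo.jpg

// JavaScript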
async function getIdempotencyKey(file, targetFormat) {
  const text = `${file.name}|${targetFormat}|${file.size}|${file.lastModified}`;
  const buf = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(text));
  return Array.from(new Uint8Array(buf))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('')
    .slice(0, 64);
}

const formData = new FormData();
formData.append('file', file);
formData.append('target', 'jpg');

const resp = await fetch('https://changethisfile.com/v1/convert', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    'Idempotency-Key': await getIdempotencyKey(file, 'jpg'),
  },
  body: formData,
});
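
The snippets above send a single request; the header pays off when a failed request is retried with the same key. Below is a minimal retry sketch, assuming transient failures surface as ConnectionError (with httpx you would catch its transport errors instead). The retry_with_same_key name and the send callable are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def retry_with_same_key(send: Callable[[str], T], key: str, attempts: int = 3,
                        base_delay: float = 0.5,
                        sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(key) until it succeeds, reusing one key for every attempt."""
    for attempt in range(attempts):
        try:
            return send(key)
        except ConnectionError:               # assumed transient-failure type
            if attempt == attempts - 1:
                raise                          # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)   # exponential backoff
    raise ValueError("attempts must be >= 1")
```

The important detail is that the key is computed once, outside the loop. Generating it inside the loop from anything non-deterministic would turn each retry into a fresh, billable request.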

Layering API idempotency with application checkpointing

API-level idempotency (the header) protects against duplicate charges within 24 hours. Application-level checkpointing protects against wasted work across pipeline runs, even after the 24-hour TTL expires.

The two-layer approach:

import json
from pathlib import Path

import httpx

CHECKPOINT_FILE = Path('.conversion_checkpoint.json')

def load_done() -> set:
    if CHECKPOINT_FILE.exists():
        return set(json.loads(CHECKPOINT_FILE.read_text()))
    return set()

def save_done(done: set):
    CHECKPOINT_FILE.write_text(json.dumps(sorted(done)))

async def convert_with_checkpoint(files, target):
    done = load_done()
    async with httpx.AsyncClient() as client:
        for path in files:
            # Deterministic checkpoint ID. Uses the resolved path so that
            # same-named files in different directories don't collide.
            out_key = f"{path.resolve()}|{target}"
            if out_key in done:
                continue  # Layer 2: skip already-converted

            resp = await client.post(
                'https://changethisfile.com/v1/convert',
                headers={
                    'Authorization': f'Bearer {API_KEY}',
                    'Idempotency-Key': make_idempotency_key(path, target),  # Layer 1: safe retry
                },
                content=path.read_bytes(),
                params={'target': target},
            )
            resp.raise_for_status()
            path.with_suffix(f'.{target}').write_bytes(resp.content)
            done.add(out_key)
            # Persist after every file: a finally block runs on exceptions
            # and Ctrl+C, but not on SIGKILL or power loss
            save_done(done)

This means: if your process crashes, on restart it skips files in the checkpoint (Layer 2, free). If the checkpoint is lost but within 24h, the API idempotency key prevents double-billing (Layer 1, also free). Beyond 24h without a checkpoint, you re-convert — but that's a rare edge case for well-designed pipelines.
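
One gap worth closing: if the process dies while the checkpoint file is being written, the file can be left truncated and unreadable on restart. A common remedy, sketched here under the assumption that the checkpoint lives on a local filesystem, is to write to a temporary file and rename it into place. The save_done_atomic name is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def save_done_atomic(done: set, checkpoint: Path) -> None:
    """Write the checkpoint to a temp file, then rename it into place."""
    # Create the temp file in the same directory so the rename stays
    # on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=str(checkpoint.parent), suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as fh:
            json.dump(sorted(done), fh)
            fh.flush()
            os.fsync(fh.fileno())          # push bytes to disk before the rename
        os.replace(tmp_name, checkpoint)   # readers see old or new, never partial
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)            # don't leave stray temp files behind
        raise
```

A reader of the checkpoint then sees either the previous complete file or the new complete file.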

Key collision risks and how to avoid them

Three patterns that cause unexpected key collisions:

  1. Using filename only: report.pdf in two different directories maps to the same key. Always include the full resolved path or a namespace prefix.
  2. Using content hash without target: Converting the same PNG to both JPG and WebP needs two different keys. Always include the target format.
  3. Random keys that don't persist: Generating a new UUID per request defeats the purpose. The key must be stable across process restarts. Store it alongside your file metadata if you can't derive it deterministically.

# BAD: collides across directories
def bad_key(path):
    return hashlib.sha256(path.name.encode()).hexdigest()[:32]

# GOOD: unique per resolved path + target
def good_key(path, target):
    payload = f"{path.resolve()}|{target}|{path.stat().st_size}"
    return hashlib.sha256(payload.encode()).hexdigest()[:64]
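
The first pattern above also mentions a namespace prefix as an alternative to the resolved path, which helps when the same tree is mounted at different absolute paths on different workers. A sketch under that assumption follows; namespaced_key is a hypothetical helper, and the JSON encoding is one way to keep field boundaries unambiguous so that a '|' inside a name cannot shift material from one field into another.

```python
import hashlib
import json


def namespaced_key(namespace: str, relative_path: str, target: str) -> str:
    """Key from a caller-chosen namespace, a path relative to it, and the target."""
    # A JSON array preserves field boundaries, unlike joining on a
    # delimiter that might also appear inside a field
    payload = json.dumps([namespace, relative_path, target],
                         separators=(',', ':'))
    return hashlib.sha256(payload.encode()).hexdigest()
```

For example, namespaced_key('tenant-a', 'docs/report.pdf', 'png') and namespaced_key('tenant-b', 'docs/report.pdf', 'png') differ even though the relative paths match.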

Detecting idempotency cache hits

The API returns an X-Idempotency-Replayed: true header on cached responses. Log this in your pipeline to track how many conversions are genuinely new vs. replayed:

from collections import Counter

metrics = Counter()

resp = await client.post(...)  # same request as in the examples above
if resp.headers.get('X-Idempotency-Replayed') == 'true':
    metrics['replayed'] += 1
else:
    metrics['new_conversion'] += 1

A high replay rate (>20%) suggests your checkpoint logic has a bug: you're re-submitting files you've already processed. Investigate why the checkpoint isn't being written or read correctly.
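
The threshold check can be automated at the end of a run. This sketch assumes the counters use the 'replayed' and 'new_conversion' names from the snippet above; the function names and the 0.20 default (the 20% figure from the text) are illustrative.

```python
from collections import Counter


def replay_rate(metrics: Counter) -> float:
    """Fraction of responses that were cache replays (0.0 if nothing was sent)."""
    total = metrics['replayed'] + metrics['new_conversion']
    return metrics['replayed'] / total if total else 0.0


def checkpoint_looks_broken(metrics: Counter, threshold: float = 0.20) -> bool:
    """True when replays exceed the threshold, hinting at a checkpoint bug."""
    return replay_rate(metrics) > threshold
```

A run that deliberately restarts without its checkpoint will trip this check too, so treat it as a prompt to investigate, not as proof of a bug.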

Idempotency is one of those properties that seems like an optimization until you have a pipeline crash at 3am and realize you just billed 5,000 duplicate conversions. The Idempotency-Key header costs nothing to add and gives you free retries within 24 hours. Combine it with a checkpoint file and your pipeline is crash-safe at any scale.