Converting a flat directory is easy. Converting 10,000 files spread across a nested tree — different formats, different depths — is where the shell starts paying dividends. find + xargs is the Unix-native way to parallelize work over a file tree, and it composes cleanly with curl for API-driven conversion.

TL;DR

find /data/assets -name '*.png' -type f -print0 | \
  xargs -0 -P 8 -I{} bash -c '
    out="${1%.png}.webp"
    curl -sf \
      -H "Authorization: Bearer $CTF_API_KEY" \
      -F "file=@$1" \
      -F "target=webp" \
      -o "$out" \
      https://changethisfile.com/v1/convert || echo "FAILED: $1" >&2
  ' _ {}

This converts every .png under /data/assets to .webp, 8 files at a time. Replace target=webp with any of the 690 supported formats.
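One gotcha: the bash -c children that xargs spawns inherit only exported variables, so the API key must be exported, not just assigned (the key value below is a placeholder):

# Children of xargs see only exported variables
export CTF_API_KEY=ctf_sk_your_key   # placeholder value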

The use case

You've inherited a content repository with years of images in mixed formats — some PNG, some JPEG, some HEIC from phone uploads — scattered across a nested directory tree. You need to normalize everything to WebP for web delivery. Or you have a document store full of DOC files you need as PDF for archival.

Sequential shell loops are too slow for thousands of files. Local conversion tools (ImageMagick, ffmpeg) require installing and maintaining software. The right answer is parallelized API calls: let the conversion infrastructure handle the heavy lifting while you scale out the HTTP calls.

Key design decisions for this approach:

  • xargs -P for parallelism — simpler than GNU parallel, available everywhere
  • find -print0 / xargs -0 for correct handling of spaces and special characters in filenames (see the demo after this list)
  • Atomic output writes — write to a temp file and rename, so interrupted conversions don't leave corrupted outputs
  • -f flag on curl in the one-liner — exits non-zero on HTTP errors, so xargs sees failures correctly. The full script below checks %{http_code} explicitly instead, which allows per-status handling such as 429 retries
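The null-delimiter point is worth a ten-second demonstration, since it's the difference between a script that works in testing and one that survives real user data:

# A name like "summer photos/beach 1.png" survives intact with -print0/-0;
# with a plain `find | xargs` it would be split into three arguments
find /data/assets -name '*.png' -type f -print0 | xargs -0 -n1 echo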

Complete bash script

#!/usr/bin/env bash
set -euo pipefail

# ---- config ----------------------------------------------------------------
API_KEY="${CTF_API_KEY:?CTF_API_KEY not set}"
SOURCE_EXT="${CTF_SOURCE_EXT:-png}"      # Extension to find (no leading dot)
TARGET_FORMAT="${CTF_TARGET_FORMAT:-webp}"
INPUT_DIR="${CTF_INPUT_DIR:-/data/assets}"
PARALLELISM="${CTF_PARALLELISM:-8}"
API_URL="https://changethisfile.com/v1/convert"
FAILED_LOG="/tmp/ctf-failed-$$.txt"
# ---------------------------------------------------------------------------

log() { echo "[$(date -u +%H:%M:%S)] $*"; }

export API_KEY API_URL TARGET_FORMAT FAILED_LOG

convert_file() {
  local input="$1"
  local dir stem output tmp
  dir=$(dirname "$input")
  stem=$(basename "$input")
  stem="${stem%.*}"
  output="$dir/${stem}.${TARGET_FORMAT}"
  tmp="$output.tmp.$$"

  # No -f on this curl: HTTP-level errors fall through to the status check
  # below, which allows per-status handling (e.g. the 429 branch later).
  # The || branch catches transport failures: timeouts, DNS, resets.
  local http_status
  http_status=$(curl -s \
    --max-time 120 \
    -w "%{http_code}" \
    -H "Authorization: Bearer $API_KEY" \
    -F "file=@$input" \
    -F "target=$TARGET_FORMAT" \
    -o "$tmp" \
    "$API_URL" 2>/dev/null
  ) || { echo "$input" >> "$FAILED_LOG"; rm -f "$tmp"; return 1; }

  if [[ "$http_status" == "200" ]] && [[ -s "$tmp" ]]; then
    mv "$tmp" "$output"
    echo "OK $input"
  else
    echo "FAIL ($http_status) $input"
    echo "$input" >> "$FAILED_LOG"
    rm -f "$tmp"
    return 1
  fi
}

export -f convert_file

log "Scanning $INPUT_DIR for .$SOURCE_EXT files..."
total=$(find "$INPUT_DIR" -name "*.$SOURCE_EXT" -type f | wc -l)
log "Found $total files. Starting conversion at parallelism=$PARALLELISM"

# The main pipeline. The `|| true` matters: xargs exits non-zero if any
# conversion fails, and set -e would otherwise abort before the summary.
find "$INPUT_DIR" -name "*.$SOURCE_EXT" -type f -print0 | \
  xargs -0 -P "$PARALLELISM" -I{} bash -c 'convert_file "$@"' _ {} || true

failed=0
if [[ -f "$FAILED_LOG" ]]; then
  failed=$(wc -l < "$FAILED_LOG")
fi

converted=$((total - failed))
log "DONE: $converted/$total converted, $failed failed"

if [[ $failed -gt 0 ]]; then
  log "Failed files saved to $FAILED_LOG"
  exit 1
fi

Error handling and retries

xargs exits with a non-zero status (123 on GNU xargs) if any invocation of the command exits non-zero — but it doesn't stop the other children. All parallel conversions run to completion; failures are logged to a file for inspection or retry.
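If you want to branch on that status rather than just log, capture it without tripping set -e (exit-code meanings per the GNU xargs manual):

find "$INPUT_DIR" -name "*.$SOURCE_EXT" -type f -print0 | \
  xargs -0 -P "$PARALLELISM" -I{} bash -c 'convert_file "$@"' _ {} \
  && status=0 || status=$?
# status 0   — every conversion succeeded
# status 123 — one or more invocations exited 1-125 (our conversion failures)
# status 124-127 — a command exited 255, was killed, or couldn't be run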

To retry only the failed files:

# After the main run, retry failures once
if [[ -f "$FAILED_LOG" ]]; then
  log "Retrying ${failed} failed files..."
  cat "$FAILED_LOG" | tr '\n' '\0' | \
    xargs -0 -P 4 -I{} bash -c 'convert_file "$@"' _ {}
  rm "$FAILED_LOG"
fi

For rate-limit errors (HTTP 429), add a branch to the status check in convert_file:

  if [[ "$http_status" == "429" ]]; then
    local retry_after=60
    log "Rate limited. Sleeping ${retry_after}s before retry of $input"
    sleep "$retry_after"
    # Recursive retry
    convert_file "$1"
    return
  fi
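That recursion is unbounded; for unattended runs a capped wrapper is safer. A minimal sketch — convert_with_retry is a name introduced here, not part of the script above:

# Bounded retry with linear backoff; gives up after three attempts
convert_with_retry() {
  local input="$1" attempt
  for attempt in 1 2 3; do
    convert_file "$input" && return 0
    sleep $(( 30 * attempt ))   # 30s, then 60s, then 90s
  done
  return 1
}
export -f convert_with_retry   # call this instead of convert_file in the xargs line

Note that convert_file appends to $FAILED_LOG on every failed attempt, so a file can appear in the log more than once; dedupe with sort -u before acting on it.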

Rate limits by plan: free (1K/mo), $29 (10K/mo), $99 (50K/mo), $499 (250K/mo), $1999 (1M/mo). For large batch jobs, match -P to your plan's sustained throughput.
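A back-of-envelope calculation helps pick -P against a monthly quota (the per-conversion timing is an assumption, not a measured figure):

# At -P 8 with ~2 s per conversion:
#   8 in flight / 2 s each ≈ 4 conversions/s ≈ 14,400/hour
# A 10K/mo quota can be exhausted in ~42 minutes of sustained running,
# so size the plan to the batch, not the calendar month.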

Running as a one-shot or from cron

For a one-shot migration run:

CTF_API_KEY=ctf_sk_your_key \
CTF_SOURCE_EXT=png \
CTF_TARGET_FORMAT=webp \
CTF_INPUT_DIR=/var/www/assets \
CTF_PARALLELISM=8 \
  bash /opt/scripts/convert-recursive.sh | tee /var/log/ctf-migration.log
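Large migrations can outlive an SSH session; running the same invocation under nohup (or tmux/screen) keeps it alive after a disconnect:

nohup env CTF_API_KEY=ctf_sk_your_key CTF_INPUT_DIR=/var/www/assets \
  bash /opt/scripts/convert-recursive.sh \
  > /var/log/ctf-migration.log 2>&1 &
tail -f /var/log/ctf-migration.log   # follow progress from any session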

For a recurring cron job that processes only new files, combine with find's -newer test:

# Convert PNGs modified since the last run. Capture the new stamp *before*
# scanning, so files that land mid-run are picked up next time rather than
# falling into the gap between find and touch.
touch /var/lib/ctf/last-run.stamp.new

find "$INPUT_DIR" \
  -name "*.$SOURCE_EXT" \
  -type f \
  -newer /var/lib/ctf/last-run.stamp \
  -print0 | \
  xargs -0 -P "$PARALLELISM" -I{} bash -c 'convert_file "$@"' _ {} || true

# Promote the stamp only after a successful run
mv /var/lib/ctf/last-run.stamp.new /var/lib/ctf/last-run.stamp

This is simpler than maintaining a done log and works well for timestamp-based pipelines where upstream processes are the authority on what's new.
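Wired into cron it looks like this (path, user, and schedule are illustrative):

# /etc/cron.d/ctf-convert — pick up new files nightly at 02:15
15 2 * * * www-data CTF_API_KEY=ctf_sk_your_key /opt/scripts/convert-new.sh >> /var/log/ctf-cron.log 2>&1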

Production tips

  • Tune -P to your plan. Free tier has no explicit per-second rate limit, but 8 parallel requests is a reasonable default. On paid plans with higher throughput, -P 16 or -P 32 is safe.
  • Skip already-converted files. Add -not -name "*.${TARGET_FORMAT}" to the find command, or check for output file existence in convert_file before making the API call (sketch after this list).
  • Use -print0 / xargs -0 always. Files with spaces in their names break without null-delimiter mode. This is the most common reason recursive scripts work in testing but fail on real user data.
  • Watch temp file accumulation. If the script is killed mid-run, *.tmp.* files litter the output directory. Add a cleanup trap: trap 'find "$INPUT_DIR" -name "*.tmp.*" -delete' EXIT.
  • Progress monitoring. Pipe the script's output through grep --line-buffered '^OK' | pv -l -s "$total" > /dev/null to get a real-time progress bar against the total file count.
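The existence check from the second tip is a three-line guard inside convert_file, placed right after output is computed:

  # Skip files whose output already exists — makes re-runs idempotent
  if [[ -e "$output" ]]; then
    echo "SKIP (exists) $input"
    return 0
  fi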

find + xargs + curl is the most portable parallel file conversion pipeline in bash — no dependencies beyond findutils, coreutils, and curl. For a directory of 10,000 images at -P 8, you have eight conversions in flight at all times; at 1-3 seconds per conversion, the full tree is done in about an hour or less on a standard connection. Get a free API key — 1,000 conversions/month, no card.