Convert Files After rsync: Post-Transfer Hook to ChangeThisFile API
Published Apr 25, 2026 · 7 min read
By ChangeThisFile Team
Quick Answer
rsync has no native post-transfer hook, but a wrapper script approach works cleanly: run rsync with --itemize-changes, parse the new and updated files from the output, POST each to the ChangeThisFile API via curl, and write each converted result into a converted/ subdirectory of the destination. Wrap the whole chain in a bash script and call it instead of rsync directly.
rsync is the backbone of a lot of file distribution pipelines — backup jobs, asset sync, deploy scripts, media distribution networks. What rsync doesn't have is a post-transfer hook that fires per file. There's no --on-transfer flag. The standard workaround is a wrapper script: run rsync, capture its output, parse which files were transferred, and then act on them.
Combined with the ChangeThisFile API, this pattern gives you an rsync pipeline where files are automatically converted on arrival. Useful for: converting uploaded DOCX files to PDF after sync, WebP-ifying PNG assets after a deploy sync, or normalizing audio formats after a media ingest transfer.
TL;DR
Wrap your rsync call to capture --itemize-changes output. Parse the transferred files (lines starting with >f). POST each to the API. Write converted output to a converted/ subdirectory of the destination. The wrapper script is a drop-in replacement for your existing rsync command.
# Replace your existing rsync call with:
bash /opt/scripts/rsync-convert.sh \
user@source:/data/uploads/ \
/data/received/ \
pdf
The use case
Three common pipelines that benefit from this pattern:
Document normalization on ingest. A file upload portal lets users submit DOCX, ODT, RTF, and plain-text files. A nightly rsync pulls them from the upload server to the processing server. After transfer, every document is converted to PDF so downstream tooling (search indexers, archival systems) only has to handle one format.
Web asset optimization on deploy. A static site deploy pipeline rsyncs PNG assets from a build server to a CDN origin. Post-transfer, PNGs are converted to WebP and rsynced to a parallel /webp/ path. The site serves WebP to modern browsers, PNG as fallback.
Media format normalization for distribution. A podcast production workflow: raw WAV recordings land on an ingest server. rsync transfers them to the distribution server. Post-transfer hook converts WAV to MP3 and M4A, then rsync pushes converted files to the CDN.
In all three cases, the conversion step is a natural extension of the transfer — not a separate cron job that runs later and has to figure out what's new.
Working rsync wrapper script
Save as /opt/scripts/rsync-convert.sh and chmod +x:
#!/usr/bin/env bash
# rsync-convert.sh
# Usage: rsync-convert.sh SOURCE DESTINATION [TARGET_FORMAT]
#
# 1. rsyncs source → destination
# 2. Parses itemize-changes output for new/updated files
# 3. Converts each file via ChangeThisFile API
# 4. Writes converted output to a /converted/ subdir of the destination
set -euo pipefail
# ---- arguments & config -----------------------------------------------------
SOURCE="${1:?Usage: rsync-convert.sh SOURCE DESTINATION [TARGET_FORMAT]}"
DESTINATION="${2:?Usage: rsync-convert.sh SOURCE DESTINATION [TARGET_FORMAT]}"
TARGET_FORMAT="${3:-pdf}"
API_KEY="${CTF_API_KEY:?CTF_API_KEY not set}"
API_URL="https://changethisfile.com/v1/convert"
LOG_FILE="${CTF_LOG:-/var/log/rsync-convert.log}"
CONVERTED_SUBDIR="${CONVERTED_SUBDIR:-converted}"
# Optional: only convert files matching this extension (leave empty for all)
SOURCE_EXT_FILTER="${SOURCE_EXT_FILTER:-}"
# ------------------------------------------------------------------------------
log() { echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] $*" | tee -a "$LOG_FILE"; }
# --- Step 1: rsync and capture itemize-changes output -----------------------
log "rsync: $SOURCE -> $DESTINATION"
rsync_exit=0
rsync_output=$(rsync \
--archive \
--itemize-changes \
--no-progress \
"$SOURCE" \
"$DESTINATION" 2>&1) || rsync_exit=$?  # capture failure; plain $? never runs under set -e
if [[ $rsync_exit -ne 0 ]]; then
log "ERROR: rsync failed (exit $rsync_exit)"
log "rsync output: $rsync_output"
exit "$rsync_exit"
fi
# Count transferred files for logging
transferred_count=$(echo "$rsync_output" | grep -c '^>f' || true)
log "rsync complete: $transferred_count files transferred"
if [[ $transferred_count -eq 0 ]]; then
log "No new files to convert"
exit 0
fi
# --- Step 2: Parse transferred file paths ------------------------------------
# itemize-changes format: >f+++++++++ path/to/file.ext (new file)
# >f.st...... path/to/file.ext (updated file)
# We want both. Paths in the output are already relative to the destination root.
dest_dir="${DESTINATION%/}" # Ensure no trailing slash
converted_dir="$dest_dir/$CONVERTED_SUBDIR"
mkdir -p "$converted_dir"
converted=0
failed=0
while IFS= read -r line; do
# Match transferred regular files
[[ "$line" =~ ^\>f ]] || continue
# Strip the itemize flag field (everything up to the first space) so
# filenames containing spaces survive intact (awk would truncate them)
rel_path="${line#* }"
src_file="$dest_dir/$rel_path"
filename=$(basename "$rel_path")
stem="${filename%.*}"
ext="${filename##*.}"
# Skip if extension filter is set and doesn't match
if [[ -n "$SOURCE_EXT_FILTER" ]] && [[ "$ext" != "$SOURCE_EXT_FILTER" ]]; then
continue
fi
[[ -f "$src_file" ]] || { log "WARN: $src_file not found; skipping"; continue; }
out_file="$converted_dir/${stem}.${TARGET_FORMAT}"
log "CONVERT $rel_path -> $CONVERTED_SUBDIR/${stem}.${TARGET_FORMAT}"
# --- Step 3: Convert via API -----------------------------------------------
http_status=$(curl -sf \
--max-time 180 \
--retry 2 \
--retry-delay 10 \
-w "%{http_code}" \
-o "$out_file" \
-H "Authorization: Bearer $API_KEY" \
-F "file=@$src_file" \
-F "target=$TARGET_FORMAT" \
"$API_URL") || {
log "ERROR: curl failed for $rel_path"
rm -f "$out_file"  # drop any partial output curl may have written
((failed++)) || true
continue
}
if [[ "$http_status" == "200" ]]; then
out_size=$(stat -c%s "$out_file" 2>/dev/null || echo "?")
log "OK $rel_path (${out_size} bytes out)"
((converted++)) || true
else
rm -f "$out_file"
log "ERROR HTTP $http_status for $rel_path"
((failed++)) || true
fi
done <<< "$rsync_output"
log "Conversion pass: converted=$converted failed=$failed"
# --- Step 4: wrap up ---------------------------------------------------------
# Converted output was written directly into $converted_dir, which already
# lives under the destination, so a second rsync pass is only needed if the
# final destination is a different (e.g. remote) host.
if [[ $converted -gt 0 ]]; then
log "$converted converted files written to $converted_dir/"
fi
# Exit non-zero if any conversions failed
[[ $failed -eq 0 ]] || exit 1
Common invocation patterns:
# Convert all transferred files to PDF
CTF_API_KEY=ctf_sk_your_key_here \
bash /opt/scripts/rsync-convert.sh \
user@upload-server:/var/uploads/ \
/var/received/ \
pdf
# Convert only .png files to webp
CTF_API_KEY=ctf_sk_your_key_here \
SOURCE_EXT_FILTER=png \
bash /opt/scripts/rsync-convert.sh \
build-server:/var/assets/ \
/var/cdn-origin/ \
webp
# Override converted subdir name
CTF_API_KEY=ctf_sk_your_key_here \
CONVERTED_SUBDIR=processed \
bash /opt/scripts/rsync-convert.sh \
/mnt/nas/uploads/ \
/var/archive/ \
mp3
Error handling and idempotency
rsync itself is resilient at the transfer layer — --partial keeps partially transferred files so an interrupted run can resume, and a rerun picks up anything that didn't make it across. The conversion step needs its own retry logic because the API call can fail independently of the transfer.
The script as written uses curl's built-in --retry 2. For production pipelines where you want explicit retry control:
convert_with_retry() {
local file="$1" out="$2" target="$3"
local attempt max=3 delay=15
for attempt in $(seq 1 $max); do
local status
status=$(curl -sf \
--max-time 180 \
-w "%{http_code}" \
-o "$out" \
-H "Authorization: Bearer $API_KEY" \
-F "file=@$file" \
-F "target=$target" \
"$API_URL") && [[ "$status" == "200" ]] && return 0
log "Retry $attempt/$max for $(basename "$file") (status: ${status:-curl-err})"
rm -f "$out"
sleep $((delay * attempt))
done
return 1
}
Idempotency. The script does not re-convert the same files on every run, because itemize-changes only reports files rsync actually transferred this run. Files already in the destination that rsync skips (because mtime and size match) won't appear in the output and won't be re-converted. This is the right behavior — the done log is built into rsync itself.
If you want to re-convert files rsync would otherwise skip (e.g., after changing the target format), pass --ignore-times to rsync in your own invocation to force a full re-transfer, or run the conversion step independently over the already-received files.
Partial output cleanup. If curl writes a partial file before failing, rm -f "$out_file" in the error path ensures no corrupt output persists. This is critical — a zero-byte PDF or truncated MP3 is worse than no file.
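That cleanup can be extended with a size floor. A minimal sketch — check_output and the 128-byte minimum are assumptions of this example, not part of the wrapper script or anything the API guarantees:

```shell
#!/usr/bin/env bash
# Sketch: treat suspiciously small conversion output as a failure.
# The 128-byte floor is an arbitrary assumption, not an API guarantee.
check_output() {
    local out_file="$1" min_bytes="${2:-128}"
    [[ -f "$out_file" ]] || return 1
    local size
    size=$(wc -c < "$out_file")
    if (( size < min_bytes )); then
        rm -f "$out_file"   # remove truncated or empty output
        return 1
    fi
}
```

Called right after the curl block, a failing check would be logged and counted alongside the other conversion failures.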
Scheduling and integration patterns
The wrapper script is a drop-in replacement for any existing rsync call. Add it to cron, a systemd timer, or a CI step:
cron integration
# /etc/cron.d/rsync-convert
SHELL=/bin/bash
CTF_API_KEY=ctf_sk_your_key_here
CONVERTED_SUBDIR=pdf
SOURCE_EXT_FILTER=docx
# Every 30 minutes, sync and convert
# Every 30 minutes, sync and convert. Note: a cron entry must be a single
# line — crontab does not support backslash continuation.
*/30 * * * * root /opt/scripts/rsync-convert.sh uploads@ingest:/var/uploads/ /var/processed/ pdf >> /var/log/rsync-convert.log 2>&1
Post-receive git hook (asset pipeline variant). If your source is a git server and rsync is part of a deploy, call the wrapper script from hooks/post-receive after the checkout step — the invocation is the same as in the cron examples above.
Best practices
Parse itemize-changes carefully. The format is YXcstpoguax path, where Y is the update type: > means the file was received by this (local) side, < means it was sent to the remote side, c means a local change or creation, and . means no update. Filter on ^>f when you pull files into a local destination. If your rsync pushes to a remote destination instead, the local output shows ^<f lines and the files land on the remote host — run the conversion step there.
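A self-contained sketch of that filter against sample itemize lines (the filenames are made up; stripping the flag field instead of using awk also keeps spaces in names intact):

```shell
#!/usr/bin/env bash
# Sample itemize-changes output: a new file, a directory mtime update,
# an updated file, and a created directory. Only >f lines are kept.
sample='>f+++++++++ docs/report.docx
.d..t...... docs/
>f.st...... img/logo 2.png
cd+++++++++ newdir/'

while IFS= read -r line; do
    [[ "$line" =~ ^\>f ]] || continue
    rel_path="${line#* }"   # drop the flag field up to the first space
    echo "$rel_path"
done <<< "$sample"
```

This prints docs/report.docx and img/logo 2.png, skipping the directory entries.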
Use --checksum on the ingest rsync for high-integrity pipelines. By default rsync uses mtime+size to detect changes. In a pipeline where files might be replaced with different content but the same mtime (common with generated files), --checksum forces a full content comparison. Slower, but safer.
rsync the converted files to a separate subdirectory, not back to the source dir. The script sends converted files to $dest/$CONVERTED_SUBDIR/. This keeps originals and conversions cleanly separated and prevents rsync from picking up converted files as new input on the next run (which would create infinite loops if source and dest are the same host).
Free tier covers 1,000 conversions/month. A pipeline rsyncing 30-40 files/day (document ingest, asset deploy) stays within the free tier. High-volume ingest pipelines should estimate monthly volume before selecting a plan — $29/mo for 10K, $99/mo for 100K.
Log the itemize-changes output, not just your conversion log. rsync itemize-changes output is the audit trail for what was transferred. Append it to your log file for full traceability: what transferred, what was converted, what failed.
rsync doesn't need a native post-transfer hook when a wrapper script can parse its own output. The three-step pattern — rsync, parse itemize-changes, convert via API — is clean, auditable, and slots into any existing rsync-based pipeline as a drop-in replacement. Get a free API key to wire your next rsync pipeline to automatic conversion.
Key Takeaways
rsync --itemize-changes output is the hook — parse lines starting with >f for transferred files
The wrapper script is a drop-in for any existing rsync call with no changes to surrounding infrastructure
Converted files go to a separate subdirectory to avoid infinite-loop re-ingestion
rm -f on error path prevents corrupt partial output from persisting
Use --checksum on ingest rsync when files may have identical mtime+size but different content
Frequently Asked Questions
Does rsync --itemize-changes always output transferred files, even when nothing changed?
No. itemize-changes only outputs entries for files that are actually transferred or changed. Unchanged files are silent. This means the conversion step only runs for genuinely new or updated files — it's inherently efficient.
What if rsync transfers a file that fails conversion — will rsync re-transfer it next time?
No. rsync considers its job done once the transfer succeeds; a failed conversion doesn't tell rsync to retransfer. On the next run, if the source file hasn't changed, rsync won't retransfer it, so the wrapper won't re-convert it automatically. To force re-conversion, either run the conversion step independently over the already-received files, or pass --ignore-times to rsync to force a full re-transfer.
Can I use this with rsync over SSH?
Yes. The script passes SOURCE and DESTINATION directly to rsync, so SSH sources like user@host:/path/ work exactly as they do with rsync directly. The conversion step runs locally on the receiving machine.
How do I handle directories with subdirectories?
The itemize-changes output lists paths relative to the destination root, including any subdirectories, and the script builds the full path as $dest_dir/$rel_path, so nested files are found correctly. The converted output, however, is placed flat in $converted_dir/ under the file's basename. To preserve subdirectory structure in the output, recreate it with dirname: mkdir -p "$converted_dir/$(dirname "$rel_path")".
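A sketch of that structure-preserving variant; rel_path, converted_dir, and the file names here are stand-ins for the wrapper's variables:

```shell
#!/usr/bin/env bash
# Sketch: keep the source subdirectory layout under the converted dir.
converted_dir=$(mktemp -d)
rel_path="reports/2026/q1.docx"
target_format="pdf"

mkdir -p "$converted_dir/$(dirname "$rel_path")"   # recreate subdirs
out_file="$converted_dir/${rel_path%.*}.${target_format}"
echo "$out_file"
```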
What's the performance impact of the API calls?
Each conversion is a sequential HTTP POST. For a 50-file batch, expect 30-90 seconds of conversion time depending on file types and sizes. For large batches, add parallelism with xargs -P, feeding it one path per line so spaces in filenames survive: printf '%s\n' "$transferred_files" | xargs -P 4 -I{} bash convert_single_file.sh {}.
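A self-contained sketch of that fan-out, with echo standing in for the hypothetical convert_single_file.sh:

```shell
#!/usr/bin/env bash
# Sketch: run up to 4 conversions in parallel. `echo` stands in for the
# real per-file conversion command; output order may vary across runs.
files=$'docs/report.docx\nimg/logo 2.png\naudio/ep01.wav'

printf '%s\n' "$files" | xargs -P 4 -I{} echo "convert {}"
```

With -I{}, xargs consumes one line per invocation, so paths containing spaces are passed through whole.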