S3 event notifications are one of the cleanest serverless triggers available: drop a file, something runs. The standard pattern for format conversion uses S3 → SNS/SQS → Lambda with an FFmpeg Lambda layer or a container image. That's 2-4GB of Lambda deployment package, cold start times measured in seconds, and a Lambda layer that needs updating whenever FFmpeg releases a CVE fix.
An alternative: a 20KB Lambda function that calls the ChangeThisFile API. Same event-driven trigger, same S3-to-S3 result flow, but the Lambda deployment is tiny and there are no native media tools to maintain. The tradeoff is an outbound HTTPS call per conversion — Lambda pays egress, you pay for API conversions.
TL;DR
S3 PUT → Lambda (Python) → ChangeThisFile API → S3 PUT (converted result). The Lambda function is ~60 lines of Python. Deployment package is ~20KB. No Lambda layers needed.
# Deploy with AWS CLI
zip function.zip lambda_function.py
aws lambda update-function-code \
--function-name ctf-converter \
--zip-file fileb://function.zip
The use case
S3-based conversion pipelines appear in:
- Document ingest pipelines. Users upload DOCX/ODS/RTF to
s3://company-uploads/raw/. Lambda converts to PDF, writes tos3://company-uploads/processed/. Downstream indexers only consume PDFs. - Image optimization for CDN delivery. Build pipeline pushes PNG assets to
s3://assets-input/. Lambda converts to WebP, writes tos3://assets-cdn/webp/. CloudFront serves from the output bucket. - Media format normalization. Audio recordings land as WAV or AIFF in an S3 bucket. Lambda normalizes to MP3 for podcast RSS feeds and streaming.
- Automated ebook format conversion. Authors upload EPUB manuscripts to S3. Lambda converts to MOBI, AZW3, and PDF for multi-format distribution.
The ChangeThisFile API covers all 690 routes, so a single Lambda function handles the entire range of formats your pipeline might encounter. You just change the target parameter.
Lambda function, IAM policy, and S3 event configuration
1. Lambda function — save as lambda_function.py:
"""lambda_function.py
S3 PUT → ChangeThisFile API → S3 PUT (converted result)
Environment variables:
CTF_API_KEY - ChangeThisFile API key (required)
TARGET_FORMAT - Output format, e.g. pdf, webp, mp3 (default: pdf)
OUTPUT_BUCKET - Destination S3 bucket (default: same as input bucket)
OUTPUT_PREFIX - Key prefix for converted files (default: converted/)
SOURCE_EXT_FILTER - Comma-separated list of extensions to process (default: all)
e.g. "docx,odt,rtf" to only convert document files
"""
import json
import logging
import os
import urllib.request
import urllib.error
from io import BytesIO
import boto3
logger = logging.getLogger()
logger.setLevel(logging.INFO)
APIURl = "https://changethisfile.com/v1/convert"
def get_env(key: str, required: bool = True, default: str = "") -> str:
val = os.environ.get(key, default)
if required and not val:
raise ValueError(f"Environment variable {key} is not set")
return val
def build_multipart(file_bytes: bytes, filename: str, target: str):
"""Build a multipart/form-data body without external libraries."""
boundary = b"----CTFBoundary7MA4YWxkTrZu0gW"
def part(name: str, value: bytes, fname: str = "") -> bytes:
header = (
f'--{boundary.decode()}\r\n'
f'Content-Disposition: form-data; name="{name}"'
)
if fname:
header += f'; filename="{fname}"'
header += "\r\nContent-Type: application/octet-stream\r\n\r\n"
return header.encode() + value + b"\r\n"
body = (
part("file", file_bytes, fname=filename) +
part("target", target.encode()) +
f"--{boundary.decode()}--\r\n".encode()
)
content_type = f"multipart/form-data; boundary={boundary.decode()}"
return body, content_type
def convert_file(file_bytes: bytes, filename: str, target: str, api_key: str) -> bytes:
"""POST file to ChangeThisFile API, return converted bytes."""
body, content_type = build_multipart(file_bytes, filename, target)
req = urllib.request.Request(
APIURl,
data=body,
headers={
"Authorization": f"Bearer {api_key}",
"Content-Type": content_type,
},
method="POST",
)
with urllib.request.urlopen(req, timeout=180) as response:
if response.status != 200:
raise RuntimeError(f"API returned HTTP {response.status}")
return response.read()
def lambda_handler(event, context):
api_key = get_env("CTF_API_KEY")
target_format = get_env("TARGET_FORMAT", required=False, default="pdf")
output_prefix = get_env("OUTPUT_PREFIX", required=False, default="converted/")
source_ext_filter_raw = get_env("SOURCE_EXT_FILTER", required=False)
source_ext_filter = (
{e.strip().lower() for e in source_ext_filter_raw.split(",")}
if source_ext_filter_raw else set()
)
s3 = boto3.client("s3")
results = []
for record in event.get("Records", []):
bucket = record["s3"]["bucket"]["name"]
key = record["s3"]["object"]["key"]
filename = key.split("/")[-1]
stem, ext = (filename.rsplit(".", 1) + [""])[:2]
ext = ext.lower()
# Apply extension filter
if source_ext_filter and ext not in source_ext_filter:
logger.info("Skipping %s (extension %s not in filter)", key, ext)
continue
output_bucket = get_env("OUTPUT_BUCKET", required=False, default=bucket)
output_key = f"{output_prefix.rstrip('/')}/{stem}.{target_format}"
logger.info("Processing s3://%s/%s -> s3://%s/%s",
bucket, key, output_bucket, output_key)
try:
# Download source file from S3
response = s3.get_object(Bucket=bucket, Key=key)
file_bytes = response["Body"].read()
logger.info("Downloaded %d bytes from s3://%s/%s", len(file_bytes), bucket, key)
# Convert via ChangeThisFile API
converted_bytes = convert_file(file_bytes, filename, target_format, api_key)
logger.info("Converted %s: %d bytes -> %d bytes",
filename, len(file_bytes), len(converted_bytes))
# Upload converted file to S3
s3.put_object(
Bucket=output_bucket,
Key=output_key,
Body=converted_bytes,
# Optional: tag for traceability
Tagging=f"source-bucket={bucket}&source-key={key}&converted-from={ext}&converted-to={target_format}",
)
logger.info("Uploaded converted file to s3://%s/%s", output_bucket, output_key)
results.append({"status": "ok", "source": key, "output": output_key})
except urllib.error.HTTPError as e:
logger.error("API HTTP error %d for %s: %s", e.code, key, e.read())
results.append({"status": "error", "source": key, "error": f"HTTP {e.code}"})
raise # Re-raise to trigger Lambda retry from SQS/SNS
except Exception as e:
logger.error("Error processing %s: %s", key, str(e))
results.append({"status": "error", "source": key, "error": str(e)})
raise
return {"statusCode": 200, "body": json.dumps(results)}
2. IAM policy for the Lambda execution role
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:GetObject"],
"Resource": "arn:aws:s3:::your-input-bucket/*"
},
{
"Effect": "Allow",
"Action": ["s3:PutObject"],
"Resource": "arn:aws:s3:::your-output-bucket/converted/*"
},
{
"Effect": "Allow",
"Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
"Resource": "arn:aws:logs:*:*:*"
}
]
}
3. Deploy and configure S3 event notification via AWS CLI
# Create the Lambda function
zip function.zip lambda_function.py
aws lambda create-function \
--function-name ctf-converter \
--runtime python3.12 \
--handler lambda_function.lambda_handler \
--role arn:aws:iam::YOUR_ACCOUNT_ID:role/ctf-converter-role \
--zip-file fileb://function.zip \
--timeout 300 \
--memory-size 512 \
--environment Variables="{CTF_API_KEY=ctf_sk_your_key_here,TARGET_FORMAT=pdf,OUTPUT_BUCKET=your-output-bucket,OUTPUT_PREFIX=converted}"
# Grant S3 permission to invoke Lambda
aws lambda add-permission \
--function-name ctf-converter \
--statement-id s3-trigger \
--action lambda:InvokeFunction \
--principal s3.amazonaws.com \
--source-arn arn:aws:s3:::your-input-bucket \
--source-account YOUR_ACCOUNT_ID
# Configure S3 event notification
aws s3api put-bucket-notification-configuration \
--bucket your-input-bucket \
--notification-configuration '{
"LambdaFunctionConfigurations": [{
"LambdaFunctionArn": "arn:aws:lambda:us-east-1:YOUR_ACCOUNT_ID:function:ctf-converter",
"Events": ["s3:ObjectCreated:Put"],
"Filter": {
"Key": {
"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]
}
}
}]
}'
Optional: Store API key in Parameter Store (more secure than Lambda env vars):
# Store the key
aws ssm put-parameter \
--name /ctf/api-key \
--value ctf_sk_your_key_here \
--type SecureString
# In lambda_function.py, replace the get_env call with:
import boto3
ssm = boto3.client('ssm')
def get_api_key() -> str:
response = ssm.get_parameter(Name='/ctf/api-key', WithDecryption=True)
return response['Parameter']['Value']
Error handling and Lambda retry behavior
Lambda retries matter here. If the function raises an exception:
- Direct S3 trigger: Lambda retries the event 2 more times automatically. The third failure drops the event (configure a Dead Letter Queue to capture it).
- Via SQS: Lambda returns the message to the queue. The queue's
VisibilityTimeoutandMaxReceiveCountcontrol retry behavior before the message goes to the DLQ. - Via SNS: SNS retries delivery to Lambda 3 times. After that, SNS moves it to the subscription DLQ if configured.
The function re-raises exceptions after logging them, which is intentional — it lets Lambda's native retry machinery handle the retry schedule rather than sleeping inside the function.
Idempotency. S3 PUT notifications can fire more than once for the same object (S3 event delivery is at-least-once). The Lambda will convert and re-upload the file each time it fires for the same key. This is safe for most pipelines — overwriting the converted output with an identical file is harmless. If you need exactly-once semantics, check if the output key already exists before converting:
try:
s3.head_object(Bucket=output_bucket, Key=output_key)
logger.info("Output already exists, skipping: %s", output_key)
return {"statusCode": 200, "body": "already converted"}
except s3.exceptions.ClientError as e:
if e.response['Error']['Code'] != '404':
raise # Real error, not "not found"
Large file handling. Lambda has 512MB of ephemeral /tmp storage by default (up to 10GB configurable). The function streams the file through memory rather than writing to /tmp. For very large files, increase Lambda's memory allocation — memory and CPU scale together in Lambda, so a 1GB memory allocation also gives more CPU for the download/upload.
Timeouts. Lambda timeout is set to 300s in the deployment command above. The API call has a 180s timeout. For large video files or dense documents, you may need to increase both. Lambda's maximum timeout is 15 minutes.
S3 event routing patterns
S3 event notifications support prefix and suffix filters, which lets a single bucket route different file types to different Lambda functions:
{
"LambdaFunctionConfigurations": [
{
"LambdaFunctionArn": "arn:aws:lambda:...:function:ctf-to-pdf",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": {
"FilterRules": [
{"Name": "prefix", "Value": "docs/"},
{"Name": "suffix", "Value": ".docx"}
]
}
}
},
{
"LambdaFunctionArn": "arn:aws:lambda:...:function:ctf-to-webp",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": {
"FilterRules": [
{"Name": "prefix", "Value": "images/"},
{"Name": "suffix", "Value": ".png"}
]
}
}
}
]
}
Avoid trigger loops. If your input and output bucket are the same, the converted file PUT will trigger another Lambda invocation. Prevent this by using an OUTPUT_PREFIX that doesn't match the S3 event filter prefix, or use separate buckets (cleaner).
SNS fanout for multiple consumers. If you need the same S3 event to trigger multiple consumers (conversion + indexing + notification), route through SNS:
# S3 -> SNS -> Lambda (conversion) + Lambda (indexer) + SQS (notification queue)
aws s3api put-bucket-notification-configuration \
--bucket your-input-bucket \
--notification-configuration '{
"TopicConfigurations": [{
"TopicArn": "arn:aws:sns:us-east-1:ACCOUNT:file-uploaded",
"Events": ["s3:ObjectCreated:Put"]
}]
}'
Production tips
- Use Parameter Store or Secrets Manager for the API key, not Lambda env vars. Lambda environment variables are visible to anyone with
lambda:GetFunctionConfigurationIAM permission. Parameter Store SecureString (KMS-encrypted) is the right place for secrets in Lambda. - Set concurrency limits on the Lambda function. S3 can fire many simultaneous PUTs (especially during bulk uploads). Without a concurrency limit, Lambda scales out and you hit the ChangeThisFile API with many parallel requests. Use
aws lambda put-function-concurrency --function-name ctf-converter --reserved-concurrent-executions 10to cap parallelism. - Configure a Dead Letter Queue. S3 events that fail all Lambda retries are silently dropped without a DLQ. Wire an SQS DLQ:
aws lambda update-function-configuration --function-name ctf-converter --dead-letter-config TargetArn=arn:aws:sqs:...:ctf-dlq. Monitor the DLQ for unprocessed files. - Estimate cost before connecting a high-volume bucket. Lambda charges ~$0.20 per million invocations + $0.0000166667/GB-second. At 512MB and 10s average duration: ~$0.085 per 1,000 invocations. Add ChangeThisFile API costs: free tier covers 1,000/month, $29/mo for 10K, $99/mo for 100K. A pipeline processing 5,000 files/month costs roughly $0.50 Lambda + $29 API = ~$30/month total.
- Tag converted objects with source metadata. The function already tags output objects with source-bucket, source-key, and format fields. This makes it easy to trace any converted file back to its source and audit the pipeline in S3 Inventory reports.
The S3 → Lambda → ChangeThisFile → S3 pipeline is roughly 60 lines of Python with zero Lambda layers, a ~20KB deployment package, and cold starts under 500ms. It handles 690 conversion routes with a single function controlled by an environment variable. Get a free API key — 1,000 conversions/month at no cost, enough to run this pipeline in low-volume environments without any spend.