PNG is the right choice over JPG when you need transparency, lossless quality (no JPEG artifacts), or pixel-perfect output for OCR. The libraries are the same as for PDF-to-JPG; you just change the format flag. Three viable paths: pdf2image (Poppler), PyMuPDF (no system dependencies), or the API.
Method 1: pdf2image (Poppler-backed)
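pdf2image wraps Poppler's command-line renderers (pdftoppm and pdftocairo). Poppler is the same rendering library behind many Linux PDF viewers, such as Evince and Okular.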
pip install pdf2image
apt install poppler-utils
# macOS: brew install poppler
from pathlib import Path

from pdf2image import convert_from_path


def pdf_to_png(in_path: str, out_dir: str, dpi: int = 200) -> list[str]:
    """Render every page of a PDF to out_dir as page-001.png, page-002.png, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # transparent=True keeps the page background transparent instead of white.
    images = convert_from_path(in_path, dpi=dpi, fmt="png", thread_count=4, transparent=True)
    paths = []
    for i, img in enumerate(images):
        out_path = out / f"page-{i + 1:03d}.png"
        img.save(out_path, "PNG", optimize=True)  # optimize: smaller file, slower save
        paths.append(str(out_path))
    return paths


pages = pdf_to_png("document.pdf", "./pages", dpi=200)
print(f"wrote {len(pages)} pages")
Three knobs that matter:
- dpi — 150 for web, 200-300 for OCR/print. PNG handles high DPI gracefully (no compression artifacts).
- transparent=True — preserves transparency from PDFs that have cutout backgrounds. Without it, transparent areas render white.
- optimize=True — smaller files, slower save. Worth it for served images.
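To make the dpi knob concrete, here is a minimal sketch of the arithmetic behind it. PDF page sizes are measured in points (72 per inch), so pixel dimensions scale linearly with DPI and the pixel count scales with its square. The helper names `render_size` and `raw_rgba_bytes` are made up for this illustration, and a real renderer may round a dimension up or down by a pixel.

```python
def render_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Approximate pixel dimensions for a page measured in PDF points (72 per inch)."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


def raw_rgba_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of an 8-bit RGBA bitmap: 4 bytes per pixel."""
    return width_px * height_px * 4


# US Letter is 612 x 792 points (8.5 x 11 inches).
for dpi in (150, 200, 300):
    w, h = render_size(612, 792, dpi)
    mib = raw_rgba_bytes(w, h) / 2**20
    print(f"{dpi} DPI: {w} x {h} px, about {mib:.1f} MiB uncompressed RGBA")
```

Doubling the DPI quadruples the in-memory bitmap, which matters more than the on-disk PNG size when you render long documents in one call.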
Method 2: PyMuPDF (fitz, no Poppler)
PyMuPDF wraps MuPDF — a fast, lightweight PDF engine. Single pip install, no system deps.
pip install PyMuPDF
from pathlib import Path

import fitz  # PyMuPDF


def pdf_to_png(in_path: str, out_dir: str, dpi: int = 200, alpha: bool = True) -> list[str]:
    """Render every page of a PDF to PNG; alpha=True keeps a transparent background."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # PDF user space is 72 units per inch, so dpi / 72 is the scale factor.
    zoom = dpi / 72
    matrix = fitz.Matrix(zoom, zoom)
    paths = []
    # The context manager closes the document even if a page fails to render.
    with fitz.open(in_path) as doc:
        for i, page in enumerate(doc):
            pix = page.get_pixmap(matrix=matrix, alpha=alpha)
            out_path = out / f"page-{i + 1:03d}.png"
            pix.save(str(out_path))
            paths.append(str(out_path))
    return paths


pages = pdf_to_png("document.pdf", "./pages", dpi=200, alpha=True)
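PyMuPDF is typically faster than pdf2image (roughly 2x, though the gap depends on the document and DPI), in part because it renders in-process instead of launching a Poppler subprocess. It ships as a single wheel. The license is AGPL; if you distribute it in a closed-source product or run it behind a commercial SaaS, you may need a commercial license from Artifex.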
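Both functions above expose a transparency switch (transparent= for pdf2image, alpha= for PyMuPDF). To check what actually landed on disk, you can read the color type from the PNG's IHDR chunk. This is a minimal stdlib sketch; `png_has_alpha_channel` is a name invented here, and it only inspects the declared color type, so it does not detect palette images that carry transparency in a tRNS chunk.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
ALPHA_COLOR_TYPES = {4, 6}  # 4 = grayscale + alpha, 6 = RGB + alpha


def png_has_alpha_channel(data: bytes) -> bool:
    """Return True if the PNG header declares an alpha channel.

    Only the first 26 bytes of the file are needed.
    """
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR payload starts at byte 16: width (4), height (4), bit depth (1), color type (1).
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return color_type in ALPHA_COLOR_TYPES
```

For a rendered page, something like `png_has_alpha_channel(open("pages/page-001.png", "rb").read(26))` would tell you whether the file carries an alpha channel at all.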
Method 3: ChangeThisFile API (no installs)
If you don't want Poppler or PyMuPDF in your environment, the API runs Poppler server-side. Free tier covers 1,000 conversions/month.
import zipfile
from pathlib import Path

import requests

API_KEY = "ctf_sk_your_key_here"


def pdf_to_png(in_path: str, out_dir: str) -> list[str]:
    """Upload a PDF and save the response: a ZIP of PNGs or a single PNG."""
    with open(in_path, "rb") as f:
        response = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"source": "pdf", "target": "png"},
            timeout=120,
        )
    response.raise_for_status()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    if response.headers.get("Content-Type", "").startswith("application/zip"):
        # ZIP response: unpack the archive, then delete it.
        zip_path = out / "pages.zip"
        zip_path.write_bytes(response.content)
        with zipfile.ZipFile(zip_path) as zf:
            names = [n for n in zf.namelist() if n.lower().endswith(".png")]
            zf.extractall(out)
        zip_path.unlink()
        # Return only what came out of the archive, not PNGs already in out_dir.
        return sorted(str(out / n) for n in names)
    # Otherwise treat the response body as a single PNG.
    out_path = out / "page-001.png"
    out_path.write_bytes(response.content)
    return [str(out_path)]


pages = pdf_to_png("document.pdf", "./pages")
The API renders at 150 DPI by default. Pass dpi=300 for OCR-grade output.
When to use each
| Approach | Best for | Tradeoff |
|---|---|---|
| pdf2image | Self-hosted batch, max fidelity | Requires Poppler (~50MB system dep) |
| PyMuPDF | Single pip install, no system deps | AGPL license: check before commercial use |
| ChangeThisFile API | Edge runtimes, no infra ownership | Network call, file size limit (25MB free) |
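Since the free tier caps uploads at 25MB, a cheap local check before calling the API avoids a wasted round trip. The sketch below is an assumption-laden illustration: `fits_free_tier` is a hypothetical helper, and it reads "25MB" as 25 MiB, which may not match how the service counts bytes.

```python
from pathlib import Path

# Assumption: the documented "25MB" limit is interpreted as 25 MiB.
FREE_TIER_LIMIT_BYTES = 25 * 1024 * 1024


def fits_free_tier(pdf_path: str, limit: int = FREE_TIER_LIMIT_BYTES) -> bool:
    """Return True if the file on disk is no larger than the upload limit."""
    return Path(pdf_path).stat().st_size <= limit
```

A caller could fall back to pdf2image or PyMuPDF when this returns False instead of sending a request that is likely to be rejected.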
When to use PNG over JPG
- Transparency — PDFs with cutout backgrounds (logos, diagrams) keep transparency in PNG; JPG flattens them.
- OCR input — JPEG compression artifacts can confuse Tesseract on small text. PNG is lossless.
- Diagrams and screenshots — sharp edges and solid colors compress much better in PNG. A diagram PDF page might be 50KB as PNG vs 200KB as JPG.
- UI mockups, wireframes — pixel-perfect rendering matters; PNG preserves it.
Use JPG for photo-heavy PDFs (e.g., scanned magazines) — JPG is 2-5x smaller for photographic content.
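The split between diagrams and photos comes from PNG's lossless DEFLATE stage, which collapses long runs of identical pixels and gains almost nothing on noisy data. The sketch below uses only zlib (the compression PNG uses, without PNG's per-row filters) on two synthetic 256x256 grayscale buffers. Random bytes stand in for photographic content, which overstates how badly a real photo compresses, so treat the numbers as an illustration rather than a benchmark.

```python
import os
import zlib

WIDTH, HEIGHT = 256, 256

# "Diagram": white background with one solid black rectangle, 1 byte per pixel.
diagram = bytearray(b"\xff" * (WIDTH * HEIGHT))
for y in range(64, 192):
    row_start = y * WIDTH
    diagram[row_start + 64 : row_start + 192] = b"\x00" * 128

# "Photo" stand-in: random bytes, the worst case for lossless compression.
noise = os.urandom(WIDTH * HEIGHT)

diagram_size = len(zlib.compress(bytes(diagram), 9))
noise_size = len(zlib.compress(noise, 9))
print(f"raw: {WIDTH * HEIGHT} bytes, diagram: {diagram_size} bytes, noise: {noise_size} bytes")
```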
Production tips
- Use optimize=True for served PNGs. Pillow's optimize mode tries multiple compression strategies and picks the smallest. Adds a few hundred ms per page; saves 10-30% on file size.
- For OCR, render at 300 DPI. Tesseract's accuracy is highest at 300 DPI on small text. 200 works for body text; 150 is too low for small fonts.
- Strip alpha for opaque content. If your PDF doesn't actually use transparency, alpha=False saves space and avoids compositing surprises.
- Watch for password-protected PDFs. Pass userpw= to convert_from_path; doc.authenticate(pwd) for PyMuPDF; password= form field for the API.
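- Parallelize multi-page rendering. Pass thread_count=4 to pdf2image. PyMuPDF does not support multithreaded use, so split the page range across processes instead (for example with ProcessPoolExecutor), with each worker opening its own copy of the document.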
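For the PyMuPDF side of the parallelization tip, here is a minimal sketch that uses processes rather than threads, since PyMuPDF's documentation warns against multithreaded use. Each worker opens the PDF itself and renders a contiguous slice of pages. The function names (`split_pages`, `render_range`, `pdf_to_png_parallel`) are invented for this example, and alpha=False is an arbitrary choice here.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def split_pages(page_count: int, workers: int) -> list[range]:
    """Split page indices 0..page_count-1 into at most `workers` contiguous ranges."""
    if page_count <= 0:
        return []
    workers = max(1, min(workers, page_count))
    base, extra = divmod(page_count, workers)
    ranges, start = [], 0
    for i in range(workers):
        stop = start + base + (1 if i < extra else 0)
        ranges.append(range(start, stop))
        start = stop
    return ranges


def render_range(in_path: str, out_dir: str, pages: range, dpi: int) -> list[str]:
    """Worker: open the PDF in this process and render its share of pages."""
    import fitz  # PyMuPDF, imported inside the worker process

    matrix = fitz.Matrix(dpi / 72, dpi / 72)
    paths = []
    with fitz.open(in_path) as doc:
        for i in pages:
            out_path = Path(out_dir) / f"page-{i + 1:03d}.png"
            doc[i].get_pixmap(matrix=matrix, alpha=False).save(str(out_path))
            paths.append(str(out_path))
    return paths


def pdf_to_png_parallel(in_path: str, out_dir: str, dpi: int = 200, workers: int = 4) -> list[str]:
    """Render a PDF to PNGs using one process per page range."""
    import fitz  # PyMuPDF

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with fitz.open(in_path) as doc:
        chunks = split_pages(doc.page_count, workers)
    if not chunks:
        return []
    with ProcessPoolExecutor(max_workers=len(chunks)) as pool:
        futures = [pool.submit(render_range, in_path, out_dir, chunk, dpi) for chunk in chunks]
        # Collect in submission order so the result stays sorted by page number.
        return [path for future in futures for path in future.result()]
```

On platforms that start workers with spawn (Windows, and macOS by default), call `pdf_to_png_parallel` from under an `if __name__ == "__main__":` guard. Process startup has a cost, so this tends to pay off only for documents with many pages.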
For most projects PyMuPDF is the easiest deploy. For Poppler shops, pdf2image. For environments without either, the API. Free tier covers 1,000 conversions/month.