PNG is the right choice over JPG when you need transparency, lossless quality (no JPEG artifacts), or pixel-perfect output for OCR. The libraries are the same as PDF-to-JPG; you just change the format flag. Three viable paths: pdf2image (Poppler), PyMuPDF (no native deps), or the API.

Method 1: pdf2image (Poppler-backed)

pdf2image wraps Poppler's pdftoppm. Same engine PDF readers use.

pip install pdf2image
apt install poppler-utils
# macOS: brew install poppler
from pdf2image import convert_from_path
from pathlib import Path

def pdf_to_png(in_path: str, out_dir: str, dpi: int = 200) -> list[str]:
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    images = convert_from_path(in_path, dpi=dpi, fmt="png", thread_count=4, transparent=True)
    paths = []
    for i, img in enumerate(images):
        out_path = out_dir / f"page-{i+1:03d}.png"
        img.save(out_path, "PNG", optimize=True)
        paths.append(str(out_path))
    return paths

pages = pdf_to_png("document.pdf", "./pages", dpi=200)
print(f"wrote {len(pages)} pages")

Three knobs that matter:

  • dpi — 150 for web, 200-300 for OCR/print. PNG handles high DPI gracefully (no compression artifacts).
  • transparent=True — preserves transparency from PDFs that have cutout backgrounds. Without it, transparent areas render white.
  • optimize=True — smaller files, slower save. Worth it for served images.

Method 2: PyMuPDF (fitz, no Poppler)

PyMuPDF wraps MuPDF — a fast, lightweight PDF engine. Single pip install, no system deps.

pip install PyMuPDF
import fitz  # PyMuPDF
from pathlib import Path

def pdf_to_png(in_path: str, out_dir: str, dpi: int = 200, alpha: bool = True) -> list[str]:
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    doc = fitz.open(in_path)
    paths = []
    zoom = dpi / 72
    matrix = fitz.Matrix(zoom, zoom)
    for i, page in enumerate(doc):
        pix = page.get_pixmap(matrix=matrix, alpha=alpha)
        out_path = out_dir / f"page-{i+1:03d}.png"
        pix.save(out_path)
        paths.append(str(out_path))
    doc.close()
    return paths

pages = pdf_to_png("document.pdf", "./pages", dpi=200, alpha=True)

PyMuPDF is ~2x faster than pdf2image and ships as a single wheel. License is AGPL — for commercial SaaS distribution you may need a commercial license from Artifex.

Method 3: ChangeThisFile API (no installs)

If you don't want Poppler or PyMuPDF in your environment, the API runs Poppler server-side. Free tier covers 1,000 conversions/month.

import requests
import zipfile
from pathlib import Path

API_KEY = "ctf_sk_your_key_here"

def pdf_to_png(in_path: str, out_dir: str) -> list[str]:
    with open(in_path, "rb") as f:
        response = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"source": "pdf", "target": "png"},
            timeout=120,
        )
    response.raise_for_status()

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    if response.headers.get("Content-Type", "").startswith("application/zip"):
        zip_path = out_dir / "pages.zip"
        zip_path.write_bytes(response.content)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(out_dir)
        zip_path.unlink()
        return sorted(str(p) for p in out_dir.glob("*.png"))
    else:
        out_path = out_dir / "page-001.png"
        out_path.write_bytes(response.content)
        return [str(out_path)]

pages = pdf_to_png("document.pdf", "./pages")

The API renders at 150 DPI by default. Pass dpi=300 for OCR-grade output.

When to use each

ApproachBest forTradeoff
pdf2imageSelf-hosted batch, max fidelityRequires Poppler (~50MB system dep)
PyMuPDFSingle pip install, no native depsAGPL license — check before commercial use
ChangeThisFile APIEdge runtimes, no infra ownershipNetwork call, file size limit (25MB free)

When to use PNG over JPG

  • Transparency — PDFs with cutout backgrounds (logos, diagrams) keep transparency in PNG; JPG flattens them.
  • OCR input — JPEG compression artifacts can confuse Tesseract on small text. PNG is lossless.
  • Diagrams and screenshots — sharp edges and solid colors compress much better in PNG. A diagram PDF page might be 50KB as PNG vs 200KB as JPG.
  • UI mockups, wireframes — pixel-perfect rendering matters; PNG preserves it.

Use JPG for photo-heavy PDFs (e.g., scanned magazines) — JPG is 2-5x smaller for photographic content.

Production tips

  • Use optimize=True for served PNGs. Pillow's optimize mode tries multiple compression strategies and picks the smallest. Adds a few hundred ms per page; saves 10-30% on file size.
  • For OCR, render at 300 DPI. Tesseract's accuracy is highest at 300 DPI on small text. 200 works for body text; 150 is too low for small fonts.
  • Strip alpha for opaque content. If your PDF doesn't actually use transparency, alpha=False saves space and avoids compositing surprises.
  • Watch for password-protected PDFs. Pass userpw= to convert_from_path; doc.authenticate(pwd) for PyMuPDF; password= form field for the API.
  • Parallelize multi-page rendering. thread_count=4 in pdf2image; ThreadPoolExecutor wrapping page.get_pixmap calls in PyMuPDF.

For most projects PyMuPDF is the easiest deploy. For Poppler shops, pdf2image. For environments without either, the API. Free tier covers 1,000 conversions/month.