Why is my PNG so much larger than the JPG?

PNG is lossless, so photographs and smooth gradients, which contain little exact repetition, don't compress well. For photo-heavy PDFs, JPG is typically 2-5x smaller with no visible difference in quality. Use PNG for diagrams, screenshots, and OCR input, where compression artifacts matter.

Can I get transparent PNG from any PDF?

Only if the PDF itself has transparency. Most PDFs have white backgrounds (opaque) — converting those to PNG with alpha just adds an unused alpha channel. Use alpha=True only when you know the source has transparency.
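
One way to find out whether a page really uses transparency is to render it once with an alpha channel and inspect the alpha bytes. The sketch below assumes raw, interleaved 8-bit RGBA samples (for example, PyMuPDF's pix.samples from a pixmap rendered with alpha=True); the helper name uses_transparency is hypothetical.

```python
def uses_transparency(rgba_samples: bytes) -> bool:
    """Return True if any pixel in interleaved 8-bit RGBA data is not fully opaque."""
    if len(rgba_samples) % 4 != 0:
        raise ValueError("expected 4 bytes per pixel (RGBA)")
    alpha = rgba_samples[3::4]  # every 4th byte, starting at the first alpha value
    return len(alpha) > 0 and min(alpha) < 255


opaque = bytes([255, 255, 255, 255] * 4)  # four solid white pixels
cutout = opaque + bytes([0, 0, 0, 0])     # plus one fully transparent pixel
print(uses_transparency(opaque))  # False
print(uses_transparency(cutout))  # True
```

If the check comes back False for every page, rendering without alpha should give the same picture in a smaller file.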

Does PNG handle DPI better than JPG?

PNG's lossless compression handles high DPI without artifacts. A 600 DPI page-as-PNG looks pixel-perfect; the same page as JPG at 600 DPI shows compression noise on text edges. Use PNG for high-DPI output.
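
High DPI has a cost in pixels and memory, which is worth estimating before picking a value. The following is a rough sketch of the arithmetic, assuming page sizes in PDF points (72 per inch) and 8 bits per channel; rendered_size is a hypothetical helper, not part of either library.

```python
def rendered_size(width_pt: float, height_pt: float, dpi: int, channels: int = 3):
    """Pixel dimensions and uncompressed byte size for a page rendered at `dpi`.

    PDF page sizes are in points (72 per inch), so the scale factor is dpi / 72.
    """
    scale = dpi / 72
    width_px = round(width_pt * scale)
    height_px = round(height_pt * scale)
    return width_px, height_px, width_px * height_px * channels


# US Letter is 612 x 792 points (8.5 x 11 inches).
for dpi in (150, 300, 600):
    w, h, raw = rendered_size(612, 792, dpi)
    print(f"{dpi} DPI: {w} x {h} px, {raw / 1e6:.1f} MB uncompressed RGB")
```

Doubling the DPI quadruples the pixel count, so a 600 DPI page holds 16 times as many pixels as a 150 DPI one before compression.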

How do I extract just one page?

With pdf2image, pass first_page=N, last_page=N. With PyMuPDF, index doc[N-1] directly (pages are 0-indexed). With the API, pass page=N in the form data.
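
A minimal sketch of single-page extraction with PyMuPDF, including the 1-based to 0-based conversion. The helper names page_index and render_one_page are hypothetical, and fitz is imported inside the function only so that the index helper can be used on its own.

```python
from pathlib import Path


def page_index(page_number: int, page_count: int) -> int:
    """Convert a 1-based page number to the 0-based index PyMuPDF expects."""
    if not 1 <= page_number <= page_count:
        raise ValueError(f"page {page_number} is outside 1..{page_count}")
    return page_number - 1


def render_one_page(in_path: str, page_number: int, out_path: str, dpi: int = 200) -> str:
    """Render a single page to PNG with PyMuPDF."""
    import fitz  # PyMuPDF; imported here so page_index() works without it installed

    doc = fitz.open(in_path)
    try:
        page = doc[page_index(page_number, doc.page_count)]
        zoom = dpi / 72  # PDF units are points, 72 per inch
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        Path(out_path).parent.mkdir(parents=True, exist_ok=True)
        pix.save(str(out_path))
    finally:
        doc.close()
    return str(out_path)


# pdf2image equivalent for page 3:
# convert_from_path(in_path, dpi=200, fmt="png", first_page=3, last_page=3)
```

Calling render_one_page("document.pdf", 3, "./pages/page-003.png") would write only the third page.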

What's the file size limit on the API?

Free tier: 25MB upload. PNG output for large PDFs can be much bigger than the source — for 100+ page documents, render in batches.
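
Batching also helps when rendering locally, because it caps how many page images sit in memory at once. Here is a sketch using pdf2image's first_page and last_page arguments; the helper names and the default batch size of 20 are assumptions for illustration, not values taken from the API or the library.

```python
from typing import Iterator


def page_batches(page_count: int, batch_size: int = 20) -> Iterator[tuple[int, int]]:
    """Yield inclusive, 1-based (first_page, last_page) ranges covering the document."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, page_count + 1, batch_size):
        yield first, min(first + batch_size - 1, page_count)


def render_in_batches(in_path: str, out_dir: str, dpi: int = 200, batch_size: int = 20) -> int:
    """Render a PDF batch by batch so only `batch_size` page images are in memory at once."""
    from pathlib import Path
    from pdf2image import convert_from_path, pdfinfo_from_path  # needs Poppler installed

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    page_count = pdfinfo_from_path(in_path)["Pages"]
    for first, last in page_batches(page_count, batch_size):
        images = convert_from_path(in_path, dpi=dpi, fmt="png", first_page=first, last_page=last)
        for offset, img in enumerate(images):
            img.save(out / f"page-{first + offset:03d}.png", "PNG")
    return page_count


print(list(page_batches(45, 20)))  # [(1, 20), (21, 40), (41, 45)]
```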

Why does my PNG have a black background instead of transparent?

Two common causes: (1) the render did not request an alpha channel, meaning alpha=False in PyMuPDF or no transparent=True in pdf2image; (2) the PDF's own background is opaque white, as in most PDFs. Setting alpha=True won't make a white background transparent; it only preserves transparency that is already there.

How to Convert PDF to PNG in Python (3 Methods + API)

PNG is the right choice over JPG when you need transparency, lossless quality (no JPEG artifacts), or pixel-perfect output for OCR. The libraries are the same as for PDF-to-JPG; you just change the format flag. There are three viable paths: pdf2image (Poppler), PyMuPDF (no system dependencies), or the ChangeThisFile API.

Method 1: pdf2image (Poppler-backed)

pdf2image wraps Poppler's command-line tools (pdftoppm and pdftocairo), the same rendering engine used by PDF readers such as Evince and Okular.

pip install pdf2image
apt install poppler-utils
# macOS: brew install poppler

from pdf2image import convert_from_path
from pathlib import Path

def pdf_to_png(in_path: str, out_dir: str, dpi: int = 200) -> list[str]:
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    images = convert_from_path(in_path, dpi=dpi, fmt="png", thread_count=4, transparent=True)
    paths = []
    for i, img in enumerate(images):
        out_path = out_dir / f"page-{i+1:03d}.png"
        img.save(out_path, "PNG", optimize=True)
        paths.append(str(out_path))
    return paths

pages = pdf_to_png("document.pdf", "./pages", dpi=200)
print(f"wrote {len(pages)} pages")

Three knobs that matter:

dpi — 150 for web, 200-300 for OCR/print. PNG handles high DPI gracefully (no compression artifacts).
transparent=True — preserves transparency from PDFs that have cutout backgrounds. Without it, transparent areas render white.
optimize=True — smaller files, slower save. Worth it for served images.

Method 2: PyMuPDF (fitz, no Poppler)

PyMuPDF wraps MuPDF — a fast, lightweight PDF engine. Single pip install, no system deps.

pip install PyMuPDF

import fitz  # PyMuPDF
from pathlib import Path

def pdf_to_png(in_path: str, out_dir: str, dpi: int = 200, alpha: bool = True) -> list[str]:
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    doc = fitz.open(in_path)
    paths = []
    zoom = dpi / 72
    matrix = fitz.Matrix(zoom, zoom)
    for i, page in enumerate(doc):
        pix = page.get_pixmap(matrix=matrix, alpha=alpha)
        out_path = out_dir / f"page-{i+1:03d}.png"
        pix.save(out_path)
        paths.append(str(out_path))
    doc.close()
    return paths

pages = pdf_to_png("document.pdf", "./pages", dpi=200, alpha=True)

PyMuPDF is ~2x faster than pdf2image and ships as a single wheel. It is licensed under the AGPL, so for commercial or closed-source SaaS use you may need a commercial license from Artifex.

Method 3: ChangeThisFile API (no installs)

If you don't want Poppler or PyMuPDF in your environment, the API runs Poppler server-side. Free tier covers 1,000 conversions/month.

import requests
import zipfile
from pathlib import Path

API_KEY = "ctf_sk_your_key_here"

def pdf_to_png(in_path: str, out_dir: str) -> list[str]:
    with open(in_path, "rb") as f:
        response = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"source": "pdf", "target": "png"},
            timeout=120,
        )
    response.raise_for_status()

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    if response.headers.get("Content-Type", "").startswith("application/zip"):
        zip_path = out_dir / "pages.zip"
        zip_path.write_bytes(response.content)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(out_dir)
        zip_path.unlink()
        return sorted(str(p) for p in out_dir.glob("*.png"))
    else:
        out_path = out_dir / "page-001.png"
        out_path.write_bytes(response.content)
        return [str(out_path)]

pages = pdf_to_png("document.pdf", "./pages")

The API renders at 150 DPI by default. Pass dpi=300 for OCR-grade output.

When to use each

Approach	Best for	Tradeoff
pdf2image	Self-hosted batch, max fidelity	Requires Poppler (~50MB system dep)
PyMuPDF	Single pip install, no native deps	AGPL license — check before commercial use
ChangeThisFile API	Edge runtimes, no infra ownership	Network call, file size limit (25MB free)

When to use PNG over JPG

Transparency — PDFs with cutout backgrounds (logos, diagrams) keep transparency in PNG; JPG flattens them.
OCR input — JPEG compression artifacts can confuse Tesseract on small text. PNG is lossless.
Diagrams and screenshots — sharp edges and solid colors compress much better in PNG. A diagram PDF page might be 50KB as PNG vs 200KB as JPG.
UI mockups, wireframes — pixel-perfect rendering matters; PNG preserves it.

Use JPG for photo-heavy PDFs (e.g., scanned magazines) — JPG is 2-5x smaller for photographic content.
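
The size difference comes from how PNG compresses: it uses DEFLATE, the same algorithm exposed by Python's zlib module. The sketch below is a deliberately exaggerated illustration, using one solid color as "diagram-like" data and seeded random bytes as "photo-like" data. Real PNG encoders also apply per-row filters and real photos are not pure noise, so treat the numbers as a direction, not a benchmark.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256

# "Diagram-like" data: one solid color repeated across the whole image.
flat = bytes([30, 144, 255]) * (WIDTH * HEIGHT)

# "Photo-like" data: every byte is unrelated to its neighbors (seeded for repeatability).
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT * 3))

flat_size = len(zlib.compress(flat, 9))
noisy_size = len(zlib.compress(noisy, 9))
print(f"raw: {len(flat)} bytes, flat: {flat_size} bytes, noisy: {noisy_size} bytes")
```

The flat buffer shrinks to a tiny fraction of its raw size, while the noisy one barely shrinks at all, which is why lossless output is cheap for diagrams and expensive for photographs.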

Production tips

Use optimize=True for served PNGs. Pillow's optimize mode tries multiple compression strategies and picks the smallest. Adds a few hundred ms per page; saves 10-30% on file size.
For OCR, render at 300 DPI. Tesseract's accuracy is highest at 300 DPI on small text. 200 works for body text; 150 is too low for small fonts.
Strip alpha for opaque content. If your PDF doesn't actually use transparency, alpha=False saves space and avoids compositing surprises.
Watch for password-protected PDFs. Pass userpw= to convert_from_path; doc.authenticate(pwd) for PyMuPDF; password= form field for the API.
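Parallelize multi-page rendering. Use thread_count=4 in pdf2image. For PyMuPDF, use separate processes (for example ProcessPoolExecutor, with each worker opening the document itself), because PyMuPDF does not support multithreaded use.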
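
For parallel rendering with PyMuPDF, the sketch below uses processes instead of threads, since PyMuPDF's documentation states that the library does not support multithreading. Each worker opens its own copy of the document. The function names and the round-robin split are assumptions for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def split_pages(page_count: int, workers: int) -> list[list[int]]:
    """Deal 0-based page indices round-robin into at most `workers` non-empty chunks."""
    chunks = [list(range(start, page_count, workers)) for start in range(workers)]
    return [chunk for chunk in chunks if chunk]


def render_chunk(args: tuple[str, str, list[int], int]) -> list[str]:
    """Worker: open the PDF in this process and render the assigned pages."""
    import fitz  # PyMuPDF, imported inside the worker process
    from pathlib import Path

    in_path, out_dir, indices, dpi = args
    zoom = dpi / 72
    paths = []
    doc = fitz.open(in_path)
    try:
        for i in indices:
            pix = doc[i].get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            out_path = Path(out_dir) / f"page-{i + 1:03d}.png"
            pix.save(str(out_path))
            paths.append(str(out_path))
    finally:
        doc.close()
    return paths


def pdf_to_png_parallel(in_path: str, out_dir: str, dpi: int = 200, workers: int = 4) -> list[str]:
    """Render all pages using one process per chunk of pages."""
    import fitz
    from pathlib import Path

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    doc = fitz.open(in_path)
    page_count = doc.page_count
    doc.close()
    jobs = [(in_path, out_dir, chunk, dpi) for chunk in split_pages(page_count, workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(render_chunk, jobs)
    return sorted(path for chunk in results for path in chunk)


if __name__ == "__main__":
    print(split_pages(10, 4))  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Round-robin assignment spreads heavy and light pages across workers; contiguous ranges would work as well. On platforms that spawn worker processes, pdf_to_png_parallel should be called under the `if __name__ == "__main__":` guard.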

For most projects, PyMuPDF is the easiest to deploy. For Poppler shops, use pdf2image. For environments without either, use the API. The free tier covers 1,000 conversions/month.
