The right HTML-to-PDF tool depends on what your HTML needs. For static HTML with print-style CSS, use WeasyPrint: fast and clean. For JavaScript-rendered SPAs or HTML that depends on browser-specific layout, use Playwright (Chromium). For server environments where you can't install browser binaries or native libraries, use the ChangeThisFile API.
Method 1: WeasyPrint (pure Python, CSS-aware)
WeasyPrint renders HTML with broad CSS support, including CSS Paged Media (page-break-before, @page rules, headers/footers in page margin boxes). No browser dependency.
pip install weasyprint
# Linux: also need apt install libpango-1.0-0 libpangoft2-1.0-0
# macOS: brew install pango
from typing import Optional

from weasyprint import HTML, CSS

def html_to_pdf(in_path: str, out_path: str, base_url: Optional[str] = None) -> None:
    # base_url is used to resolve relative links (images, CSS, fonts).
    HTML(filename=in_path, base_url=base_url).write_pdf(out_path)

# Convert a local file:
html_to_pdf("invoice.html", "invoice.pdf", base_url=".")

# Convert from a string with custom CSS:
def html_string_to_pdf(html: str, out_path: str, css: Optional[str] = None) -> None:
    stylesheets = [CSS(string=css)] if css else []
    HTML(string=html).write_pdf(out_path, stylesheets=stylesheets)

html_string_to_pdf(
    "<h1>Hello</h1><p>This is a PDF.</p>",
    "hello.pdf",
    css="@page { size: A4; margin: 1in } h1 { color: navy }",
)
Three things to know:
- base_url tells WeasyPrint where to resolve relative URLs (images, CSS, fonts). Set to the directory of the HTML file or a server URL.
- CSS Paged Media works — use @page for margins, @top-center for headers, named pages for chapters.
- No JavaScript. WeasyPrint reads static HTML. For SPAs, render to static HTML first or use Playwright.
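
The CSS Paged Media point above can be sketched as follows. This is a minimal example, not a definitive template: the header text, the `PAGED_CSS` constant, and the `build_document` and `render` helpers are all hypothetical names chosen for illustration. It assumes the `@page` margin boxes and page counters behave as WeasyPrint documents them.

```python
from html import escape

# Paged-media rules: page size and margins, a running header,
# a page-number footer, and a page break before each section heading.
# The header text is a placeholder.
PAGED_CSS = """
@page {
  size: A4;
  margin: 2cm;
  @top-center { content: "Quarterly report"; font-size: 9pt; }
  @bottom-right { content: "Page " counter(page) " of " counter(pages); }
}
h2 { break-before: page; }
"""


def build_document(title: str, sections: list[tuple[str, str]]) -> str:
    """Return a complete HTML document with the paged-media CSS inlined."""
    body = "".join(
        f"<h2>{escape(heading)}</h2><p>{escape(text)}</p>"
        for heading, text in sections
    )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{escape(title)}</title><style>{PAGED_CSS}</style></head>"
        f"<body><h1>{escape(title)}</h1>{body}</body></html>"
    )


def render(html: str, out_path: str) -> None:
    # Imported here so the string-building code runs without WeasyPrint installed.
    from weasyprint import HTML

    HTML(string=html).write_pdf(out_path)
```

Calling `render(build_document("Report", [("Intro", "..."), ("Data", "...")]), "report.pdf")` should give a title page followed by one page per section, since each `h2` starts a new page.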
Method 2: Playwright (Chromium, pixel-perfect)
For modern web apps with JavaScript-rendered content, Playwright drives a real Chromium browser to render and print to PDF.
pip install playwright
playwright install chromium # downloads ~250MB browser
from pathlib import Path

from playwright.sync_api import sync_playwright

def html_to_pdf(url_or_file: str, out_path: str) -> None:
    # Remote URLs are loaded as-is; local paths become file:// URIs.
    if url_or_file.startswith(("http://", "https://")):
        target = url_or_file
    else:
        target = Path(url_or_file).resolve().as_uri()
    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(target, wait_until="networkidle")
            page.pdf(
                path=out_path,
                format="A4",
                margin={"top": "1in", "bottom": "1in", "left": "1in", "right": "1in"},
                print_background=True,
            )
        finally:
            # Close the browser even if navigation or printing fails.
            browser.close()

html_to_pdf("https://example.com/report", "report.pdf")
Three things to know:
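- wait_until='networkidle' waits until there have been no network connections for at least 500 ms, which helps with SPAs that load data after the initial render. It can stall on pages that poll or hold connections open, so prefer an explicit wait when you have one.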
- print_background=True includes background colors and images. Default is False (matches browser print dialog default).
- For dynamic content, add explicit waits. Use page.wait_for_selector('.content-loaded') if you have specific elements that signal readiness.
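
The explicit-wait tip above might look like the sketch below. The function names `to_url` and `pdf_when_ready`, the default timeout, and the `.content-loaded` selector in the usage note are assumptions for illustration; the Playwright calls used are `page.goto`, `page.wait_for_selector`, and `page.pdf`.

```python
from pathlib import Path


def to_url(url_or_file: str) -> str:
    """Pass http(s) URLs through; turn local paths into file:// URIs."""
    if url_or_file.startswith(("http://", "https://")):
        return url_or_file
    return Path(url_or_file).resolve().as_uri()


def pdf_when_ready(
    url_or_file: str,
    out_path: str,
    ready_selector: str,
    timeout_ms: int = 15_000,
) -> None:
    """Print to PDF only after `ready_selector` becomes visible."""
    # Imported lazily so to_url() is usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(to_url(url_or_file), wait_until="load")
            # Raises a Playwright TimeoutError if the element never appears.
            page.wait_for_selector(
                ready_selector, state="visible", timeout=timeout_ms
            )
            page.pdf(path=out_path, format="A4", print_background=True)
        finally:
            browser.close()
```

For example, `pdf_when_ready("https://example.com/report", "report.pdf", ".content-loaded")` waits for the app's own readiness marker instead of guessing from network activity.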
Method 3: ChangeThisFile API (no browser, no native libs)
If you don't want WeasyPrint's native deps or Playwright's Chromium download, the API does it. Free tier covers 1,000 conversions/month.
import requests

API_KEY = "ctf_sk_your_key_here"

def html_to_pdf(in_path: str, out_path: str) -> None:
    with open(in_path, "rb") as f:
        response = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"source": "html", "target": "pdf"},
            timeout=120,
        )
    response.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(response.content)

html_to_pdf("invoice.html", "invoice.pdf")
The API uses LibreOffice for HTML rendering — supports static HTML + CSS but not JavaScript. For JS-heavy pages, render to static HTML first (e.g., with Playwright in your build) then send.
When to use each
| Approach | Best for | Tradeoff |
|---|---|---|
| WeasyPrint | Print-style HTML with CSS Paged Media — invoices, reports, books | No JavaScript; native deps (Pango) |
| Playwright (Chromium) | SPAs, JavaScript-rendered content, pixel-match-the-browser | ~250MB Chromium download, slow startup |
| ChangeThisFile API | No browser/native libs in your env | Static HTML only; network call |
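
The table can be read as a small decision rule. The sketch below is one possible encoding; the `Converter` enum, the `choose_converter` function, and its three flags are hypothetical names, and real projects will have more constraints than these.

```python
from enum import Enum


class Converter(Enum):
    WEASYPRINT = "weasyprint"
    PLAYWRIGHT = "playwright"
    API = "api"


def choose_converter(
    needs_javascript: bool,
    can_run_browser: bool,
    can_install_pango: bool,
) -> Converter:
    """Pick a conversion method following the table above."""
    if needs_javascript:
        if can_run_browser:
            return Converter.PLAYWRIGHT
        # Neither WeasyPrint nor the API executes JavaScript.
        raise ValueError(
            "JavaScript-rendered HTML needs a browser; "
            "pre-render it to static HTML elsewhere first"
        )
    if can_install_pango:
        return Converter.WEASYPRINT
    if can_run_browser:
        # Static HTML also prints fine through Chromium.
        return Converter.PLAYWRIGHT
    return Converter.API
```

An invoice service on a standard Linux image would land on `Converter.WEASYPRINT`; a locked-down environment with neither Pango nor a browser falls through to `Converter.API`.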
Production tips
- For invoices and reports, WeasyPrint is the right answer. CSS Paged Media gives you full control over headers, footers, page breaks, and layout. Faster than Playwright by 5-10x.
- For SPAs, Playwright in a singleton. Don't launch Chromium per request — keep a persistent browser instance and create new pages per conversion. Saves ~2s of startup time per call.
- Use @media print CSS for browser-aware HTML. Most websites have print-specific CSS that hides nav, footers, and ads. WeasyPrint and Playwright both honor @media print.
- Embed fonts explicitly. WeasyPrint finds fonts through Pango/fontconfig, so install the ones you need (apt install fonts-noto, etc.). Chromium under Playwright uses the fonts installed on the system; load custom fonts via @font-face with absolute URLs.
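- For batch jobs, parallelize carefully. WeasyPrint does most of its layout work in Python while holding the GIL, so threads add little; a process pool (ProcessPoolExecutor) is the safer choice for CPU-bound batches. Playwright's sync API is not thread-safe: use one Playwright instance per thread or process, or the async API with multiple browser contexts.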
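
The persistent-browser tip above can be sketched as a small wrapper class. `PdfRenderer` and `_launch_chromium` are hypothetical names, and the injectable `launch` argument exists only so the class can be exercised without a real browser. This assumes a single-threaded caller, since Playwright's sync API is not thread-safe.

```python
from typing import Any, Callable, Optional, Tuple


def _launch_chromium() -> Tuple[Any, Any]:
    # Imported lazily so this module loads without Playwright installed.
    from playwright.sync_api import sync_playwright

    pw = sync_playwright().start()
    return pw, pw.chromium.launch()


class PdfRenderer:
    """Keeps one browser for the life of the process; one page per job."""

    def __init__(
        self, launch: Callable[[], Tuple[Any, Any]] = _launch_chromium
    ) -> None:
        self._launch = launch
        self._pw: Optional[Any] = None
        self._browser: Optional[Any] = None

    def render(self, url: str, out_path: str) -> None:
        if self._browser is None:
            # Pay the browser startup cost once, on first use.
            self._pw, self._browser = self._launch()
        page = self._browser.new_page()
        try:
            page.goto(url, wait_until="load")
            page.pdf(path=out_path, format="A4", print_background=True)
        finally:
            # Pages are cheap; always release them after each conversion.
            page.close()

    def close(self) -> None:
        if self._browser is not None:
            self._browser.close()
            self._pw.stop()
            self._pw = None
            self._browser = None
```

Create one `PdfRenderer()` at application startup, call `render(url, out_path)` per request, and call `close()` on shutdown.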
For invoices and reports, WeasyPrint. For SPAs and dynamic content, Playwright. For environments without either, the API. Free tier covers 1,000 conversions/month.