Converting DOCX to PDF while preserving formatting is notoriously hard because Microsoft Word's rendering is the de facto reference. DOCX is an open, well-specified format, but the visual output depends on which fonts are installed, how the renderer handles mixed content, and a long tail of quirks. Free Python options exist, but each has meaningful tradeoffs.
This guide shows three working approaches with cross-platform notes and the formatting fidelity each one delivers.
Method 1: ChangeThisFile API (works anywhere)
The API runs LibreOffice headless on its servers, so you get LibreOffice's rendering without installing or maintaining it yourself. Get a free API key; the free tier includes 1,000 conversions per month.
```python
import requests

API_KEY = "sk_test_your_key_here"  # replace with your own key


def docx_to_pdf(docx_path: str, output_path: str) -> None:
    """Upload a DOCX to the conversion API and write the returned PDF."""
    with open(docx_path, "rb") as f:
        response = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"source": "docx", "target": "pdf"},
            timeout=120,  # large documents can take a while
        )
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error body as a PDF
    with open(output_path, "wb") as out:
        out.write(response.content)


docx_to_pdf("contract.docx", "contract.pdf")
```
For batch conversion, parallelize with a thread pool. The API handles concurrent requests well as long as you stay within your rate limit:
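```python
import os
from concurrent.futures import ThreadPoolExecutor

# Skip the "~$name.docx" lock files Word creates while a document is open.
files = [f for f in os.listdir("docs") if f.endswith(".docx") and not f.startswith("~$")]
os.makedirs("out", exist_ok=True)  # docx_to_pdf() does not create the output folder

def convert(filename):
    stem = os.path.splitext(filename)[0]  # "a.docx" -> "a"
    docx_to_pdf(os.path.join("docs", filename), os.path.join("out", f"{stem}.pdf"))

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(convert, files))  # draining the results re-raises any worker's exception
```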
Method 2: docx2pdf (uses Microsoft Word)
docx2pdf drives Microsoft Word directly: via COM automation on Windows and via osascript scripting on macOS. Output fidelity is effectively perfect because Word itself does the rendering. The catch: Linux is not supported, and Word must be installed.
```bash
pip install docx2pdf
```

```python
from docx2pdf import convert

convert("contract.docx", "contract.pdf")

# Or batch-convert a directory (the PDFs are written into docs/ as well):
convert("docs/")
```
Use docx2pdf when you are on a Windows or macOS workstation with Word installed and need 1:1 formatting fidelity (legal documents, regulatory filings). Do not use it on production servers: Word automation is fragile and slow, and Microsoft does not support unattended server-side automation of Office.
Method 3: LibreOffice headless (cross-platform, free)
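LibreOffice can run in headless mode and convert DOCX to PDF from the command line. This works on Linux, macOS, and Windows, but the executable's name varies: Debian/Ubuntu packages provide both `libreoffice` and `soffice`, while macOS and Windows installs provide `soffice`, which may not be on your PATH.

```bash
sudo apt-get install libreoffice   # Debian/Ubuntu
brew install --cask libreoffice    # macOS (Homebrew)
```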
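```python
import os
import shutil
import subprocess

# Debian/Ubuntu expose "libreoffice" and "soffice"; macOS and Windows installs
# provide "soffice". Evaluates to None if neither is on PATH.
SOFFICE = shutil.which("soffice") or shutil.which("libreoffice")


def docx_to_pdf(docx_path: str, output_dir: str) -> str:
    if SOFFICE is None:
        raise RuntimeError("LibreOffice not found on PATH (tried soffice, libreoffice)")
    os.makedirs(output_dir, exist_ok=True)
    result = subprocess.run(
        [SOFFICE, "--headless", "--convert-to", "pdf", "--outdir", output_dir, docx_path],
        capture_output=True,
        timeout=120,
    )
    base = os.path.splitext(os.path.basename(docx_path))[0]
    pdf_path = os.path.join(output_dir, f"{base}.pdf")
    # A zero exit code is not proof of success: LibreOffice can exit cleanly without
    # writing a file (e.g. when another instance already holds the user profile).
    if result.returncode != 0 or not os.path.exists(pdf_path):
        raise RuntimeError(f"LibreOffice failed: {result.stderr.decode(errors='replace')}")
    return pdf_path


pdf_path = docx_to_pdf("contract.docx", "out/")
```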
Two important constraints with headless LibreOffice:
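- Single-instance bottleneck. A running LibreOffice process locks its user profile, so by default only one headless conversion runs at a time per user; concurrent invocations queue behind it or exit without converting anything. Spinning up multiple processes does not parallelize cleanly because they all try to use the same user profile directory.
- Slow startup. The first conversion in a process takes 5-10 seconds just to spin up LibreOffice. Subsequent conversions in the same long-running process are faster. Because the function above launches a fresh process for every file, each call pays this cost; passing several files to a single invocation amortizes it.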
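Both constraints can be worked around in the same script. The sketch below is an illustration, not a tuned implementation: it assumes `soffice` (or `libreoffice`) is on your PATH, splits the files into one chunk per worker, converts each chunk with a single LibreOffice call so startup is paid once per chunk, and gives every call its own throwaway profile through LibreOffice's `-env:UserInstallation` option so parallel processes stop contending for the default profile. The helper names and the timeout budget are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(soffice: str, profile_dir: str, outdir: str, files: list[str]) -> list[str]:
    """Build one LibreOffice call that converts several files using a private profile."""
    profile_url = Path(profile_dir).resolve().as_uri()  # e.g. file:///tmp/lo-profile-abc123
    return [
        soffice,
        f"-env:UserInstallation={profile_url}",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", outdir,
        *files,
    ]


def convert_chunk(soffice: str, outdir: str, files: list[str]) -> None:
    # A throwaway profile per call, so parallel processes never share one.
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile_dir:
        subprocess.run(
            build_command(soffice, profile_dir, outdir, files),
            capture_output=True,
            timeout=120 + 60 * len(files),  # hypothetical budget: startup plus ~1 min per file
            check=True,  # raise CalledProcessError on a non-zero exit
        )


def convert_parallel(files: list[str], outdir: str, workers: int = 4) -> None:
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice not found on PATH")
    os.makedirs(outdir, exist_ok=True)
    chunks = [files[i::workers] for i in range(workers)]  # round-robin, one chunk per worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(convert_chunk, soffice, outdir, c) for c in chunks if c]
        for future in futures:
            future.result()  # re-raise any worker failure
```

For example, `convert_parallel(["docs/a.docx", "docs/b.docx", "docs/c.docx"], "out", workers=2)` runs two LibreOffice processes side by side. Unlike `docx_to_pdf` above, it does not confirm that each PDF was written, so add the same existence check if you depend on it.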
If you need throughput, use the API instead: it pre-warms LibreOffice instances behind a queue, so individual conversions return faster.
Formatting fidelity comparison
| Approach | Fidelity | Quirks |
|---|---|---|
| ChangeThisFile API | High (LibreOffice on server with full font set) | Custom fonts may fall back; embed fonts in the source DOCX to guarantee them |
| docx2pdf (Word) | Perfect (1:1 with Word) | Windows/macOS only, requires a Word license |
| LibreOffice local | High (varies with installed fonts) | Slow startup, single-instance bottleneck |
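The most common fidelity issue is fonts. If your DOCX uses Calibri (Word's default body font for many years) and the renderer doesn't have Calibri installed, it falls back to a similar font whose glyph widths differ, so line breaks and pagination shift. For LibreOffice, installing the metric-compatible substitutes Carlito (for Calibri) and Caladea (for Cambria) avoids most of this. The more general fix is to embed fonts in the DOCX before conversion (Word: File > Options > Save > Embed fonts in the file), or set explicit font fallbacks in the document style.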
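To see which fonts a document depends on before converting it, you can read the font table stored inside the DOCX package. This is a minimal, stdlib-only sketch that assumes the conventional `word/fontTable.xml` part name; `fonts_in_docx` is a hypothetical helper, and the table lists the fonts the document declares, which is close to, but not guaranteed to be exactly, the set it renders with.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace, used by <w:font w:name="..."> entries.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def fonts_in_docx(path: str) -> list[str]:
    """Return the sorted, de-duplicated font names from a DOCX font table."""
    with zipfile.ZipFile(path) as zf:
        if "word/fontTable.xml" not in zf.namelist():
            return []  # no font table part in this package
        root = ET.fromstring(zf.read("word/fontTable.xml"))
    names = {font.get(f"{{{W_NS}}}name") for font in root.iter(f"{{{W_NS}}}font")}
    names.discard(None)  # ignore any <w:font> element without a name attribute
    return sorted(names)
```

On a Linux conversion host, you can compare the result with the families printed by `fc-list : family`; any font missing there will be substituted.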
Production tips
- Set realistic timeouts. A 100-page DOCX with images can take 30-60 seconds to convert, so use a timeout of at least 120 seconds.
- Validate input before sending. Empty files, password-protected DOCX, and corrupt OOXML all fail conversion. Try opening the file locally with python-docx (`docx.Document(path)`) before hitting the API.
- Handle the 502 case. Conversions can fail on edge-case DOCX (heavily nested tables, embedded VBA macros, broken styles). Surface the error to users with a clear message.
- Cache results. If users convert the same DOCX repeatedly, hash the file and cache the PDF. This saves both money and latency.
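The validation and caching tips can be combined into a small wrapper. This is a sketch under stated assumptions, not a reference implementation. `precheck_docx` swaps the python-docx test for a cheaper stdlib-only check: the file must be non-empty, must not be an OLE2 compound file (the container used for password-protected Office files and legacy `.doc`), and must be a ZIP with the conventional `word/document.xml` part. `cached_convert` keys a cache directory by the SHA-256 of the input bytes. Both names and the cache layout are hypothetical, and `convert` is any function taking `(docx_path, pdf_path)`, such as the API-based `docx_to_pdf` from Method 1.

```python
import hashlib
import os
import zipfile
from typing import Callable

# Password-protected OOXML and legacy .doc files start with the OLE2 signature.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def precheck_docx(path: str) -> None:
    """Raise ValueError for inputs that are certain to fail conversion."""
    if os.path.getsize(path) == 0:
        raise ValueError("file is empty")
    with open(path, "rb") as f:
        if f.read(8) == OLE2_MAGIC:
            raise ValueError("file is encrypted or a legacy .doc, not a plain DOCX")
    if not zipfile.is_zipfile(path):
        raise ValueError("file is not a ZIP package, so it cannot be a DOCX")
    with zipfile.ZipFile(path) as zf:
        if "word/document.xml" not in zf.namelist():
            raise ValueError("package has no word/document.xml part")
        if zf.testzip() is not None:  # CRC-checks every member
            raise ValueError("package contains a corrupt member")


def cached_convert(
    docx_path: str,
    cache_dir: str,
    convert: Callable[[str, str], None],
) -> str:
    """Convert docx_path once per unique content and return the cached PDF path."""
    precheck_docx(docx_path)
    digest = hashlib.sha256()
    with open(docx_path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB blocks
            digest.update(block)
    os.makedirs(cache_dir, exist_ok=True)
    pdf_path = os.path.join(cache_dir, f"{digest.hexdigest()}.pdf")
    if not os.path.exists(pdf_path):
        tmp_path = pdf_path + ".tmp"
        convert(docx_path, tmp_path)
        os.replace(tmp_path, pdf_path)  # rename into place so readers never see a partial PDF
    return pdf_path
```

Keying on content rather than filename means a renamed copy of the same document still hits the cache, while any edit to the document produces a new hash and a fresh conversion.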
For one-off conversions on your own machine, install LibreOffice and shell out: it's free and good enough. For SaaS uploads, the API saves you from packaging LibreOffice in your Docker image and dealing with its headless quirks. Get a free API key with 1,000 conversions/month.