DOCX-to-PDF is one of the harder document conversions. The fidelity problem is real: fonts, complex tables, embedded images, and tracked changes all behave differently across converters. Apache POI can read a DOCX's object model, but it has no PDF renderer of its own, so output quality depends on whichever library does the rendering. LibreOffice headless produces the most faithful output but requires a system install. The ChangeThisFile API uses LibreOffice server-side without the installation burden.

Method 1: docx4j + Apache FOP (JVM, no system deps)

docx4j converts DOCX to PDF in pure JVM code, with no LibreOffice required. Fidelity is good for simple to medium-complexity documents.

// build.gradle.kts
dependencies {
    implementation("org.docx4j:docx4j-JAXB-MOXy:11.4.10")
    implementation("org.docx4j:docx4j-export-fo:11.4.10")
    implementation("org.apache.xmlgraphics:fop:2.9")
}
import org.docx4j.Docx4J
import org.docx4j.openpackaging.packages.WordprocessingMLPackage
import java.io.File
import java.io.FileOutputStream

fun docxToPdf(inputPath: String, outputPath: String) {
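    // Parse the DOCX package (document.xml, styles, numbering, media)
    val wordMLPackage = WordprocessingMLPackage.load(File(inputPath))
    // use {} closes the stream even if rendering throws
    FileOutputStream(outputPath).use { os ->
        Docx4J.toPDF(wordMLPackage, os) // XSL-FO generation + Apache FOP rendering
    }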
    println("Saved: $outputPath")
}

fun main() {
    docxToPdf("document.docx", "document.pdf")
}

docx4j uses Apache FOP (XSL-FO) internally to render the PDF. This means:

  • Complex floating tables and text boxes may not lay out identically to Word.
  • Custom or non-system fonts must be registered with FOP's font config.
  • The conversion is synchronous and CPU-bound. In coroutine code, call it inside withContext(Dispatchers.Default) so it does not block a request thread.
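When many documents need converting, the CPU-bound point means you want bounded parallelism rather than one thread per request. Outside coroutine code, a fixed-size thread pool does the same job as Dispatchers.Default. Below is a minimal sketch using plain java.util.concurrent; `convertAll` is a hypothetical helper, and it assumes each call loads its own WordprocessingMLPackage (as `docxToPdf` does) rather than sharing one across threads.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutionException
import java.util.concurrent.Executors

/**
 * Hypothetical helper: runs a blocking, CPU-bound converter over many files
 * with at most [parallelism] conversions in flight.
 * Returns input path -> exception for every job that failed (empty = all succeeded).
 */
fun convertAll(
    jobs: List<Pair<String, String>>,                       // (inputPath, outputPath)
    parallelism: Int = Runtime.getRuntime().availableProcessors(),
    convert: (String, String) -> Unit                       // e.g. ::docxToPdf
): Map<String, Throwable> {
    require(parallelism > 0) { "parallelism must be positive" }
    val pool = Executors.newFixedThreadPool(parallelism)
    try {
        // Submit everything; the pool queues jobs beyond `parallelism`.
        val futures = jobs.map { (input, output) ->
            input to pool.submit(Callable { convert(input, output) })
        }
        val failures = linkedMapOf<String, Throwable>()
        for ((input, future) in futures) {
            try {
                future.get()                                // wait for this job
            } catch (e: ExecutionException) {
                failures[input] = e.cause ?: e              // unwrap the converter's exception
            }
        }
        return failures
    } finally {
        pool.shutdown()                                     // no new tasks; threads exit when idle
    }
}
```

For example, `convertAll(jobs, convert = ::docxToPdf)` returns an empty map when every file converts. Collecting failures per file, instead of aborting on the first exception, keeps one malformed upload from sinking a whole batch.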

For most business documents (memos, reports, resumes), docx4j output is production-acceptable. For pixel-perfect Word output, use LibreOffice.

Method 2: LibreOffice headless via ProcessBuilder (highest fidelity)

LibreOffice is the gold standard for DOCX → PDF fidelity. It parses Word formats natively, and its PDF export closely matches what users see in Word. Call it headless via ProcessBuilder.

# Install LibreOffice
apt install libreoffice    # Debian/Ubuntu
brew install --cask libreoffice  # macOS
import java.io.File
import java.util.concurrent.TimeUnit

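fun docxToPdfLibreOffice(
    inputPath: String,
    outputDir: String,
    timeout: Long = 120,
    binary: String = "libreoffice" // on macOS and Windows the executable is usually "soffice"
): String {
    val outDir = File(outputDir).also { it.mkdirs() }
    val inputFile = File(inputPath)
    // Send console output to a temp file. Reading the pipe with readText() before
    // waitFor() would block until LibreOffice exits, so the timeout could never fire.
    val log = File.createTempFile("libreoffice-", ".log")

    return try {
        val process = ProcessBuilder(
            binary,
            "--headless",
            "--convert-to", "pdf",
            "--outdir", outDir.absolutePath,
            inputFile.absolutePath
        )
            .redirectErrorStream(true)
            .redirectOutput(log)
            .start()

        val finished = process.waitFor(timeout, TimeUnit.SECONDS)
        if (!finished) {
            process.destroyForcibly()
            error("LibreOffice timed out after ${timeout}s")
        }
        if (process.exitValue() != 0) {
            error("LibreOffice failed (exit ${process.exitValue()}):\n${log.readText()}")
        }

        // LibreOffice can exit 0 without writing anything, so confirm the PDF exists
        val pdf = File(outDir, inputFile.nameWithoutExtension + ".pdf")
        check(pdf.isFile && pdf.length() > 0) { "No PDF produced:\n${log.readText()}" }
        pdf.absolutePath
    } finally {
        log.delete()
    }
}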

fun main() {
    val pdf = docxToPdfLibreOffice("document.docx", "./output")
    println("Saved: $pdf")
}

Important: LibreOffice headless is single-instance per user profile. Calls that start while another conversion is running share the same profile, so they can queue, fail on the profile lock, or silently produce no output. For concurrent use, give each job its own profile: set HOME to a per-request temp directory and pass -env:UserInstallation=file:///tmp/lo-XXXX, with a unique suffix per job. This avoids lock-file conflicts.
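The profile isolation above can be sketched as a variant of `docxToPdfLibreOffice` that creates a throwaway profile directory per job, points both `-env:UserInstallation` and `HOME` at it, and deletes it afterwards. `isolatedLibreOfficeCommand` and `convertIsolated` are hypothetical names; the command-line flags are the ones already used in this article.

```kotlin
import java.io.File
import java.nio.file.Files
import java.util.concurrent.TimeUnit

// Builds the command for one job. -env:UserInstallation expects a file:// URL;
// Path.toUri() yields the file:///... form.
fun isolatedLibreOfficeCommand(
    inputFile: File,
    outDir: File,
    profileDir: File,
    binary: String = "libreoffice"
): List<String> = listOf(
    binary,
    "-env:UserInstallation=${profileDir.toPath().toUri()}",
    "--headless",
    "--convert-to", "pdf",
    "--outdir", outDir.absolutePath,
    inputFile.absolutePath
)

// Runs one conversion with its own profile and HOME, then removes them.
fun convertIsolated(inputPath: String, outputDir: String, timeoutSeconds: Long = 120): File {
    val inputFile = File(inputPath)
    val outDir = File(outputDir).also { it.mkdirs() }
    val profileDir = Files.createTempDirectory("lo-profile-").toFile()
    val log = File(profileDir, "convert.log")
    return try {
        val builder = ProcessBuilder(isolatedLibreOfficeCommand(inputFile, outDir, profileDir))
            .redirectErrorStream(true)
            .redirectOutput(log)                       // file sink, so waitFor's timeout works
        builder.environment()["HOME"] = profileDir.absolutePath
        val process = builder.start()
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly()
            error("LibreOffice timed out after ${timeoutSeconds}s")
        }
        if (process.exitValue() != 0) {
            error("LibreOffice failed (exit ${process.exitValue()}):\n${log.readText()}")
        }
        val pdf = File(outDir, inputFile.nameWithoutExtension + ".pdf")
        check(pdf.isFile) { "No PDF produced:\n${log.readText()}" }
        pdf
    } finally {
        profileDir.deleteRecursively()                 // deletes the profile and the log
    }
}
```

Give concurrent jobs separate output directories as well: LibreOffice names the PDF after the input file, so two `report.docx` uploads converted into the same `--outdir` would overwrite each other.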

Method 3: ChangeThisFile API via Ktor HttpClient (LibreOffice server-side)

The API runs LibreOffice server-side — same fidelity as Method 2, no installation required. Free tier: 1,000 conversions/month.

# curl reference
curl -X POST https://changethisfile.com/v1/convert \
  -H "Authorization: Bearer ctf_sk_your_key_here" \
  -F "file=@document.docx" \
  -F "target=pdf" \
  --output document.pdf
import io.ktor.client.*
import io.ktor.client.engine.cio.*
import io.ktor.client.request.*
import io.ktor.client.request.forms.*
import io.ktor.client.statement.*
import io.ktor.http.*
import java.io.File

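// Placeholder key; in production, load it from an environment variable or secret store
const val API_KEY = "ctf_sk_your_key_here"

suspend fun docxToPdfApi(inputPath: String, outputPath: String) {
    val inputFile = File(inputPath)
    // use {} closes the client (and its connections) even if the request fails
    HttpClient(CIO) {
        engine { requestTimeout = 120_000 } // conversions can outlast the engine's shorter default
    }.use { client ->
        val response: HttpResponse = client.submitFormWithBinaryData(
            url = "https://changethisfile.com/v1/convert",
            formData = formData {
                append("file", inputFile.readBytes(), Headers.build {
                    append(HttpHeaders.ContentDisposition, "filename=\"${inputFile.name}\"")
                    // HeadersBuilder.append takes String values
                    append(
                        HttpHeaders.ContentType,
                        "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
                    )
                })
                append("target", "pdf")
            }
        ) {
            header(HttpHeaders.Authorization, "Bearer $API_KEY")
        }
        // Fail loudly instead of saving an error body as the PDF
        if (!response.status.isSuccess()) {
            error("Conversion failed: ${response.status}\n${response.bodyAsText()}")
        }
        File(outputPath).writeBytes(response.readBytes())
    }
    println("Saved: $outputPath")
}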

fun main() = kotlinx.coroutines.runBlocking {
    docxToPdfApi("document.docx", "document.pdf")
}

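Source format is auto-detected from file content, so the same endpoint also accepts DOC (the legacy binary format) and ODT files. Upload the file under its real name and set the part's Content-Type to match (application/msword for DOC, application/vnd.oasis.opendocument.text for ODT) instead of the hardcoded DOCX type.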
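A small sketch of picking the Content-Type from the file extension, with a pre-flight size check against the 25MB free-tier upload limit listed in the comparison table. `uploadContentType` and `FREE_TIER_LIMIT_BYTES` are hypothetical names, and reading "25MB" as 25 MiB is an assumption; confirm the exact limit with the API documentation.

```kotlin
import java.io.File

// MIME types for the Word-family inputs the endpoint accepts
val WORD_MIME_TYPES = mapOf(
    "docx" to "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "doc" to "application/msword",
    "odt" to "application/vnd.oasis.opendocument.text"
)

// Assumption: 25MB free-tier limit interpreted as 25 MiB
const val FREE_TIER_LIMIT_BYTES = 25L * 1024 * 1024

/** Returns the Content-Type to send for [file]; throws IllegalArgumentException if too large or unsupported. */
fun uploadContentType(file: File, limitBytes: Long = FREE_TIER_LIMIT_BYTES): String {
    require(file.length() <= limitBytes) {
        "${file.name} is ${file.length()} bytes; upload limit is $limitBytes"
    }
    return WORD_MIME_TYPES[file.extension.lowercase()]
        ?: throw IllegalArgumentException("Unsupported extension: ${file.name}")
}
```

Checking size locally saves a round trip that would only end in a rejection, and the extension lookup keeps the form part's Content-Type consistent with the filename you send.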

When to use each

| Approach | Best for | Tradeoff |
|---|---|---|
| docx4j + FOP | Pure JVM, simple-to-medium documents, no system deps | Complex floating elements may not lay out correctly |
| LibreOffice headless | Highest fidelity, pixel-perfect Word output | System install required; single-instance bottleneck |
| ChangeThisFile API (Ktor) | LibreOffice fidelity without installation, serverless JVM | Network latency, 25MB free-tier upload limit |

Production tips

  • Embed fonts in the DOCX before converting. If the DOCX uses non-system fonts (custom brand fonts, CJK fonts), embed them in the file or install them on the conversion machine. Missing fonts cause LibreOffice to substitute, breaking layout.
  • For concurrent LibreOffice conversions, isolate profiles. Pass -env:UserInstallation=file:///tmp/lo-<uuid> and set HOME to a temp dir per job. Clean up after conversion. Without isolation, concurrent calls corrupt each other's profile lock files.
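  • Set a hard timeout on ProcessBuilder. A malformed DOCX can cause LibreOffice to hang. Use process.waitFor(120, TimeUnit.SECONDS) and call process.destroyForcibly() on timeout. Redirect the process output to a file (or drain it on another thread): reading it with readText() before waitFor() blocks until LibreOffice exits, so the timeout never fires.
  • Check output file size. LibreOffice can exit with code 0 even when the DOCX couldn't be fully parsed or no file was written. Verify that the output PDF exists and is larger than about 1KB before returning success to the caller.
  • docx4j font config. docx4j generates the FOP font configuration itself from the document's font mapper (docx4j's samples call wordMLPackage.setFontMapper(IdentityPlusMapper())), so custom fonts must be visible to that mapper, typically by installing them on the host. If you drive FOP directly, load a config file with FopFactory.newInstance(File("fop-config.xml")); the URI overload only sets a base URI and does not read a configuration.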
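The output check from the tips above, extended with a test for the `%PDF-` header that PDF files normally begin with, can be sketched as follows. `looksLikeValidPdf` is a hypothetical name, and the 1024-byte threshold is the tips' rough > 1KB rule, not a LibreOffice guarantee. The same check is a cheap guard on the API path too, since an error body saved to disk fails the header test.

```kotlin
import java.io.File
import java.io.RandomAccessFile

// Heuristic minimum size from the production tips; tune for your documents
const val MIN_PDF_BYTES = 1024L

/**
 * True if [pdf] exists, is larger than [minBytes], and starts with "%PDF-".
 * A heuristic, not a full validation: a truncated PDF can still pass.
 */
fun looksLikeValidPdf(pdf: File, minBytes: Long = MIN_PDF_BYTES): Boolean {
    if (!pdf.isFile || pdf.length() <= minBytes) return false
    val header = ByteArray(5)
    // length > minBytes >= 5 here, so readFully cannot hit EOF
    RandomAccessFile(pdf, "r").use { it.readFully(header) }
    return String(header, Charsets.US_ASCII) == "%PDF-"
}
```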

For pure-JVM deployments with simple documents, docx4j is the easy path. For production pipelines that need Word-accurate output, LibreOffice headless (with profile isolation) is the right call. For LibreOffice-quality output without managing the installation, use the API; the free tier covers 1,000 conversions/month.