JavaScript has weaker pure-JS options for PDF-to-DOCX conversion than Python does. The browser path (PDF.js + docx) extracts text but loses layout precision. Node + LibreOffice gets much closer to the original document. For most production use cases, the API is the simpler answer because pure-JS PDF parsing produces inconsistent output across PDF variants.

Method 1: PDF.js + docx (browser, basic)

This works for text-only PDFs. Extract text with PDF.js, build a DOCX with the docx library. Layout precision is limited.

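npm install pdfjs-dist docx file-saver

// Import from the package root; deep paths such as "pdfjs-dist/build/pdf" differ between pdfjs-dist versions.
import * as pdfjsLib from "pdfjs-dist";
import { Document, Paragraph, Packer, TextRun } from "docx";
import { saveAs } from "file-saver";

// Serve the worker file shipped in pdfjs-dist/build yourself; its name varies by version (.js or .mjs).
pdfjsLib.GlobalWorkerOptions.workerSrc = "/pdf.worker.min.js";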

async function pdfToDocx(pdfFile) {
  const arrayBuffer = await pdfFile.arrayBuffer();
  const pdf = await pdfjsLib.getDocument({ data: arrayBuffer }).promise;

  const paragraphs = [];
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const text = await page.getTextContent();

    let lineText = "";
    let lastY = null;
    for (const item of text.items) {
      // transform[5] is the item's baseline y; a jump of more than 5 units starts a new line.
      if (lastY !== null && Math.abs(item.transform[5] - lastY) > 5) {
        if (lineText.trim()) {
          paragraphs.push(new Paragraph({ children: [new TextRun(lineText.trim())] }));
        }
        // Keep the trailing space so the next item on this line does not fuse with this one.
        lineText = item.str + " ";
      } else {
        lineText += item.str + " ";
      }
      lastY = item.transform[5];
    }
    if (lineText.trim()) {
      paragraphs.push(new Paragraph({ children: [new TextRun(lineText.trim())] }));
    }
    paragraphs.push(new Paragraph({ children: [new TextRun("")] })); // blank paragraph as a gap between pages
  }

  const doc = new Document({
    sections: [{ children: paragraphs }],
  });

  const blob = await Packer.toBlob(doc);
  saveAs(blob, "output.docx");
}

document.querySelector("input[type=file]").addEventListener("change", (e) => {
  pdfToDocx(e.target.files[0]);
});

This produces a DOCX with the text content but loses tables, images, and complex layout. For text-heavy documents (essays, reports without tables), it's serviceable. For anything with structure, use method 2 or 3.
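
The line detection in Method 1 compares each item only with the one before it, so it depends on PDF.js returning items in reading order. A slightly more defensive variant groups items by baseline and sorts each line left to right. The sketch below assumes items shaped like PDF.js text items (`str` plus a `transform` array where index 4 is x and index 5 is y); the function name and the `tolerance` parameter are illustrative, not part of any library.

```javascript
// Group PDF.js-style text items into lines of text.
// Assumption: transform[4] is x and transform[5] is the baseline y.
// `tolerance` is a hypothetical tuning knob, in PDF units.
function groupItemsIntoLines(items, tolerance = 5) {
  const lines = [];
  for (const item of items) {
    if (!item.str) continue; // skip empty items
    const x = item.transform[4];
    const y = item.transform[5];
    // Reuse an existing line whose baseline is within tolerance.
    let line = lines.find((l) => Math.abs(l.y - y) <= tolerance);
    if (!line) {
      line = { y, parts: [] };
      lines.push(line);
    }
    line.parts.push({ x, str: item.str });
  }
  // PDF y grows upward, so a larger y sits higher on the page.
  lines.sort((a, b) => b.y - a.y);
  return lines.map((l) =>
    l.parts
      .sort((a, b) => a.x - b.x)
      .map((p) => p.str)
      .join(" ")
      .replace(/\s+/g, " ")
      .trim()
  );
}
```

Each returned string can then become one `Paragraph`. This still ignores columns, tables, and rotated text, so it narrows the ordering problem without solving layout.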

Method 2: LibreOffice via child_process (Node)

For higher-fidelity conversion in Node, shell out to LibreOffice. This is the same approach you would use from Python, driven from JavaScript instead.

apt install libreoffice --no-install-recommends

import { spawn } from "node:child_process";
import path from "node:path";

function pdfToDocx(inPath, outDir, timeoutMs = 120000) {
  return new Promise((resolve, reject) => {
    const child = spawn(
      "libreoffice",
      [
        "--headless",
        "--infilter=writer_pdf_import",
        "--convert-to", "docx",
        "--outdir", outDir,
        inPath,
      ],
      { env: { ...process.env, HOME: "/tmp" } }
    );

    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      reject(new Error("libreoffice timed out"));
    }, timeoutMs);

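    let stderr = "";
    child.stderr.on("data", (chunk) => (stderr += chunk));

    // Fires when the process cannot be started at all (e.g. libreoffice is not on PATH).
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });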

    child.on("close", (code) => {
      clearTimeout(timer);
      if (code !== 0) {
        reject(new Error(`libreoffice exit ${code}: ${stderr}`));
        return;
      }
      const base = path.basename(inPath, path.extname(inPath));
      resolve(path.join(outDir, `${base}.docx`));
    });
  });
}

const out = await pdfToDocx("document.pdf", "./out");
console.log("wrote:", out);

Three things to know:

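  • HOME=/tmp in containers: LibreOffice creates a profile directory on first run and needs a writable home for it.
  • The writer_pdf_import filter tells LibreOffice to open the PDF in Writer rather than Draw (the default for PDFs), which is what makes DOCX export possible.
  • Treat it as single-threaded per host. Queue conversions so that only one runs at a time.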
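
The one-at-a-time constraint in the last point can be met without extra dependencies. The sketch below chains jobs on a promise so each starts only after the previous one settles; `createSerialQueue` and `enqueue` are hypothetical names, and a real service would likely add a queue length limit and persistence.

```javascript
// Minimal in-memory serial queue: each job starts only after the
// previous one has settled (resolved or rejected).
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(job) {
    // `job` is a function returning a promise; call it once the tail settles.
    const result = tail.then(() => job());
    // Swallow the outcome on the internal chain so one failure
    // does not block the jobs queued behind it.
    tail = result.then(() => {}, () => {});
    return result;
  };
}

// Usage (assuming the pdfToDocx function from Method 2 is in scope):
//   const enqueue = createSerialQueue();
//   const out = await enqueue(() => pdfToDocx("document.pdf", "./out"));
```

Callers still receive each job's own result or error; only the start times are serialized.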

Method 3: ChangeThisFile API (with OCR fallback)

The API runs LibreOffice server-side and falls back to OCR for image-only PDFs. Free tier covers 1,000 conversions/month.

const API_KEY = "ctf_sk_your_key_here";

async function pdfToDocx(pdfBuffer, filename = "document.pdf") {
  const form = new FormData();
  form.append("file", new Blob([pdfBuffer], { type: "application/pdf" }), filename);
  form.append("source", "pdf");
  form.append("target", "docx");

  const response = await fetch("https://changethisfile.com/v1/convert", {
    method: "POST",
    headers: { Authorization: `Bearer ${API_KEY}` },
    body: form,
  });

  if (!response.ok) throw new Error(`HTTP ${response.status}: ${await response.text()}`);
  return await response.arrayBuffer();
}

// Cloudflare Worker example:
export default {
  async fetch(request) {
    const pdf = await request.arrayBuffer();
    const docx = await pdfToDocx(new Uint8Array(pdf));
    return new Response(docx, {
      headers: {
        "Content-Type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      },
    });
  },
};

For text-layer PDFs, the API converts with LibreOffice directly, which is the fast path. For scanned PDFs, it runs OCR first and then constructs the DOCX. Allow a client-side timeout of 180 seconds or more for large or scanned documents.
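
The `fetch` call in the example above sets no timeout of its own. One way to apply the 180-second allowance is an `AbortController` wrapper, sketched below. `fetchWithTimeout` is a hypothetical helper, and the `fetchImpl` parameter exists only so the function can be exercised without a network.

```javascript
// Hypothetical wrapper: aborts the request if no response arrives
// within timeoutMs. Works wherever fetch and AbortController exist.
async function fetchWithTimeout(url, init = {}, timeoutMs = 180000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // The signal cancels the in-flight request when the timer fires.
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    // Stop the timer once the response (or an error) has arrived.
    clearTimeout(timer);
  }
}
```

In `pdfToDocx`, the call would become `fetchWithTimeout("https://changethisfile.com/v1/convert", { method: "POST", headers, body: form })`. Note that the timer in this sketch is cleared when the response headers arrive, so it does not bound the time spent reading the body.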

When to use each

Approach                 | Best for                                   | Tradeoff
PDF.js + docx (browser)  | Text-only PDFs, privacy-first conversion   | Loses tables, images, complex layout
LibreOffice via Node     | Higher fidelity, server-side batch         | 1 GB install, single-threaded per host
ChangeThisFile API       | Mixed input including scans, no infra      | Network call, file size limit (25 MB free)

Production tips

  • Be honest about pure-JS fidelity. The browser path produces text-only DOCX. If users expect tables, images, and layout to transfer, you need server-side conversion.
  • For Node + LibreOffice, use a job queue. LibreOffice serializes internally — multiple concurrent processes contend for locks. A simple BullMQ queue or in-memory semaphore is enough.
  • Set timeout 180s+ for large PDFs. Long documents with images take 30-60s to convert; complex layouts longer.
  • Detect scanned PDFs early. Run pdftotext first; if the output is tiny, the PDF is image-only and a direct conversion will produce an empty DOCX. Use OCR instead (the API does this automatically).
  • Lazy-load PDF.js. The pdfjs-dist bundle is ~1MB. On pages that only sometimes convert PDFs, use a dynamic import to keep the initial bundle small.
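
The scanned-PDF check can also be done without pdftotext. The sketch below swaps in PDF.js, which the browser path already loads, and counts characters in the text layer. It assumes `pdf` is the document object returned by `getDocument(...).promise`; the function name and the 20-characters-per-page threshold are assumptions to tune against your own files.

```javascript
// Heuristic sketch: treat a PDF as image-only when its text layer
// is nearly empty. Uses only numPages, getPage, and getTextContent.
async function isLikelyScanned(pdf, minCharsPerPage = 20) {
  let chars = 0;
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const content = await page.getTextContent();
    for (const item of content.items) {
      // Count non-whitespace-padded characters of each text item.
      chars += (item.str || "").trim().length;
    }
  }
  // Fewer characters than the threshold across all pages suggests a scan.
  return chars < minCharsPerPage * pdf.numPages;
}
```

When this returns true, skip the direct conversion and route the file to an OCR-capable path instead.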

For text-heavy PDFs in the browser, PDF.js + docx works. For real production use, choose Node + LibreOffice or the API, whose free tier covers 1,000 conversions/month.