Document archives from the 1990s and 2000s are full of .doc and .rtf files, with the occasional .wpd (WordPerfect) or .wps (Microsoft Works) file. Microsoft Word can open most of these, but opening 500 files by hand to save each one as DOCX is not a workflow. LibreOffice in headless mode plus a short Python script converts an entire directory in minutes.

TL;DR

  • CLI one-liner: libreoffice --headless --convert-to docx *.doc
  • Python batch: subprocess loop calling LibreOffice on each file
  • No LibreOffice installed: ChangeThisFile API converts DOC/RTF/WPS server-side
  • Fidelity note: Complex formatting (tracked changes, custom styles, macros) may not survive DOC→DOCX conversion perfectly

Legacy document formats you'll encounter

Format             Extension   Era                Conversion support
Word 97-2003       .doc        1997-2007          Excellent (LibreOffice, Word)
Rich Text Format   .rtf        1987-present       Excellent
WordPad (modern)   .rtf        Windows built-in   Excellent
WordPerfect        .wpd        1980s-2000s        Good (LibreOffice)
MS Works           .wps        1987-2007          Partial (LibreOffice)
OpenDocument       .odt        2005-present       Excellent (native LibreOffice)

LibreOffice CLI: batch conversion

# Install LibreOffice
apt install libreoffice   # Debian/Ubuntu
brew install --cask libreoffice  # macOS (Homebrew Cask); the command is usually soffice there

# Convert all DOC files to DOCX in current directory
libreoffice --headless --convert-to docx *.doc

# Convert to PDF
libreoffice --headless --convert-to pdf *.doc
libreoffice --headless --convert-to pdf *.rtf

# Specify output directory
libreoffice --headless --convert-to docx --outdir ./converted *.doc *.rtf

# Convert everything at once (doc, rtf, odt, wpd)
libreoffice --headless --convert-to pdf --outdir ./pdf-output *.doc *.rtf *.odt *.wpd

LibreOffice headless does not benefit from naive parallelism. Concurrent calls from multiple processes contend for the same LibreOffice instance lock, so they tend to slow down or fail instead of speeding things up. Run conversions sequentially in a loop, not in parallel.
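If parallel conversion is ever unavoidable, a commonly used workaround is to give each LibreOffice process its own profile directory through the -env:UserInstallation switch, so that the processes do not contend for the same lock. The sketch below only builds such a command. The function name build_isolated_command and the temporary profile location are illustrative assumptions, and whether this beats a sequential loop on a given machine needs measuring.

```python
import tempfile
from pathlib import Path


def build_isolated_command(src: Path, out_dir: Path, profile_dir: Path,
                           target_format: str = "docx") -> list[str]:
    """Build a LibreOffice command that uses its own user profile directory."""
    return [
        "libreoffice",
        # Point this process at a private profile so it does not share a lock
        f"-env:UserInstallation={profile_dir.resolve().as_uri()}",
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(src),
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the resulting command
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as tmp:
        cmd = build_isolated_command(Path("memo.doc"), Path("converted"), Path(tmp))
        print(" ".join(cmd))
```

The sequential loop remains the simpler default; this variant only matters when several workers must run at once.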

Python: batch migration with error tracking

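import subprocess
import shutil
from pathlib import Path
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConversionResult:
    src: Path
    success: bool
    output: Optional[Path] = None  # set when the conversion succeeded
    error: Optional[str] = None    # set when the conversion failed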

def convert_doc(
    src: Path,
    out_dir: Path,
    target_format: str = "docx",  # docx, pdf, odt, html
) -> ConversionResult:
    if not shutil.which("libreoffice"):
        raise RuntimeError("LibreOffice not found. Install with: apt install libreoffice")

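    try:
        result = subprocess.run(
            [
                "libreoffice",
                "--headless",
                "--convert-to", target_format,
                "--outdir", str(out_dir),
                str(src),
            ],
            capture_output=True,
            text=True,
            timeout=120,
        )
    except subprocess.TimeoutExpired:
        # A hung conversion should fail this one file, not abort the whole batch
        return ConversionResult(src=src, success=False, error="Timed out after 120s")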

    expected_out = out_dir / src.with_suffix(f".{target_format}").name
    if result.returncode == 0 and expected_out.exists():
        return ConversionResult(src=src, success=True, output=expected_out)
    else:
        return ConversionResult(
            src=src, success=False,
            error=result.stderr[-300:] or "Unknown error"
        )

def batch_migrate(
    input_dir: str,
    output_dir: str,
    target: str = "docx",
    extensions: tuple = (".doc", ".rtf", ".wpd", ".odt", ".wps"),
):
    src_dir = Path(input_dir)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    docs = [p for p in src_dir.rglob("*") if p.suffix.lower() in extensions]
    print(f"Converting {len(docs)} documents to {target}")

    success, failed = [], []
    for i, doc in enumerate(docs, 1):
        print(f"[{i}/{len(docs)}] {doc.name}...", end=" ", flush=True)
        r = convert_doc(doc, out_dir, target)
        if r.success:
            print(f"OK ({r.output.stat().st_size // 1024}KB)")
            success.append(r)
        else:
            print(f"FAILED: {r.error[:80]}")
            failed.append(r)

    print(f"\n{len(success)} succeeded, {len(failed)} failed")
    if failed:
        print("\nFailed files:")
        for r in failed:
            print(f"  {r.src}: {r.error[:80]}")
    return success, failed

succeed, fail = batch_migrate("./old-docs", "./converted", target="docx")
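
Because batch_migrate walks subdirectories with rglob but writes every result into one flat folder, two sources that share a stem (for example 2001/report.doc and 2002/report.doc, or report.doc and report.rtf) would overwrite each other's output. One way around this, sketched under the assumption that mirroring the source tree is acceptable, is to compute a per-file output directory and to flag same-folder collisions up front. The helper names mirrored_out_dir and find_stem_collisions are hypothetical.

```python
from pathlib import Path


def mirrored_out_dir(src: Path, src_root: Path, out_root: Path) -> Path:
    """Return the output directory that mirrors src's location under src_root."""
    relative_parent = src.relative_to(src_root).parent
    return out_root / relative_parent


def find_stem_collisions(files: list[Path]) -> dict[Path, list[Path]]:
    """Group files that would produce the same output name in the same directory."""
    groups: dict[Path, list[Path]] = {}
    for f in files:
        # Key on the path without its extension: a/report.doc and a/report.rtf collide
        groups.setdefault(f.with_suffix(""), []).append(f)
    return {stem: fs for stem, fs in groups.items() if len(fs) > 1}
```

Inside the loop, the target directory would then be mirrored_out_dir(doc, src_dir, out_dir), created with mkdir(parents=True, exist_ok=True) before it is passed to convert_doc.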

ChangeThisFile API

# DOC to DOCX
curl -X POST https://changethisfile.com/v1/convert \
  -H "Authorization: Bearer ctf_sk_your_key_here" \
  -F "file=@document.doc" \
  -F "target=docx" \
  --output document.docx

# RTF to PDF
curl -X POST https://changethisfile.com/v1/convert \
  -H "Authorization: Bearer ctf_sk_your_key_here" \
  -F "file=@memo.rtf" \
  -F "target=pdf" \
  --output memo.pdf

# Same request from Python
import requests
from pathlib import Path

API_KEY = "ctf_sk_your_key_here"

def convert_via_api(src: str, out: str, target: str = "docx") -> None:
    with open(src, "rb") as f:
        resp = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"target": target},
            timeout=120,
        )
    resp.raise_for_status()
    Path(out).write_bytes(resp.content)

convert_via_api("legacy-contract.doc", "contract.pdf", "pdf")

Edge cases and gotchas

  • Macros won't survive. DOC files often contain VBA macros. LibreOffice can read basic macros, but the .docx format cannot store VBA (the macro-enabled variant is .docm), so macros are dropped on conversion. If you need them, keep the .doc alongside the converted .docx.
  • Custom fonts missing. DOC files from the 90s often used fonts that aren't installed on your system (Arial Narrow, Book Antiqua, etc.). LibreOffice substitutes the closest available font — layout may shift slightly.
  • Tracked changes. LibreOffice preserves tracked changes when converting DOC→DOCX. However, the revision author names and timestamps may be garbled from old proprietary metadata.
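  • Very old binary DOC format (pre-Word 97). Word 2.0 and Word 6.0 .doc files use a different binary format. LibreOffice can open most of them, but some very old files fail. Microsoft's free Word Viewer (Windows only) used to be the fallback; Microsoft retired it in 2017, so an existing install or an older copy of Word may be needed instead.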
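  • Encoding issues in RTF files. Old RTF files from non-English Windows systems may use a legacy code page such as Windows-1251 (Cyrillic) or Windows-1250 (Central European) instead of Windows-1252. The code page is normally declared in the RTF header (\ansicpg), and LibreOffice usually detects it automatically, but garbled characters indicate a wrong encoding assumption.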
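
Some of these cases can be flagged before conversion by looking at the first bytes of each file. The sketch below relies on well-known signatures: the OLE2 compound-file header found in binary Word .doc files, the {\rtf prefix of RTF, and the ZIP header of a .docx or .odt saved under the wrong extension. It also reads the \ansicpg code page from the RTF header. The function names and the "unknown" label are illustrative assumptions, and an "unknown" result is only a hint that a file may be a very old or mislabeled format.

```python
import re
from pathlib import Path
from typing import Optional

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file signature
RTF_MAGIC = b"{\\rtf"
ANSICPG = re.compile(rb"\\ansicpg(\d+)")


def sniff_kind(head: bytes) -> str:
    """Classify a file from its leading bytes: 'ole2', 'rtf', 'zip' or 'unknown'."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(RTF_MAGIC):
        return "rtf"
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # e.g. a .docx or .odt saved with the wrong extension
    return "unknown"


def rtf_code_page(head: bytes) -> Optional[int]:
    """Return the Windows code page declared via \\ansicpg, if present."""
    match = ANSICPG.search(head)
    return int(match.group(1)) if match else None


def inspect_file(path: Path, nbytes: int = 1024) -> tuple[str, Optional[int]]:
    """Read the first nbytes of path and report its kind and RTF code page."""
    with path.open("rb") as f:
        head = f.read(nbytes)
    kind = sniff_kind(head)
    return kind, rtf_code_page(head) if kind == "rtf" else None
```

Running inspect_file over the archive before conversion gives a short list of files worth checking by hand, such as .doc files that are really RTF or RTF files with an unexpected code page.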

Migrating a large document archive

For archives of 10,000+ files, a few optimizations:

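# Find all legacy docs and convert them to PDF, one LibreOffice call per file
# (-print0 / -0 keep filenames with spaces or quotes intact)
find /path/to/archive \( -iname '*.doc' -o -iname '*.rtf' \) -print0 | \
  xargs -0 -I{} libreoffice --headless --convert-to pdf --outdir /path/to/output {}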

# Note: xargs parallelism (-P4) with LibreOffice often causes lock conflicts.
# Sequential is more reliable: use a for loop or Python subprocess.

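For throughput, split the file list into chunks and pass each chunk to a single LibreOffice invocation, running the chunks one after another so that start-up cost is paid once per chunk instead of once per file. A typical server converts roughly 60-120 DOC files per minute, depending on file complexity.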
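
A minimal sketch of that chunking idea follows, assuming the libreoffice binary is on PATH. The function names, the chunk size of 50, and the per-file timeout are arbitrary illustrative choices, not tuned values.

```python
import subprocess
from pathlib import Path
from typing import Iterator, Sequence


def chunked(items: Sequence[Path], size: int) -> Iterator[Sequence[Path]]:
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def convert_chunk(files: Sequence[Path], out_dir: Path, target: str = "pdf",
                  timeout_per_file: int = 120) -> subprocess.CompletedProcess:
    """Convert one chunk of files with a single LibreOffice invocation."""
    cmd = ["libreoffice", "--headless", "--convert-to", target,
           "--outdir", str(out_dir), *map(str, files)]
    # Scale the timeout with the number of files in the chunk
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout_per_file * len(files))


def convert_in_chunks(files: Sequence[Path], out_dir: Path, size: int = 50) -> None:
    """Run the chunks one after another, never in parallel."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for chunk in chunked(files, size):
        convert_chunk(chunk, out_dir)
```

The trade-off is that a failure inside a chunk is harder to attribute to one file, so it still pays to check afterwards which expected outputs exist, as convert_doc does.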

LibreOffice headless is the most capable free tool for legacy document conversion. The Python batch script above handles a 1,000-document archive in about 15 minutes and gives you a clean list of any files that need manual attention. If installing LibreOffice is not an option, the ChangeThisFile API has a free tier for no-install conversion.