Can I convert DOC to DOCX without Microsoft Word?

Yes — LibreOffice handles the conversion without Word installed. Fidelity is 95%+ for standard documents. Complex formatting, custom themes, and embedded charts may look slightly different.

What happens to embedded images in DOC files?

Embedded images are preserved in both DOCX and PDF output. LibreOffice extracts and re-embeds them in the target format.

Can I convert WPS files (MS Works)?

LibreOffice has basic WPS support, but it's hit-or-miss for older files. The API uses the same LibreOffice backend. For WPS files that fail, try opening them in the LibreOffice GUI (which has better error recovery than headless mode).

Why does my converted DOCX look different from the original DOC?

Font substitution is the most common cause. The DOC was created with Windows-specific fonts that aren't on your Linux conversion server. Install the Microsoft core fonts (apt install ttf-mscorefonts-installer) to fix most cases.
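
Before converting a large batch, it can help to confirm which fonts the server actually has. The sketch below assumes a Linux host with fontconfig's fc-list on the PATH; the REQUIRED_FONTS list and the helper names are illustrative, not part of any tool mentioned here.

```python
import shutil
import subprocess

# Illustrative list only: adjust to the fonts your documents actually use.
REQUIRED_FONTS = ["Arial", "Times New Roman", "Courier New", "Verdana"]

def parse_fc_list(output: str) -> set[str]:
    """Parse `fc-list : family` output into a set of lowercase family names."""
    families = set()
    for line in output.splitlines():
        # One line may carry several comma-separated names for the same font.
        for name in line.split(","):
            name = name.strip().lower()
            if name:
                families.add(name)
    return families

def missing_fonts(required: list[str], installed: set[str]) -> list[str]:
    """Return the required families that are not in the installed set."""
    return [font for font in required if font.lower() not in installed]

def check_fonts(required: list[str] = REQUIRED_FONTS) -> list[str]:
    """Query fontconfig and report missing families (needs fc-list on PATH)."""
    if not shutil.which("fc-list"):
        raise RuntimeError("fc-list not found; is fontconfig installed?")
    out = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=True
    ).stdout
    return missing_fonts(required, parse_fc_list(out))
```

Calling check_fonts() on the conversion server returns the families LibreOffice would have to substitute; an empty list means every font in the list was found.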

How to Convert Old DOC and RTF Files to DOCX or PDF

Document archives from the 1990s and 2000s are full of .doc and .rtf files, with the occasional .wpd (WordPerfect) or .wps (MS Works) file. Microsoft Word can open most of these, but opening 500 files by hand to save each one as DOCX is not a workflow. LibreOffice in headless mode plus a short Python script converts an entire directory in minutes.

TL;DR

CLI one-liner: libreoffice --headless --convert-to docx *.doc
Python batch: subprocess loop calling LibreOffice on each file
No LibreOffice installed: ChangeThisFile API converts DOC/RTF/WPS server-side
Fidelity note: Complex formatting (tracked changes, custom styles, macros) may not survive DOC→DOCX conversion perfectly

Legacy document formats you'll encounter

Format	Extension	Era	Conversion support
Word 97-2003	.doc	1997-2007	Excellent (LibreOffice, Word)
Rich Text Format	.rtf	1987-present	Excellent
WordPad (modern)	.rtf	Windows built-in	Excellent
WordPerfect	.wpd	1980s-2000s	Good (LibreOffice)
MS Works	.wps	1987-2007	Partial (LibreOffice)
OpenDocument	.odt	2005-present	Excellent (native LibreOffice)

LibreOffice CLI: batch conversion

# Install LibreOffice
apt install libreoffice   # Debian/Ubuntu
brew install --cask libreoffice  # macOS (Homebrew Cask)

# Convert all DOC files to DOCX in current directory
libreoffice --headless --convert-to docx *.doc

# Convert to PDF
libreoffice --headless --convert-to pdf *.doc
libreoffice --headless --convert-to pdf *.rtf

# Specify output directory
libreoffice --headless --convert-to docx --outdir ./converted *.doc *.rtf

# Convert everything at once (doc, rtf, odt, wpd)
libreoffice --headless --convert-to pdf --outdir ./pdf-output *.doc *.rtf *.odt *.wpd

LibreOffice headless does not parallelize well. Concurrent invocations share one user profile and contend for its instance lock, so launching several at once tends to be slower than running them one by one, and some calls may fail outright. Run conversions sequentially in a loop, not in parallel.

Python: batch migration with error tracking

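import subprocess
import shutil
from pathlib import Path
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConversionResult:
    src: Path
    success: bool
    output: Optional[Path] = None  # set only when the conversion succeeded
    error: Optional[str] = None    # set only when the conversion failed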

def convert_doc(
    src: Path,
    out_dir: Path,
    target_format: str = "docx",  # docx, pdf, odt, html
) -> ConversionResult:
    if not shutil.which("libreoffice"):
        raise RuntimeError("LibreOffice not found. Install with: apt install libreoffice")

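    try:
        result = subprocess.run(
            [
                "libreoffice",
                "--headless",
                "--convert-to", target_format,
                "--outdir", str(out_dir),
                str(src),
            ],
            capture_output=True,
            text=True,
            timeout=120,
        )
    except subprocess.TimeoutExpired:
        # A hung conversion should fail this one file, not abort the whole batch.
        return ConversionResult(src=src, success=False, error="Timed out after 120s")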

    expected_out = out_dir / src.with_suffix(f".{target_format}").name
    if result.returncode == 0 and expected_out.exists():
        return ConversionResult(src=src, success=True, output=expected_out)
    else:
        return ConversionResult(
            src=src, success=False,
            error=result.stderr[-300:] or "Unknown error"
        )

def batch_migrate(
    input_dir: str,
    output_dir: str,
    target: str = "docx",
    extensions: tuple = (".doc", ".rtf", ".wpd", ".odt", ".wps"),
):
    src_dir = Path(input_dir)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    docs = [p for p in src_dir.rglob("*") if p.suffix.lower() in extensions]
    print(f"Converting {len(docs)} documents to {target}")

    success, failed = [], []
    for i, doc in enumerate(docs, 1):
        print(f"[{i}/{len(docs)}] {doc.name}...", end=" ", flush=True)
        r = convert_doc(doc, out_dir, target)
        if r.success:
            print(f"OK ({r.output.stat().st_size // 1024}KB)")
            success.append(r)
        else:
            print(f"FAILED: {r.error[:80]}")
            failed.append(r)

    print(f"\n{len(success)} succeeded, {len(failed)} failed")
    if failed:
        print("\nFailed files:")
        for r in failed:
            print(f"  {r.src}: {r.error[:80]}")
    return success, failed

succeed, fail = batch_migrate("./old-docs", "./converted", target="docx")
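
One thing to watch in the script above: rglob walks subdirectories, but every result lands in a single output directory, so two files with the same name in different folders (or report.doc and report.rtf in the same folder) would write to the same output path. A minimal sketch of one way around this, mirroring the source tree and flagging same-stem clashes up front; mirrored_out_dir and find_collisions are hypothetical helpers, not part of the original script.

```python
from collections import defaultdict
from pathlib import Path

def mirrored_out_dir(src: Path, src_root: Path, out_root: Path) -> Path:
    """Return the output directory that mirrors src's folder under src_root."""
    return out_root / src.relative_to(src_root).parent

def find_collisions(docs: list[Path], src_root: Path) -> dict[Path, list[Path]]:
    """Group sources that would produce the same output file after mirroring.

    Two sources collide when they share a folder and a stem, because the
    converter replaces only the final extension (report.doc, report.rtf).
    """
    groups: dict[Path, list[Path]] = defaultdict(list)
    for doc in docs:
        groups[doc.relative_to(src_root).with_suffix("")].append(doc)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}

# Hypothetical usage inside the batch loop:
#   target_dir = mirrored_out_dir(doc, src_dir, out_dir)
#   target_dir.mkdir(parents=True, exist_ok=True)
#   r = convert_doc(doc, target_dir, target)
```

Running find_collisions before the loop gives a short list of files to rename or convert into separate folders, rather than discovering the overwrite afterwards.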

ChangeThisFile API

# DOC to DOCX
curl -X POST https://changethisfile.com/v1/convert \
  -H "Authorization: Bearer ctf_sk_your_key_here" \
  -F "file=@document.doc" \
  -F "target=docx" \
  --output document.docx

# RTF to PDF
curl -X POST https://changethisfile.com/v1/convert \
  -H "Authorization: Bearer ctf_sk_your_key_here" \
  -F "file=@memo.rtf" \
  -F "target=pdf" \
  --output memo.pdf

import requests
from pathlib import Path

API_KEY = "ctf_sk_your_key_here"

def convert_via_api(src: str, out: str, target: str = "docx") -> None:
    with open(src, "rb") as f:
        resp = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"target": target},
            timeout=120,
        )
    resp.raise_for_status()
    Path(out).write_bytes(resp.content)

convert_via_api("legacy-contract.doc", "contract.pdf", "pdf")

Edge cases and gotchas

Macros won't survive. DOC files often contain VBA macros. LibreOffice can read basic macros, but the .docx format cannot store VBA (macro-enabled documents use .docm), so macros are dropped. If you need macros, keep the .doc alongside the converted .docx.
Custom fonts missing. DOC files from the 90s often used fonts that aren't installed on your system (Arial Narrow, Book Antiqua, etc.). LibreOffice substitutes the closest available font — layout may shift slightly.
Tracked changes. LibreOffice preserves tracked changes when converting DOC→DOCX. However, the revision author names and timestamps may be garbled from old proprietary metadata.
Very old binary DOC format (pre-Word 97). Word 2.0 and Word 6.0 .doc files use a different binary format. LibreOffice can open most, but some very old files fail. Microsoft's free Word Viewer (Windows only) used to be the fallback; Microsoft has since retired it, so an existing copy of Word is now the more realistic option.
Encoding issues in RTF files. Old RTF files from non-English Windows systems may use a code page other than Windows-1252, such as Windows-1250 (Central European) or Windows-1251 (Cyrillic). LibreOffice usually detects this automatically, but garbled characters indicate a wrong encoding assumption.
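
For the RTF encoding case, the file header usually states which code page the writer used, via the \ansicpg control word. The sketch below reads that value so unusual files can be flagged before conversion; the function names are illustrative, and it assumes the declaration sits in the first 4 KB of the file.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the RTF header control word \ansicpgN, e.g. \ansicpg1251.
ANSICPG = re.compile(rb"\\ansicpg(\d+)")

def rtf_code_page(data: bytes) -> Optional[int]:
    """Return the code page declared in an RTF header, or None if absent."""
    if not data.startswith(b"{\\rtf"):
        return None  # does not look like an RTF file
    match = ANSICPG.search(data[:4096])
    return int(match.group(1)) if match else None

def report_code_pages(folder: str) -> dict[Path, Optional[int]]:
    """Map every .rtf file under folder to its declared code page."""
    return {
        path: rtf_code_page(path.read_bytes()[:4096])
        for path in Path(folder).rglob("*.rtf")
    }
```

Files that report something other than 1252, or nothing at all, are the ones worth spot-checking for garbled characters after conversion.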

Migrating a large document archive

For archives of 10,000+ files, a few optimizations:

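# Find all legacy docs and convert to PDF in one command
# (-print0 / -0 keep filenames with spaces or quotes intact; -iname also matches .DOC)
find /path/to/archive \( -iname '*.doc' -o -iname '*.rtf' \) -print0 | \
  xargs -0 -I{} libreoffice --headless --convert-to pdf --outdir /path/to/output {}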

# Note: xargs parallelism (-P4) with LibreOffice often causes lock conflicts.
# Sequential is more reliable: use a for loop or Python subprocess.

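For throughput, split the file list into chunks and pass each chunk to a single LibreOffice invocation, running the chunks one after another. That way the startup cost is paid once per chunk instead of once per file. A typical server converts roughly 60-120 DOC files per minute, depending on file complexity.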
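
The chunked approach might look like the following sketch. The chunk size of 50 and the per-file timeout are assumptions to tune, and chunked, build_command, and convert_in_chunks are illustrative names; the LibreOffice flags are the same ones used in the CLI examples earlier.

```python
import subprocess
from pathlib import Path
from typing import Iterator

def chunked(items: list[Path], size: int) -> Iterator[list[Path]]:
    """Yield consecutive slices of items, each at most size long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_command(files: list[Path], out_dir: Path, target: str = "pdf") -> list[str]:
    """Build one LibreOffice invocation that converts several files."""
    return [
        "libreoffice", "--headless",
        "--convert-to", target,
        "--outdir", str(out_dir),
        *(str(f) for f in files),
    ]

def convert_in_chunks(files: list[Path], out_dir: Path,
                      target: str = "pdf", chunk_size: int = 50) -> None:
    """Run one LibreOffice process per chunk, strictly one after another."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for chunk in chunked(files, chunk_size):
        subprocess.run(
            build_command(chunk, out_dir, target),
            capture_output=True, text=True,
            timeout=60 * len(chunk),  # assumed budget of 60s per file
        )
```

The trade-off is coarser error tracking: a chunk reports one exit status for many files, so checking which expected outputs exist afterwards is still needed to find the failures.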

LibreOffice headless is the most capable free tool for legacy document conversion. The Python batch script above handles a 1,000-document archive in about 15 minutes and gives you a clean list of any files that need manual attention. If you can't install LibreOffice, the ChangeThisFile API's free tier covers no-install conversion.