Every file you create today is a bet on a format's future. If you save a document as DOCX, you're betting that software capable of reading DOCX will exist when you need it in 2046. If you archive photos as PSD, you're betting that Adobe Photoshop or a compatible reader will be available. If you store video as WMV, you've already lost that bet.
Digital preservation is the discipline of choosing formats, storage media, and practices that maximize the likelihood of files remaining accessible over decades. Libraries, archives, museums, and government agencies have been grappling with this since the 1990s, and their recommendations are directly applicable to anyone with files they care about keeping.
This guide covers which formats have the best preservation characteristics, why open specifications matter more than technical features, and the practical strategies for ensuring your files survive.
What Makes a Format Archival
The Library of Congress (LoC), the National Archives (NARA), and the Digital Preservation Coalition each publish format guidance, and their criteria overlap heavily. Seven come up repeatedly:
| Criterion | Why It Matters |
|---|---|
| Open specification | If the spec is public, anyone can build a reader. No dependency on one company. |
| Non-proprietary | Formats controlled by one company die when the company does (or pivots). |
| Wide adoption | Popular formats have more tools, more expertise, and more institutional investment in their survival. |
| Self-documenting | Formats that describe their own structure (headers, metadata) are easier to recover from partial corruption. |
| No DRM | DRM-encumbered files may become inaccessible when the license server disappears. |
| Lossless option | Lossy formats permanently discard information. For preservation, lossless is strongly preferred. |
| Backward compatibility | New versions of the format should be able to read old versions. PNG from 1996 works identically in 2026. |
Documents: PDF/A
PDF/A (ISO 19005, first published 2005) is a document format designed specifically for long-term archiving. It's a constrained subset of PDF that removes features that could break future accessibility:
- No JavaScript (scripts may not execute in future readers)
- No external dependencies (all fonts must be embedded; no linked images or resources)
- No encryption (encrypted files become inaccessible if the password is lost)
- Required metadata in XMP format
- Required embedded ICC color profiles
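PDF/A is published in four parts: PDF/A-1 (2005, based on PDF 1.4), PDF/A-2 (2011, based on PDF 1.7; adds JPEG 2000 compression and transparency), PDF/A-3 (2012, like PDF/A-2 but allows embedded file attachments of any type), and PDF/A-4 (2020, based on PDF 2.0). Most parts also define conformance levels, such as level b (basic: reliable visual reproduction) and level a (accessible: adds a tagged logical structure).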
PDF/A is required or recommended for legal documents, government records, and court filings in many jurisdictions. The EU's eIDAS regulation references PDF/A, and a number of U.S. federal courts require or recommend PDF/A for electronic filing.
Convert DOCX to PDF for archival, but for true PDF/A conformance use a tool that specifically targets PDF/A output (LibreOffice can export as PDF/A-1 and PDF/A-2). Standard PDF conversion may not meet all PDF/A requirements, so check the result with a PDF/A validator such as veraPDF.
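A minimal sketch of scripting that export from Python follows. It assumes LibreOffice 7.4 or later is installed and on the PATH as `soffice` (earlier versions do not accept JSON filter options on the command line), and that the `SelectPdfVersion` values 1 and 2 select PDF/A-1b and PDF/A-2b as described in LibreOffice's PDF export options; verify both against your installed version. The function names are hypothetical.

```python
import json
import subprocess
from pathlib import Path

# Assumed mapping to LibreOffice's SelectPdfVersion option (1 = PDF/A-1b, 2 = PDF/A-2b).
PDFA_LEVELS = {"PDF/A-1b": 1, "PDF/A-2b": 2}


def build_pdfa_command(source: Path, outdir: Path, level: str = "PDF/A-2b") -> list[str]:
    """Build (but do not run) a headless LibreOffice command that exports PDF/A."""
    options = {"SelectPdfVersion": {"type": "long", "value": str(PDFA_LEVELS[level])}}
    # Target format string: <extension>:<export filter name>:<JSON filter options>
    target = "pdf:writer_pdf_Export:" + json.dumps(options, separators=(",", ":"))
    return ["soffice", "--headless", "--convert-to", target,
            "--outdir", str(outdir), str(source)]


def convert_to_pdfa(source: Path, outdir: Path, level: str = "PDF/A-2b") -> Path:
    """Run the export; LibreOffice names the output after the input file's stem."""
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_pdfa_command(source, outdir, level), check=True)
    return outdir / (source.stem + ".pdf")


if __name__ == "__main__":
    # Dry run: show the command that would be executed.
    print(" ".join(build_pdfa_command(Path("report.docx"), Path("archive"))))
```

Calling `convert_to_pdfa(Path("report.docx"), Path("archive"))` should leave `archive/report.pdf`. A converter's claim of PDF/A output is not proof of conformance, so it is still worth running the result through a validator such as veraPDF before treating it as the archival copy.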
The Unkillable Format: Plain Text
Plain text (UTF-8 encoded) is the most durable file format in existence. ASCII, first published in 1963, is still read everywhere, and UTF-8 was designed so that every ASCII file is also a valid UTF-8 file. Text files depend on nothing beyond a character encoding: no fonts, no rendering engines, no codecs. Virtually every computing device ever made can read them.
For any content that doesn't require formatting (source code, configuration files, notes, data records, logs), plain text is the archival gold standard. Convert documents to TXT when the content matters more than the formatting. When headings, lists, and links are worth keeping, convert DOCX to Markdown instead: it adds minimal formatting while remaining human-readable plain text.
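Because UTF-8 is a superset of ASCII, checking that an archive's text files decode cleanly as UTF-8 is a cheap way to find files still stored in a legacy encoding. The minimal Python sketch below does that check and re-encodes a file to UTF-8; it assumes you already know the legacy encoding of any file that fails (the `cp1252` in the usage is only an example, not a detection result), and the function names are hypothetical.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 (pure ASCII always passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode text from a known legacy encoding to UTF-8; line endings are left untouched."""
    return data.decode(source_encoding).encode("utf-8")


def find_non_utf8(root: Path, pattern: str = "*.txt") -> list[Path]:
    """List files under root matching pattern that are not valid UTF-8."""
    return [p for p in sorted(root.rglob(pattern)) if not is_valid_utf8(p.read_bytes())]


if __name__ == "__main__":
    for path in find_non_utf8(Path(".")):
        print("needs re-encoding:", path)
```

For example, `to_utf8(path.read_bytes(), "cp1252")` converts a Windows-1252 note. Only re-encode files whose original encoding you actually know: decoding with the wrong code page produces valid UTF-8 that is quietly wrong.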
Images: TIFF and PNG
TIFF (Tag Image File Format, Aldus, 1986; now controlled by Adobe, which acquired Aldus in 1994) is the Library of Congress's recommended format for raster image preservation. It supports uncompressed storage or lossless compression (LZW or ZIP/Deflate), 16 bits per channel, ICC color profiles, and multi-page documents. TIFF is self-documenting: each image is described by a directory of tagged fields (width, bit depth, compression, and so on), so even damaged files can often be partially recovered.
TIFF is universally supported by professional image software and has been stable for decades. The format hasn't had a major revision since TIFF 6.0 in 1992 — this stability is a feature, not a limitation. A TIFF created in 1992 opens identically in 2026.
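To make the "self-documenting" point concrete: a classic TIFF file begins with a byte-order mark (`II` or `MM`), the number 42, and the offset of its first image file directory (IFD), which is a count followed by 12-byte tagged entries. The Python sketch below lists those entries using only the standard library. It handles classic TIFF only (not BigTIFF), reads just the first IFD, and its tag-name table is a small illustrative subset; the script and function names are hypothetical.

```python
import struct
import sys

# A few baseline TIFF tag numbers (illustrative subset, not the full registry).
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 258: "BitsPerSample",
             259: "Compression", 262: "PhotometricInterpretation", 273: "StripOffsets"}


def first_ifd_entries(data: bytes) -> list[tuple[int, int, int]]:
    """Return (tag, field_type, value_count) for each entry of the first IFD."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF: unknown byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF: magic number is not 42")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    entries = []
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, n_values = struct.unpack(endian + "HHI", data[start:start + 8])
        entries.append((tag, field_type, n_values))
    return entries


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for tag, field_type, n in first_ifd_entries(f.read()):
            print(TAG_NAMES.get(tag, f"tag {tag}"), "type", field_type, "count", n)
```

Run as `python tiff_tags.py scan.tif`. A file whose pixel data is damaged will often still yield its tag list, which is what makes partial recovery possible; a file truncated inside the directory itself raises `struct.error`.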
PNG (W3C, 1996) is the web-friendly archival alternative. It's lossless, open, widely supported, and smaller than uncompressed TIFF; compared with compressed TIFF, the size difference depends on the image and the compression settings. PNG is a strong choice when TIFF is overkill or when files need to be web-accessible.
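PNG stores its data as a sequence of chunks, each carrying a length, a four-letter type, the payload, and a CRC-32 computed over the type and payload, so corruption is detectable chunk by chunk. The minimal Python sketch below walks the chunks and checks each CRC with `zlib.crc32`, which uses the same CRC-32 as the PNG specification. It only validates structure and does not decode pixels; the function name is hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_png_chunks(data: bytes) -> list[tuple[str, bool]]:
    """Return (chunk_type, crc_ok) per chunk; a chunk cut off by truncation is reported as not ok."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos = len(PNG_SIGNATURE)
    results = []
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        name = chunk_type.decode("latin-1")
        end = pos + 8 + length + 4  # length+type header, payload, CRC
        if end > len(data):
            results.append((name, False))  # file ends mid-chunk
            break
        payload = data[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", data[end - 4:end])
        results.append((name, (zlib.crc32(chunk_type + payload) & 0xFFFFFFFF) == stored_crc))
        pos = end
        if chunk_type == b"IEND":
            break
    return results
```

Usage: `check_png_chunks(Path("photo.png").read_bytes())`. Any `False` marks a damaged chunk, and a list that never reaches `IEND` also points to truncation.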
For archival: convert JPEG to TIFF (note: this won't restore quality lost to JPEG compression — it just preserves what remains in a lossless container). Convert BMP to PNG for lossless size reduction. Convert PSD to TIFF to escape Photoshop's proprietary format while preserving quality.
What About Camera RAW?
Camera RAW files (CR2, NEF, ARW, DNG) contain the sensor's unprocessed data and are the highest-quality source available. But CR2 (Canon), NEF (Nikon), and ARW (Sony) are proprietary formats that depend on each manufacturer's continued support.
DNG (Digital Negative, Adobe, 2004) is an open RAW format designed for archival. Adobe published the specification and committed to it as a preservation format. The Library of Congress lists DNG as a preferred format for digital photo archival. Convert proprietary RAW to DNG using Adobe's free DNG Converter, or convert to TIFF for maximum compatibility, keeping in mind that a TIFF holds a processed (demosaiced) image rather than the raw sensor data: CR2 to TIFF | NEF to TIFF | ARW to TIFF.
Audio: WAV and FLAC
WAV (RIFF WAVE, Microsoft/IBM, 1991) stores uncompressed PCM audio. It's the simplest, most universally supported audio format. CD-quality WAV (44.1 kHz, 16-bit stereo) takes about 10.6 MB per minute (10.1 MiB). High-resolution WAV (96 kHz, 24-bit stereo) takes about 34.6 MB per minute (33.0 MiB). WAV's simplicity is its preservation strength: the format is trivial to parse and has no codec dependencies.
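Uncompressed PCM size is simply sample rate × bytes per sample × channels × seconds. The Python sketch below reproduces that arithmetic and uses the standard-library `wave` module to write a short test tone, which shows how little a WAV file contains beyond raw samples. The 440 Hz tone, mono channel, and one-second length are arbitrary illustrative choices, and the function names are hypothetical.

```python
import io
import math
import struct
import wave


def pcm_bytes_per_minute(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM size for one minute of audio, in bytes."""
    return sample_rate * (bits_per_sample // 8) * channels * 60


def make_test_wav(seconds: float = 1.0, sample_rate: int = 44100) -> bytes:
    """Return an in-memory mono, 16-bit WAV file containing a 440 Hz sine tone."""
    samples = bytearray()
    for n in range(int(seconds * sample_rate)):
        value = int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / sample_rate))
        samples += struct.pack("<h", value)  # one 16-bit little-endian PCM sample
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(samples))
    return buffer.getvalue()


if __name__ == "__main__":
    for rate, bits in [(44100, 16), (96000, 24)]:
        size = pcm_bytes_per_minute(rate, bits, channels=2)
        print(f"{rate} Hz / {bits}-bit stereo: {size / 1e6:.1f} MB ({size / 2**20:.1f} MiB) per minute")
```

CD-quality stereo works out to 10,584,000 bytes per minute, which is 10.6 MB in decimal units or about 10.1 MiB; 96 kHz/24-bit stereo is 34,560,000 bytes (34.6 MB, about 33.0 MiB). Quoting both units avoids the common mix-up between the two.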
FLAC (Free Lossless Audio Codec, Josh Coalson, 2001) typically compresses audio to 50-60% of WAV size with bit-perfect reconstruction. FLAC is open-source, royalty-free, and widely supported. The Library of Congress accepts FLAC as a preservation format alongside WAV.
The choice between WAV and FLAC for archival: WAV is simpler and needs no decompression step (lower future risk), while FLAC stores the same content in roughly 40-50% less space. Both are excellent choices. Neither MP3 nor AAC should be used for preservation: they're lossy and discard data permanently.
Convert MP3 to WAV for archival (preserves the already-degraded audio in a lossless container). Convert WAV to FLAC to save storage without quality loss.
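A minimal Python sketch of the WAV-to-FLAC step, assuming the reference `flac` command-line encoder is installed. `--best` selects the highest standard compression level, and `--verify` makes the encoder decode its own output and compare it with the input while encoding. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_flac_command(wav: Path) -> list[str]:
    """Build a flac command that writes <name>.flac next to the WAV and verifies it."""
    out = wav.with_suffix(".flac")
    return ["flac", "--best", "--verify", "-o", str(out), str(wav)]


def archive_wav(wav: Path) -> Path:
    """Encode one WAV to FLAC; check=True raises CalledProcessError if flac fails."""
    subprocess.run(build_flac_command(wav), check=True)
    return wav.with_suffix(".flac")


if __name__ == "__main__":
    # Dry run: print the commands for every WAV in the current directory.
    for wav in sorted(Path(".").glob("*.wav")):
        print(" ".join(build_flac_command(wav)))
```

Keep the WAV until the FLAC has been checked. Later, `flac --test file.flac` re-decodes an archived file and checks it against the MD5 signature of the original audio stored in the FLAC header (when the encoder recorded one).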
Video: MKV with FFV1, or Uncompressed
Video preservation is the hardest problem because of the sheer data volume. One hour of uncompressed 1080p video (8-bit RGB at 25 frames per second) is approximately 560 GB. Even institutions with large budgets can't store everything uncompressed.
FFV1 (FF Video Codec 1) is a lossless video codec developed by Michael Niedermayer for FFmpeg and standardized by the IETF as RFC 9043. The Library of Congress and the International Association of Sound and Audiovisual Archives (IASA) recommend FFV1 in MKV containers for video preservation. FFV1 typically achieves 2:1 to 3:1 lossless compression, so that 560 GB hour becomes roughly 190-280 GB.
MKV (Matroska) is the recommended container because it's open, well-documented, and can hold virtually any codec plus multiple audio tracks, subtitle tracks, and chapters. The combination of MKV + FFV1 + FLAC audio is the current institutional standard for video preservation.
For practical archival (personal collections, small organizations), H.264 in MP4 at a high quality setting (CRF 18 or lower with the x264 encoder) is a reasonable compromise: not lossless, but generally visually transparent and often 10-20x smaller than lossless. Convert legacy AVI to MKV for a modern, preservation-friendly container.
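The two approaches above map onto two ffmpeg invocations, and the 560 GB figure is plain arithmetic (1920 × 1080 pixels × 3 bytes × 25 frames × 3600 seconds). The Python sketch below computes that size and builds both commands as argument lists. It assumes an ffmpeg build with FFV1, FLAC, and libx264 support; the FFV1 options (`-level 3`, `-g 1` for intra-only frames, `-slices 16`, `-slicecrc 1` for per-slice checksums) follow a commonly cited archival recipe rather than any single institution's mandate, and the CRF and AAC bitrate for the access copy are assumed settings to tune. Profile and function names are hypothetical.

```python
from pathlib import Path

# Preservation master: lossless FFV1 (version 3, intra-only, CRC-protected slices) + FLAC audio.
FFV1_ARGS = ["-c:v", "ffv1", "-level", "3", "-g", "1",
             "-slices", "16", "-slicecrc", "1", "-c:a", "flac"]

# Access copy: high-quality H.264 + AAC (CRF 18 and 256 kb/s are assumed settings).
H264_ARGS = ["-c:v", "libx264", "-crf", "18", "-preset", "slow",
             "-pix_fmt", "yuv420p", "-c:a", "aac", "-b:a", "256k"]

PROFILES = {"master": (FFV1_ARGS, ".mkv"), "access": (H264_ARGS, ".mp4")}


def uncompressed_video_bytes(width, height, bytes_per_pixel, fps, seconds) -> int:
    """Raw frame data size in bytes, ignoring audio and container overhead."""
    return round(width * height * bytes_per_pixel * fps * seconds)


def build_ffmpeg_command(source: Path, profile: str) -> list[str]:
    """Return an ffmpeg argument list writing <stem>.mkv (master) or <stem>.mp4 (access)."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    args, suffix = PROFILES[profile]
    target = source.with_suffix(suffix)
    if target == source:
        raise ValueError("output would overwrite the source file")
    return ["ffmpeg", "-i", str(source), *args, str(target)]


if __name__ == "__main__":
    print(uncompressed_video_bytes(1920, 1080, 3, 25, 3600) / 1e9, "GB per hour")
    for name in PROFILES:
        print(" ".join(build_ffmpeg_command(Path("tape01.avi"), name)))
```

`uncompressed_video_bytes(1920, 1080, 3, 25, 3600)` returns 559,872,000,000 bytes, and dividing by 2 and 3 gives the FFV1 range. By default ffmpeg keeps one video and one audio stream; add `-map 0` if the source has extra tracks you need.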
Ebooks: EPUB and PDF
EPUB (International Digital Publishing Forum, 2007; now maintained by the W3C) is an open ebook format: XHTML and CSS content packaged in a ZIP container. It's reflowable (adapts to screen size), supports embedded fonts and images, and is widely supported by readers. EPUB is the recommended format for text-centric ebooks.
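Part of what makes EPUB archive-friendly is that an EPUB is an ordinary ZIP file with a fixed layout: the EPUB container specification requires an uncompressed first entry named `mimetype` containing `application/epub+zip`, plus a `META-INF/container.xml` that points to the package document. The minimal Python sketch below checks those basics with the standard-library `zipfile` module. It is a quick sanity check, not a substitute for a full validator such as EPUBCheck, and the function name is hypothetical.

```python
import io
import sys
import zipfile
from pathlib import Path


def epub_container_problems(data: bytes) -> list[str]:
    """Return basic container problems; an empty list means the basics look right.

    Raises zipfile.BadZipFile if the data is not a ZIP archive at all.
    """
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            if infos[0].compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' entry is compressed")
            if zf.read("mimetype") != b"application/epub+zip":
                problems.append("'mimetype' has unexpected contents")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("missing META-INF/container.xml")
    return problems


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, epub_container_problems(Path(name).read_bytes()) or "basic container OK")
```

Because the container is plain ZIP, the XHTML chapters inside remain readable with any unzip tool even if no ebook reader survives.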
PDF is better for fixed-layout content (textbooks with complex layouts, illustrated books, technical manuals). For archival, use PDF/A to ensure long-term accessibility.
Avoid proprietary ebook formats for archival: Kindle's KFX, Apple's iBooks format, and any DRM-protected format. DRM-protected files become inaccessible when the license server disappears — and it will eventually disappear. Convert EPUB to PDF for fixed-layout preservation, or keep both formats for flexibility.
Formats to Avoid for Preservation
| Format | Risk | Migrate To |
|---|---|---|
| PSD (Photoshop) | Proprietary, depends on Adobe | TIFF or PNG |
| DOC (Word 97-2003) | Binary, underdocumented, declining support | DOCX then PDF/A |
| WMA/WMV | Proprietary Microsoft, declining support | FLAC (audio) / MP4 (video) |
| HEIC | Patent-encumbered (HEVC), uncertain licensing future | PNG or TIFF |
| RAR | Proprietary (Alexander Roshal); the official compressor is closed source | ZIP or 7Z |
| MOBI/AZW3 | Amazon proprietary, DRM possible | EPUB |
| Any DRM-protected format | Inaccessible when license server dies | Remove DRM (where legal) and save as open format |
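The table above can double as a lookup for a quick audit. The Python sketch below flags files whose extensions match the at-risk formats listed there and suggests the corresponding migration target. Matching on extensions is an assumption with real limits: it cannot detect DRM, and a renamed file will be misclassified. Treat the output as a starting list rather than a verdict. The names are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, Tuple, Union

# Extension -> suggested migration target, transcribed from the table above.
AT_RISK = {
    ".psd": "TIFF or PNG",
    ".doc": "DOCX then PDF/A",
    ".wma": "FLAC",
    ".wmv": "MP4",
    ".heic": "PNG or TIFF",
    ".rar": "ZIP or 7Z",
    ".mobi": "EPUB",
    ".azw3": "EPUB",
}


def audit(paths: Iterable[Union[str, Path]]) -> Iterator[Tuple[Path, str]]:
    """Yield (path, target) for every path with an at-risk extension (case-insensitive)."""
    for path in paths:
        target = AT_RISK.get(Path(path).suffix.lower())
        if target is not None:
            yield Path(path), target


def summarize(paths: Iterable[Union[str, Path]]) -> Counter:
    """Count at-risk files per extension."""
    return Counter(path.suffix.lower() for path, _ in audit(paths))


if __name__ == "__main__":
    for path, target in audit(p for p in Path(".").rglob("*") if p.is_file()):
        print(f"{path}: migrate to {target}")
```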
A Practical Preservation Strategy
- Keep originals. Never delete the camera RAW, the master recording, or the source document. Store them alongside any converted versions.
- Convert to archival formats. Create archival copies in open, lossless formats: TIFF/PNG for images, WAV/FLAC for audio, PDF/A for documents, EPUB for ebooks.
- Use the 3-2-1 backup rule. Three copies, on two different types of storage media, with one copy offsite. Format durability means nothing if all copies are on one hard drive that fails.
- Embed metadata. Add title, date, description, and creator to every file. Metadata is the only context that travels with the file itself.
- Revisit every 5-10 years. Check that your files are still readable with current tools. Migrate to newer open formats if the current one is losing support. This is cheaper than emergency recovery after a format becomes truly obsolete.
- Document your file organization. A plain text README in each archive directory explaining what's in it and how it's organized costs nothing and helps immensely.
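One common way to make the periodic check concrete is a checksum manifest: record a SHA-256 hash for every file when you create the archive, then re-hash and compare at each review (and on each backup copy). The minimal Python sketch below does this. The manifest file name is a hypothetical choice; the "hash, two spaces, relative path" line format is the one `sha256sum` writes, so for ordinary file names `sha256sum -c manifest-sha256.txt` run from the archive root should read it as well.

```python
import hashlib
import tempfile
from pathlib import Path

MANIFEST = "manifest-sha256.txt"  # hypothetical file name


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path) -> Path:
    """Record '<sha256>  <relative path>' for every file under root except the manifest."""
    manifest = root / MANIFEST
    files = sorted(p for p in root.rglob("*") if p.is_file() and p != manifest)
    lines = [f"{sha256_of(p)}  {p.relative_to(root).as_posix()}" for p in files]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(root: Path) -> list[str]:
    """Return relative paths that are missing or whose contents changed."""
    problems = []
    for line in (root / MANIFEST).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines (e.g. an empty archive)
        expected, rel = line.split("  ", 1)
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel)
    return problems


if __name__ == "__main__":
    # Demo in a throwaway directory: record, then simulate silent corruption.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "letter.txt").write_text("Dear future reader,\n", encoding="utf-8")
        write_manifest(root)
        print(verify_manifest(root))  # []
        (root / "letter.txt").write_bytes(b"Dear futurE reader,\n")
        print(verify_manifest(root))  # ['letter.txt']
```

Running the verification against every backup copy, not just the primary, is what turns the 3-2-1 rule into something you can actually check: a copy that fails can be restored from one that passes.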
Digital preservation comes down to a simple principle: use formats that don't require any specific company to exist. PDF/A is an ISO standard whose specification does not depend on any single vendor. TIFF has been stable since 1992 and will be parseable for decades more. FLAC's specification is public and implementations are open-source. Plain text will outlast civilization.
The practical cost of preservation is small: convert important files to archival formats, keep the originals alongside them, and check every few years that everything still opens. The cost of not doing this — realizing that your 20-year photo archive is in a format no current tool can read — is enormous and irreversible.