Every file you've ever compressed — whether a JPEG photo, an MP3 song, or a ZIP archive — used one of two fundamentally different strategies: throw away data the recipient won't notice (lossy), or find a cleverer way to describe the same data using fewer bits (lossless). These aren't just implementation details. They determine whether your file can survive a round-trip, whether re-encoding destroys quality, and whether your archive will be bit-identical in 50 years.
This guide covers the theory and practice of both approaches. We'll start with information theory — the mathematical foundation that makes compression possible at all — then walk through how lossy and lossless codecs actually work, with real examples from image, audio, and video formats. If you've ever wondered why a JPEG gets blurry when you re-save it, or why a FLAC file is half the size of a WAV, this is the explanation.
Understanding compression isn't just academic. It directly affects which format you choose for every file you create, share, or archive. Make the wrong call and you either waste storage on unnecessarily large files or destroy quality you can't get back.
Information Theory: Why Compression Works at All
In 1948, Claude Shannon published "A Mathematical Theory of Communication" at Bell Labs and created the field of information theory. The core insight: data contains entropy (irreducible information) and redundancy (predictable patterns). Compression exploits redundancy.
Consider the string "AAAAAABBBBCCCC" — 14 characters, but only three unique symbols with obvious runs. Run-length encoding compresses this to "6A4B4C" — 6 characters representing the same information. Shannon showed that every data source has a theoretical minimum description length (the entropy rate), and no lossless compressor can beat it.
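Run-length encoding is simple enough to sketch in a few lines of Python (the function names and the count-then-symbol output format are our choices; this toy decoder assumes the payload contains no digit characters):

```python
def rle_encode(s: str) -> str:
    """Collapse each run of repeated characters into a count + symbol pair."""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append(f"{j - i}{s[i]}")
        i = j
    return "".join(out)

def rle_decode(s: str) -> str:
    """Invert rle_encode: read the digits, then repeat the symbol that follows."""
    out, i = [], 0
    while i < len(s):
        j = i
        while s[j].isdigit():
            j += 1
        out.append(s[j] * int(s[i:j]))
        i = j + 1
    return "".join(out)

print(rle_encode("AAAAAABBBBCCCC"))  # → 6A4B4C
```

The round trip is exact, which is the defining property of a lossless scheme.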
Real-world data is full of redundancy. English text uses "e" 13% of the time and "z" 0.07% of the time — that statistical imbalance is compressible. Photos have spatial correlation — neighboring pixels tend to be similar. Audio has temporal correlation — the next sample is usually close to the current one. Compression algorithms are designed to find and exploit these patterns.
Entropy and the Limits of Compression
Shannon entropy measures the average information per symbol in bits. For a source with symbols s₁, s₂, ... sₙ occurring with probabilities p₁, p₂, ... pₙ, the entropy H = -Σ pᵢ log₂(pᵢ). A fair coin has entropy of 1 bit. A biased coin (90% heads) has entropy of 0.47 bits — it's more predictable, so it's more compressible.
This means: random data cannot be compressed. If every bit is equally likely to be 0 or 1, entropy is maximal and no algorithm can reduce the size. This is why encrypted files, already-compressed files, and random data don't get smaller when you ZIP them — they're already at or near maximum entropy.
Practical compressors never quite reach the Shannon limit. DEFLATE (used in PNG and ZIP) typically achieves 60-70% of the theoretical optimum on text. Modern codecs like Zstandard (2016, Facebook) and Brotli (2015, Google) get closer, especially at higher compression levels.
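You can see both effects with Python's built-in zlib (a DEFLATE implementation): redundant text shrinks dramatically, while random bytes of the same length come back slightly larger than they went in, because the incompressible payload still pays for headers and checksums.

```python
import os
import zlib

# Redundant English-like text vs. incompressible random bytes, same length.
text = b"the quick brown fox jumps over the lazy dog. " * 200
rand = os.urandom(len(text))

packed_text = zlib.compress(text, 9)
packed_rand = zlib.compress(rand, 9)

print(f"text:   {len(text)} -> {len(packed_text)} bytes")
print(f"random: {len(rand)} -> {len(packed_rand)} bytes")
```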
Lossless Compression: Every Bit Preserved
Lossless compression guarantees that decompressing the output produces a bit-perfect copy of the input. Not approximately the same — exactly the same, checksum-verifiable, legally identical. This is the only acceptable choice when data integrity matters: source code, financial records, medical images, master audio recordings, scientific datasets.
Dictionary Coding (LZ77, LZ78, LZW)
Abraham Lempel and Jacob Ziv published LZ77 in 1977 and LZ78 in 1978, founding the family of dictionary-based compressors. The idea: instead of encoding each byte individually, reference earlier occurrences. If the sequence "the quick brown fox" appeared 2000 bytes ago, store a pointer (offset=2000, length=19) instead of 19 bytes.
LZ77 uses a sliding window — it looks backward through a fixed-size buffer for matches. DEFLATE (designed by Phil Katz, specified in RFC 1951 in 1996) combines LZ77 with Huffman coding and is the backbone of ZIP, gzip, and PNG compression. It's fast, well-understood, and universally supported.
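A brute-force greedy version of LZ77 fits in a short sketch. Real encoders use hash chains or suffix structures instead of this O(n·window) scan, and the token format here is our simplification:

```python
def lz77_compress(data: bytes, window: int = 4096, min_len: int = 3):
    """Greedy LZ77: emit (offset, length) back-references, or raw bytes when no match."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Matches may run past position i (overlap), which is legal in LZ77.
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens

def lz77_decompress(tokens) -> bytes:
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):
                out.append(out[-off])  # byte-by-byte copy handles overlapping matches
        else:
            out.append(t)
    return bytes(out)

data = b"the quick brown fox jumps over the lazy dog. " * 4
tokens = lz77_compress(data)
print(len(data), "bytes ->", len(tokens), "tokens")
```

Note that a match is allowed to extend past the current position into data it is itself producing; the byte-by-byte copy in the decompressor is what makes that self-referential case work.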
LZW (Lempel-Ziv-Welch, 1984) builds an explicit dictionary of encountered sequences. It's the algorithm behind GIF compression and early Unix compress. The Unisys LZW patent controversy in the 1990s directly led to the creation of PNG as a patent-free alternative.
Modern variants: Zstandard (zstd) by Yann Collet at Facebook uses finite state entropy and larger search windows to achieve compression ratios approaching LZMA while decompressing 3-5x faster. Brotli, developed by Google, is optimized for web content and is mandatory for WOFF2 font compression.
Entropy Coding (Huffman, Arithmetic, ANS)
Entropy coding assigns shorter codes to more frequent symbols and longer codes to rare ones, approaching Shannon's entropy limit. It's almost always used as the final stage after dictionary coding or prediction.
Huffman coding (David Huffman, 1952) builds a binary tree where frequent symbols get short bit paths. It's simple and fast but can't assign fractional bits — a symbol occurring 33% of the time ideally needs 1.58 bits but Huffman rounds to 2. Used in JPEG (with predefined tables), DEFLATE, and MP3.
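The tree construction can be sketched with a heap, repeatedly merging the two least-frequent subtrees. This toy version assumes at least two distinct symbols and returns a code table rather than emitting bits:

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a Huffman code table by repeatedly merging the two rarest subtrees."""
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tiebreak = len(heap)  # unique counter so the heap never compares the dicts
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "AAAAAABBBBCCCC"
codes = huffman_codes(text)
encoded_bits = sum(len(codes[ch]) for ch in text)
print(codes, f"{encoded_bits} bits vs {len(text) * 8} uncompressed")
```

The most frequent symbol ("A") gets a one-bit code, and the resulting codes are prefix-free, so the bitstream can be decoded without separators.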
Arithmetic coding encodes the entire message as a single number between 0 and 1, with each symbol narrowing the interval based on its probability. It can achieve fractional-bit precision, getting closer to the entropy limit. Used in JPEG 2000 and in H.264/H.265 (as CABAC, context-adaptive binary arithmetic coding).
ANS (Asymmetric Numeral Systems, Jarek Duda, 2009) achieves arithmetic coding's compression ratio with Huffman-like speed. It's used in Zstandard, Apple's LZFSE, and the JPEG XL reference encoder. ANS is arguably the most important compression innovation of the 2010s.
Lossless Formats in Practice
Typical lossless compression ratios vary enormously by content type:
| Content | Format | Typical Ratio | Why |
|---|---|---|---|
| English text | gzip | 3:1 to 4:1 | High redundancy in natural language |
| Source code | zstd | 4:1 to 6:1 | Repetitive keywords, indentation patterns |
| Photos (RGB) | PNG | 1.5:1 to 3:1 | Moderate spatial correlation, high entropy per pixel |
| Screenshots | PNG | 5:1 to 20:1 | Large uniform areas, limited color palette |
| CD audio (16-bit) | FLAC | 1.8:1 to 2.5:1 | Temporal correlation between samples |
| Database dumps | zstd | 5:1 to 15:1 | Highly structured, repetitive field names |
Key formats: PNG for images, FLAC/ALAC for audio, ZIP/7Z/TAR.GZ for archives, PDF/A for documents. Converting MP3 to FLAC won't magically restore lost audio data — it just wraps the lossy-decoded output in a lossless container.
Lossy Compression: Perceptually Transparent Destruction
Lossy compression achieves dramatically better compression ratios — 10:1, 50:1, even 200:1 — by permanently discarding information the human senses can't perceive. The key word is permanently. Once quantized, the original data is gone. You can decompress a JPEG, but you'll get the quantized approximation, not the original pixel values.
Psychoacoustic Coding (Audio)
The human ear has well-studied limitations that audio codecs exploit. Frequency masking: a loud tone at 1 kHz makes nearby frequencies (say, 1.1 kHz) inaudible — the codec can discard them entirely. Temporal masking: a loud sound makes softer sounds inaudible for 5-20ms afterward. Absolute threshold of hearing: below ~20 Hz and above ~18 kHz (less with age), humans can't hear anything, so that data can be zeroed out.
MP3 (MPEG-1 Layer III, developed at Fraunhofer IIS, standardized 1993) splits audio into 32 subbands with a polyphase filterbank, refines each with an MDCT, and uses a psychoacoustic model to allocate bits where they matter. At 128 kbps, it achieves roughly 11:1 compression from CD audio (1,411 kbps). At 320 kbps, most listeners can't distinguish MP3 from the original in ABX blind tests.
AAC (Advanced Audio Coding, 1997) improved on MP3 with full-bandwidth encoding (no format-imposed high-frequency cutoff), more flexible joint stereo coding, and more efficient entropy coding. At the same bitrate, AAC is audibly superior to MP3. It's the default for Apple Music, YouTube, and most streaming platforms.
Opus (IETF, 2012) combines SILK (for speech, from Skype) and CELT (for music) in a single codec that adapts between them. In public listening tests it outperforms both MP3 and AAC across the useful bitrate range, and it's royalty-free. Opus at 128 kbps is generally transparent for music.
Psychovisual Coding (Images and Video)
Human vision has its own exploitable limitations. Chroma subsampling: the eye resolves luminance (brightness) at much higher resolution than chrominance (color). JPEG, H.264, and most video codecs convert RGB to YCbCr and store color channels at half or quarter resolution (4:2:0 subsampling). This alone cuts raw data by 50% with near-zero visual impact.
Frequency sensitivity: we're better at seeing low-frequency gradients than high-frequency texture. The Discrete Cosine Transform (DCT) converts spatial pixel data into frequency components. Quantization then aggressively rounds high-frequency coefficients toward zero — discarding fine texture detail — while preserving low-frequency structure. This is where JPEG compression artifacts come from: the blocky patterns visible at low quality settings are the remnants of quantized 8x8 DCT blocks.
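A one-dimensional miniature of this pipeline shows exactly where the loss happens: transform, divide by frequency-dependent steps, round. The naive O(N²) DCT below is for illustration only (real codecs use fast factorizations), and the step sizes are invented for the example:

```python
import math

def dct_1d(x):
    """Naive 8-point DCT-II: project samples onto cosine basis functions."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
            for k in range(N)]

def idct_1d(X):
    """Inverse transform (DCT-III), with the standard 1/2 weight on the DC term."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

# A gentle ramp with a little wiggle -- mostly low-frequency content.
samples = [10, 12, 14, 13, 15, 17, 16, 18]
coeffs = dct_1d(samples)

# Quantize: divide by a step that grows with frequency, then round.
# High-frequency coefficients mostly collapse to zero -- that detail is gone.
steps = [1, 2, 4, 8, 16, 32, 64, 128]
quantized = [round(c / q) for c, q in zip(coeffs, steps)]
restored = idct_1d([v * q for v, q in zip(quantized, steps)])

print(quantized)                        # trailing zeros: discarded detail
print([round(v, 1) for v in restored])  # close to samples, fine wiggle smoothed
```

The long runs of zeros that quantization produces are exactly what the subsequent entropy-coding stage compresses almost for free.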
Motion estimation (video only): in video, most of the frame is the same as the previous frame, just shifted slightly. H.264 and H.265 use motion vectors to say "this 16x16 block moved 3 pixels right and 1 pixel down" instead of re-encoding the whole block. Inter-frame prediction is why H.264 achieves 200:1 compression ratios on typical video while looking sharp.
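A toy exhaustive block search makes the idea concrete. Real encoders search hierarchically over much larger windows with sub-pixel refinement; the frame sizes and pixel values here are invented:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size blocks (lists of rows)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block(frame, y, x, size):
    return [row[x:x + size] for row in frame[y:y + size]]

def best_motion_vector(prev, cur, y, x, size=4, search=2):
    """Exhaustive search: where in `prev` did this block of `cur` come from?"""
    target = block(cur, y, x, size)
    best, best_cost = (0, 0), sad(block(prev, y, x, size), target)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            if 0 <= py <= len(prev) - size and 0 <= px <= len(prev[0]) - size:
                cost = sad(block(prev, py, px, size), target)
                if cost < best_cost:
                    best_cost, best = cost, (dy, dx)
    return best, best_cost

# A bright 4x4 square that shifts one pixel to the right between frames.
W = H = 8
prev = [[0] * W for _ in range(H)]
cur = [[0] * W for _ in range(H)]
for yy in range(2, 6):
    for xx in range(2, 6):
        prev[yy][xx] = 200
        cur[yy][xx + 1] = 200

mv, cost = best_motion_vector(prev, cur, 2, 3)
print(mv, cost)  # → (0, -1) 0: "copy this block from one pixel to the left"
```

A residual cost of 0 means the encoder can transmit just the motion vector for this block — the essence of inter-frame prediction.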
Quantization: Where Quality Actually Dies
Quantization is the specific step in lossy compression where irreversible information loss occurs. It maps a continuous or high-precision value to a smaller set of discrete levels. Think of it as controlled rounding.
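In code, the whole step is two one-line functions, and the asymmetry between them is the point: quantize is many-to-one, so dequantize cannot know which input produced a given level.

```python
def quantize(value: float, step: float) -> int:
    """Controlled rounding: map a high-precision value to a discrete level index."""
    return round(value / step)

def dequantize(level: int, step: float) -> float:
    """Map the level back. Everything finer than `step` is gone for good."""
    return level * step

step = 8.0
original = 127.3
restored = dequantize(quantize(original, step), step)
print(original, "->", restored)  # 127.3 -> 128.0; 125.1 would restore to 128.0 too
```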
In JPEG, the quantization table determines how aggressively each DCT frequency coefficient gets rounded. A quality setting of 90 uses mild quantization — dividing by small numbers, preserving most coefficients. Quality 10 uses brutal quantization — dividing by large numbers, zeroing out most high-frequency data. The difference between quality 90 and quality 10 is not 9x; it can be 20-50x in file size, because aggressive quantization produces long runs of zeros that entropy coding eliminates efficiently.
In audio, quantization noise is shaped by the psychoacoustic model. Noise-shaping techniques push quantization error into frequency bands where it's masked by the signal itself. The result: the mathematical error is there, but it's hiding where you can't hear it.
The critical implication: quantization is not reversible. You cannot "enhance" a heavily compressed JPEG back to its original quality. The high-frequency data was divided away. AI upscalers can hallucinate plausible detail, but they're inventing information, not recovering it.
The Quality Cascade: Why Re-Encoding Destroys Files
Every lossy encoding pass applies its own quantization. If you open a JPEG, edit it, and save as JPEG again, the new encoder doesn't know what the old encoder already threw away. It applies fresh DCT, fresh quantization, and fresh rounding — destroying a new set of details on top of the already-degraded data.
This is called generation loss, and it's cumulative. A JPEG re-saved 10 times at quality 85 looks noticeably worse than one saved once at quality 85. A video transcoded from H.264 to H.264 loses quality even at the same bitrate, because the second encoder wastes bits re-encoding artifacts from the first pass as if they were real content.
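A small simulation illustrates the cascade. The gain "edit" and the quantization step here are invented stand-ins for whatever processing happens between saves: each generation the signal drifts off the quantization grid, picks up fresh rounding error, and the accumulated error grows rather than cancels.

```python
import random

def lossy_save(samples, step=4):
    """One lossy encode pass: snap every sample to the nearest multiple of `step`."""
    return [round(s / step) * step for s in samples]

def edit(samples, gain=1.03):
    """A small edit between saves (e.g. a brightness tweak) moves values off the grid."""
    return [s * gain for s in samples]

random.seed(1)
original = [random.uniform(0, 255) for _ in range(1000)]

ideal = original   # edited every generation, never quantized
lossy = original   # edited AND re-saved every generation
errors = []
for generation in range(10):
    ideal = edit(ideal)
    lossy = lossy_save(edit(lossy))
    mae = sum(abs(a - b) for a, b in zip(lossy, ideal)) / len(ideal)
    errors.append(mae)

print([round(e, 2) for e in errors])  # mean error trends upward across generations
```

The tenth-generation error is several times the first-generation error, matching the qualitative claim above: each pass adds new rounding damage on top of the old.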
The practical rules:
- Never transcode lossy-to-lossy unless you're changing format. Converting MP3 to AAC makes sense to gain compatibility, but you've added a second quantization pass. Keep the bitrate of the output higher than the input to minimize additional loss.
- Edit in lossless, export to lossy once. Edit photos in TIFF/PSD/RAW, export the final as JPEG. Edit audio in WAV/FLAC, export as MP3/AAC once. Never use a lossy format as your working format.
- Keep the original. Even if you distribute JPEG, store the PNG/TIFF original. You can always re-compress from the original; you can't un-compress a lossy file.
Lossy vs Lossless: Complete Comparison
| Dimension | Lossless | Lossy |
|---|---|---|
| Data preservation | Bit-perfect reconstruction | Approximate reconstruction |
| Typical ratio (photos) | 2:1 to 3:1 (PNG) | 10:1 to 50:1 (JPEG) |
| Typical ratio (audio) | 2:1 (FLAC) | 10:1 to 15:1 (MP3 128k) |
| Re-encoding safety | No degradation, ever | Cumulative quality loss |
| Use case | Archival, editing, source material | Distribution, streaming, sharing |
| Speed (encode) | Generally faster | Slower (perceptual modeling) |
| Speed (decode) | Variable | Often hardware-accelerated |
| File size predictability | Content-dependent | Target bitrate/quality gives consistent size |
When to Use Lossy vs Lossless
Use lossless when:
- The file is a source/master that may be edited later (PSD, WAV, TIFF, RAW)
- Exact reproduction matters legally or scientifically (medical imaging, legal documents, financial data)
- The content is already small or low-entropy (logos, diagrams, spreadsheets, source code)
- You're archiving for long-term preservation (PNG or TIFF for photos, FLAC for audio)
- You're compressing non-media data (archives, backups, database dumps)
Use lossy when:
- The file is for distribution, not editing (web images, streaming audio/video)
- File size is constrained (email attachments, mobile bandwidth, storage costs)
- Human perception is the quality bar (if no one can see/hear the difference, the bits were wasted)
- You're compressing photos, music, or video at reasonable quality settings
- Real-time performance matters (video calls, streaming, game assets)
Hybrid approaches exist: WebP, AVIF, and JPEG XL support both lossy and lossless modes in a single format. A PNG can become a lossless WebP for web delivery with no quality loss, or a lossy AVIF for maximum compression.
Formats That Support Both Modes
Several modern formats blur the line by offering both lossy and lossless compression in the same container:
| Format | Type | Lossy Codec | Lossless Codec | Notes |
|---|---|---|---|---|
| WebP | Image | VP8 | VP8L | Completely separate codecs sharing a container |
| AVIF | Image | AV1 | AV1 | Same codec, different config; lossless mode is slower |
| JPEG XL | Image | VarDCT | Modular | Can losslessly transcode existing JPEG files |
| HEIC/HEIF | Image | HEVC | HEVC | Apple's default; patent-encumbered |
| FLAC | Audio | — | Linear prediction + Rice | Lossless only |
| ALAC | Audio | — | Linear prediction | Apple's lossless; open-sourced 2011 |
| TIFF | Image | JPEG (inside TIFF) | LZW, ZIP, None | Container supports both, lossless is standard |
Lossy and lossless aren't competing approaches — they solve different problems. Lossy compression is one of the greatest engineering achievements of the 20th century, making streaming video, digital music, and web images practical within real bandwidth constraints. Lossless compression ensures that the data we choose to preserve stays bit-perfect across decades and transfers.
The practical takeaway: always start from the highest-quality source available, keep that source in a lossless format, and compress to lossy formats only at the final distribution step. If you follow that one rule, you'll never be the person re-compressing a JPEG for the fifth time and wondering why it looks like a watercolor painting.