Every file you've ever compressed — whether a JPEG photo, an MP3 song, or a ZIP archive — used one of two fundamentally different strategies: throw away data the recipient won't notice (lossy), or find a cleverer way to describe the same data using fewer bits (lossless). These aren't just implementation details. They determine whether your file can survive a round-trip, whether re-encoding destroys quality, and whether your archive will be bit-identical in 50 years.
This guide covers the theory and practice of both approaches. We'll start with information theory — the mathematical foundation that makes compression possible at all — then walk through how lossy and lossless codecs actually work, with real examples from image, audio, and video formats. If you've ever wondered why a JPEG gets blurry when you re-save it, or why a FLAC file is half the size of a WAV, this is the explanation.
Understanding compression isn't just academic. It directly affects which format you choose for every file you create, share, or archive. Make the wrong call and you either waste storage on unnecessarily large files or destroy quality you can't get back.
Information Theory: Why Compression Works at All
In 1948, Claude Shannon published "A Mathematical Theory of Communication" at Bell Labs and created the field of information theory. The core insight: data contains entropy (irreducible information) and redundancy (predictable patterns). Compression exploits redundancy.
Consider the string "AAAAAABBBBCCCC" — 14 characters, but only three unique symbols with obvious runs. Run-length encoding compresses this to "6A4B4C" — 6 characters representing the same information. Shannon showed that every data source has a theoretical minimum description length (the entropy rate), and no lossless compressor can beat it.
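Run-length encoding is simple enough to sketch in a few lines of Python (the function names and the count-then-symbol output format are our choices; this toy decoder assumes the payload contains no digit characters):

```python
def rle_encode(s: str) -> str:
    """Collapse each run of repeated characters into a count + symbol pair."""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append(f"{j - i}{s[i]}")
        i = j
    return "".join(out)

def rle_decode(s: str) -> str:
    """Invert rle_encode: read the digits, then repeat the symbol that follows."""
    out, i = [], 0
    while i < len(s):
        j = i
        while s[j].isdigit():
            j += 1
        out.append(s[j] * int(s[i:j]))
        i = j + 1
    return "".join(out)

print(rle_encode("AAAAAABBBBCCCC"))  # → 6A4B4C
```

The round trip is exact, which is the defining property of a lossless scheme.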
Real-world data is full of redundancy. English text uses "e" 13% of the time and "z" 0.07% of the time — that statistical imbalance is compressible. Photos have spatial correlation — neighboring pixels tend to be similar. Audio has temporal correlation — the next sample is usually close to the current one. Compression algorithms are designed to find and exploit these patterns.
Entropy and the Limits of Compression
Shannon entropy measures the average information per symbol in bits. For a source with symbols s₁, s₂, ... sₙ occurring with probabilities p₁, p₂, ... pₙ, the entropy H = -Σ pᵢ log₂(pᵢ). A fair coin has entropy of 1 bit. A biased coin (90% heads) has entropy of 0.47 bits — it's more predictable, so it's more compressible.
This means: random data cannot be compressed. If every bit is equally likely to be 0 or 1, entropy is maximal and no algorithm can reduce the size. This is why encrypted files, already-compressed files, and random data don't get smaller when you ZIP them — they're already at or near maximum entropy.
Practical compressors never quite reach the Shannon limit. DEFLATE (used in PNG and ZIP) typically achieves 60-70% of the theoretical optimum on text. Modern codecs like Zstandard (2016, Facebook) and Brotli (2015, Google) get closer, especially at higher compression levels.
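You can see both effects with Python's built-in zlib (a DEFLATE implementation): redundant text shrinks dramatically, while random bytes of the same length come back slightly larger than they went in, because the incompressible payload still pays for headers and checksums.

```python
import os
import zlib

# Redundant English-like text vs. incompressible random bytes, same length.
text = b"the quick brown fox jumps over the lazy dog. " * 200
rand = os.urandom(len(text))

packed_text = zlib.compress(text, 9)
packed_rand = zlib.compress(rand, 9)

print(f"text:   {len(text)} -> {len(packed_text)} bytes")
print(f"random: {len(rand)} -> {len(packed_rand)} bytes")
```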
Lossless Compression: Every Bit Preserved
Lossless compression guarantees that decompressing the output produces a bit-perfect copy of the input. Not approximately the same — exactly the same, checksum-verifiable, legally identical. This is the only acceptable choice when data integrity matters: source code, financial records, medical images, master audio recordings, scientific datasets.
Dictionary Coding (LZ77, LZ78, LZW)
Abraham Lempel and Jacob Ziv published LZ77 in 1977 and LZ78 in 1978, founding the family of dictionary-based compressors. The idea: instead of encoding each byte individually, reference earlier occurrences. If the sequence "the quick brown fox" appeared 2000 bytes ago, store a pointer (offset=2000, length=19) instead of 19 bytes.
LZ77 uses a sliding window — it looks backward through a fixed-size buffer for matches. DEFLATE (designed by Phil Katz, specified in RFC 1951 in 1996) combines LZ77 with Huffman coding and is the backbone of ZIP, gzip, and PNG compression. It's fast, well-understood, and universally supported.
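A brute-force greedy version of LZ77 fits in a short sketch. Real encoders use hash chains or suffix structures instead of this O(n·window) scan, and the token format here is our simplification:

```python
def lz77_compress(data: bytes, window: int = 4096, min_len: int = 3):
    """Greedy LZ77: emit (offset, length) back-references, or raw bytes when no match."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Matches may run past position i (overlap), which is legal in LZ77.
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens

def lz77_decompress(tokens) -> bytes:
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):
                out.append(out[-off])  # byte-by-byte copy handles overlapping matches
        else:
            out.append(t)
    return bytes(out)

data = b"the quick brown fox jumps over the lazy dog. " * 4
tokens = lz77_compress(data)
print(len(data), "bytes ->", len(tokens), "tokens")
```

Note that a match is allowed to extend past the current position into data it is itself producing; the byte-by-byte copy in the decompressor is what makes that self-referential case work.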
LZW (Lempel-Ziv-Welch, 1984) builds an explicit dictionary of encountered sequences. It's the algorithm behind GIF compression and early Unix compress. The Unisys LZW patent controversy in the 1990s directly led to the creation of PNG as a patent-free alternative.
Modern variants: Zstandard (zstd) by Yann Collet at Facebook uses finite state entropy and larger search windows to achieve compression ratios approaching LZMA while decompressing 3-5x faster. Brotli, developed by Google, is optimized for web content and is mandatory for WOFF2 font compression.
Entropy Coding (Huffman, Arithmetic, ANS)
Entropy coding assigns shorter codes to more frequent symbols and longer codes to rare ones, approaching Shannon's entropy limit. It's almost always used as the final stage after dictionary coding or prediction.
Huffman coding (David Huffman, 1952) builds a binary tree where frequent symbols get short bit paths. It's simple and fast but can't assign fractional bits — a symbol occurring 33% of the time ideally needs 1.58 bits but Huffman rounds to 2. Used in JPEG (with predefined tables), DEFLATE, and MP3.
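The tree construction can be sketched with a heap, repeatedly merging the two least-frequent subtrees. This toy version assumes at least two distinct symbols and returns a code table rather than emitting bits:

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a Huffman code table by repeatedly merging the two rarest subtrees."""
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tiebreak = len(heap)  # unique counter so the heap never compares the dicts
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "AAAAAABBBBCCCC"
codes = huffman_codes(text)
encoded_bits = sum(len(codes[ch]) for ch in text)
print(codes, f"{encoded_bits} bits vs {len(text) * 8} uncompressed")
```

The most frequent symbol ("A") gets a one-bit code, and the resulting codes are prefix-free, so the bitstream can be decoded without separators.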
Arithmetic coding encodes the entire message as a single number between 0 and 1, with each symbol narrowing the interval based on its probability. It can achieve fractional-bit precision, getting closer to the entropy limit. Used in JPEG 2000 and in H.264/H.265 (as CABAC, context-adaptive binary arithmetic coding).
ANS (Asymmetric Numeral Systems, Jarek Duda, 2009) achieves arithmetic coding's compression ratio with Huffman-like speed. It's used in Zstandard, Apple's LZFSE, and the JPEG XL reference encoder. ANS is arguably the most important compression innovation of the 2010s.
Lossless Formats in Practice
Typical lossless compression ratios vary enormously by content type:
| Content | Format | Typical Ratio | Why |
|---|---|---|---|
| English text | gzip | 3:1 to 4:1 | High redundancy in natural language |
| Source code | zstd | 4:1 to 6:1 | Repetitive keywords, indentation patterns |
| Photos (RGB) | PNG | 1.5:1 to 3:1 | Moderate spatial correlation, high entropy per pixel |
| Screenshots | PNG | 5:1 to 20:1 | Large uniform areas, limited color palette |
| CD audio (16-bit) | FLAC | 1.8:1 to 2.5:1 | Temporal correlation between samples |
| Database dumps | zstd | 5:1 to 15:1 | Highly structured, repetitive field names |
Key formats: PNG for images, FLAC/ALAC for audio, ZIP/7Z/TAR.GZ for archives, PDF/A for documents. Converting MP3 to FLAC won't magically restore lost audio data — it just wraps the lossy-decoded output in a lossless container.
Lossy Compression: Perceptually Transparent Destruction
Lossy compression achieves dramatically better compression ratios — 10:1, 50:1, even 200:1 — by permanently discarding information the human senses can't perceive. The key word is permanently. Once quantized, the original data is gone. You can decompress a JPEG, but you'll get the quantized approximation, not the original pixel values.
Psychoacoustic Coding (Audio)
The human ear has well-studied limitations that audio codecs exploit. Frequency masking: a loud tone at 1 kHz makes nearby frequencies (say, 1.1 kHz) inaudible — the codec can discard them entirely. Temporal masking: a loud sound makes softer sounds inaudible for 5-20ms afterward. Absolute threshold of hearing: below ~20 Hz and above ~18 kHz (less with age), humans can't hear anything, so that data can be zeroed out.
MP3 (MPEG-1 Layer III, developed at Fraunhofer IIS, standardized 1993) splits audio into 32 subbands with a polyphase filterbank, refines each with an MDCT, and uses a psychoacoustic model to allocate bits where they matter. At 128 kbps, it achieves roughly 11:1 compression from CD audio (1,411 kbps). At 320 kbps, most listeners can't distinguish MP3 from the original in ABX blind tests.
AAC (Advanced Audio Coding, 1997) improved on MP3 with full-bandwidth encoding (no format-imposed high-frequency cutoff), more flexible joint stereo coding, and more efficient entropy coding. At the same bitrate, AAC is audibly superior to MP3. It's the default for Apple Music, YouTube, and most streaming platforms.
Opus (IETF, 2012) combines SILK (for speech, from Skype) and CELT (for music) in a single codec that adapts between them. In public listening tests it outperforms both MP3 and AAC across the useful bitrate range, and it's royalty-free. Opus at 128 kbps is generally transparent for music.
Psychovisual Coding (Images and Video)
Human vision has its own exploitable limitations. Chroma subsampling: the eye resolves luminance (brightness) at much higher resolution than chrominance (color). JPEG, H.264, and most video codecs convert RGB to YCbCr and store color channels at half or quarter resolution (4:2:0 subsampling). This alone cuts raw data by 50% with near-zero visual impact.
Frequency sensitivity: we're better at seeing low-frequency gradients than high-frequency texture. The Discrete Cosine Transform (DCT) converts spatial pixel data into frequency components. Quantization then aggressively rounds high-frequency coefficients toward zero — discarding fine texture detail — while preserving low-frequency structure. This is where JPEG compression artifacts come from: the blocky patterns visible at low quality settings are the remnants of quantized 8x8 DCT blocks.
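A one-dimensional miniature of this pipeline shows exactly where the loss happens: transform, divide by frequency-dependent steps, round. The naive O(N²) DCT below is for illustration only (real codecs use fast factorizations), and the step sizes are invented for the example:

```python
import math

def dct_1d(x):
    """Naive 8-point DCT-II: project samples onto cosine basis functions."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
            for k in range(N)]

def idct_1d(X):
    """Inverse transform (DCT-III), with the standard 1/2 weight on the DC term."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

# A gentle ramp with a little wiggle -- mostly low-frequency content.
samples = [10, 12, 14, 13, 15, 17, 16, 18]
coeffs = dct_1d(samples)

# Quantize: divide by a step that grows with frequency, then round.
# High-frequency coefficients mostly collapse to zero -- that detail is gone.
steps = [1, 2, 4, 8, 16, 32, 64, 128]
quantized = [round(c / q) for c, q in zip(coeffs, steps)]
restored = idct_1d([v * q for v, q in zip(quantized, steps)])

print(quantized)                        # trailing zeros: discarded detail
print([round(v, 1) for v in restored])  # close to samples, fine wiggle smoothed
```

The long runs of zeros that quantization produces are exactly what the subsequent entropy-coding stage compresses almost for free.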
Motion estimation (video only): in video, most of the frame is the same as the previous frame, just shifted slightly. H.264 and H.265 use motion vectors to say "this 16x16 block moved 3 pixels right and 1 pixel down" instead of re-encoding the whole block. Inter-frame prediction is why H.264 achieves 200:1 compression ratios on typical video while looking sharp.
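A toy exhaustive block search makes the idea concrete. Real encoders search hierarchically over much larger windows with sub-pixel refinement; the frame sizes and pixel values here are invented:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size blocks (lists of rows)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block(frame, y, x, size):
    return [row[x:x + size] for row in frame[y:y + size]]

def best_motion_vector(prev, cur, y, x, size=4, search=2):
    """Exhaustive search: where in `prev` did this block of `cur` come from?"""
    target = block(cur, y, x, size)
    best, best_cost = (0, 0), sad(block(prev, y, x, size), target)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            if 0 <= py <= len(prev) - size and 0 <= px <= len(prev[0]) - size:
                cost = sad(block(prev, py, px, size), target)
                if cost < best_cost:
                    best_cost, best = cost, (dy, dx)
    return best, best_cost

# A bright 4x4 square that shifts one pixel to the right between frames.
W = H = 8
prev = [[0] * W for _ in range(H)]
cur = [[0] * W for _ in range(H)]
for yy in range(2, 6):
    for xx in range(2, 6):
        prev[yy][xx] = 200
        cur[yy][xx + 1] = 200

mv, cost = best_motion_vector(prev, cur, 2, 3)
print(mv, cost)  # → (0, -1) 0: "copy this block from one pixel to the left"
```

A residual cost of 0 means the encoder can transmit just the motion vector for this block — the essence of inter-frame prediction.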
Quantization: Where Quality Actually Dies
Quantization is the specific step in lossy compression where irreversible information loss occurs. It maps a continuous or high-precision value to a smaller set of discrete levels. Think of it as controlled rounding.
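In code, the whole step is two one-line functions, and the asymmetry between them is the point: quantize is many-to-one, so dequantize cannot know which input produced a given level.

```python
def quantize(value: float, step: float) -> int:
    """Controlled rounding: map a high-precision value to a discrete level index."""
    return round(value / step)

def dequantize(level: int, step: float) -> float:
    """Map the level back. Everything finer than `step` is gone for good."""
    return level * step

step = 8.0
original = 127.3
restored = dequantize(quantize(original, step), step)
print(original, "->", restored)  # 127.3 -> 128.0; 125.1 would restore to 128.0 too
```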
In JPEG, the quantization table determines how aggressively each DCT frequency coefficient gets rounded. A quality setting of 90 uses mild quantization — dividing by small numbers, preserving most coefficients. Quality 10 uses brutal quantization — dividing by large numbers, zeroing out most high-frequency data. The difference between quality 90 and quality 10 is not 9x; it can be 20-50x in file size, because aggressive quantization produces long runs of zeros that entropy coding eliminates efficiently.
In audio, quantization noise is shaped by the psychoacoustic model. Noise-shaping techniques push quantization error into frequency bands where it's masked by the signal itself. The result: the mathematical error is there, but it's hiding where you can't hear it.
The critical implication: quantization is not reversible. You cannot "enhance" a heavily compressed JPEG back to its original quality. The high-frequency data was divided away. AI upscalers can hallucinate plausible detail, but they're inventing information, not recovering it.
The Quality Cascade: Why Re-Encoding Destroys Files
Every lossy encoding pass applies its own quantization. If you open a JPEG, edit it, and save as JPEG again, the new encoder doesn't know what the old encoder already threw away. It applies fresh DCT, fresh quantization, and fresh rounding — destroying a new set of details on top of the already-degraded data.
This is called generation loss, and it's cumulative. A JPEG re-saved 10 times at quality 85 looks noticeably worse than one saved once at quality 85. A video transcoded from H.264 to H.264 loses quality even at the same bitrate, because the second encoder wastes bits re-encoding artifacts from the first pass as if they were real content.
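A small simulation illustrates the cascade. The gain "edit" and the quantization step here are invented stand-ins for whatever processing happens between saves: each generation the signal drifts off the quantization grid, picks up fresh rounding error, and the accumulated error grows rather than cancels.

```python
import random

def lossy_save(samples, step=4):
    """One lossy encode pass: snap every sample to the nearest multiple of `step`."""
    return [round(s / step) * step for s in samples]

def edit(samples, gain=1.03):
    """A small edit between saves (e.g. a brightness tweak) moves values off the grid."""
    return [s * gain for s in samples]

random.seed(1)
original = [random.uniform(0, 255) for _ in range(1000)]

ideal = original   # edited every generation, never quantized
lossy = original   # edited AND re-saved every generation
errors = []
for generation in range(10):
    ideal = edit(ideal)
    lossy = lossy_save(edit(lossy))
    mae = sum(abs(a - b) for a, b in zip(lossy, ideal)) / len(ideal)
    errors.append(mae)

print([round(e, 2) for e in errors])  # mean error trends upward across generations
```

The tenth-generation error is several times the first-generation error, matching the qualitative claim above: each pass adds new rounding damage on top of the old.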
The practical rules:
- Never transcode lossy-to-lossy unless you're changing format. Converting MP3 to AAC makes sense to gain compatibility, but you've added a second quantization pass. Keep the bitrate of the output higher than the input to minimize additional loss.
- Edit in lossless, export to lossy once. Edit photos in TIFF/PSD/RAW, export the final as JPEG. Edit audio in WAV/FLAC, export as MP3/AAC once. Never use a lossy format as your working format.
- Keep the original. Even if you distribute JPEG, store the PNG/TIFF original. You can always re-compress from the original; you can't un-compress a lossy file.
Lossy vs Lossless: Complete Comparison
| Dimension | Lossless | Lossy |
|---|---|---|
| Data preservation | Bit-perfect reconstruction | Approximate reconstruction |
| Typical ratio (photos) | 2:1 to 3:1 (PNG) | 10:1 to 50:1 (JPEG) |
| Typical ratio (audio) | 2:1 (FLAC) | 10:1 to 15:1 (MP3 128k) |
| Re-encoding safety | No degradation, ever | Cumulative quality loss |
| Use case | Archival, editing, source material | Distribution, streaming, sharing |
| Speed (encode) | Generally faster | Slower (perceptual modeling) |
| Speed (decode) | Variable | Often hardware-accelerated |
| File size predictability | Content-dependent | Target bitrate/quality gives consistent size |
When to Use Lossy vs Lossless
Use lossless when:
- The file is a source/master that may be edited later (PSD, WAV, TIFF, RAW)
- Exact reproduction matters legally or scientifically (medical imaging, legal documents, financial data)
- The content is already small or low-entropy (logos, diagrams, spreadsheets, source code)
- You're archiving for long-term preservation (PNG or TIFF for photos, FLAC for audio)
- You're compressing non-media data (archives, backups, database dumps)
Use lossy when:
- The file is for distribution, not editing (web images, streaming audio/video)
- File size is constrained (email attachments, mobile bandwidth, storage costs)
- Human perception is the quality bar (if no one can see/hear the difference, the bits were wasted)
- You're compressing photos, music, or video at reasonable quality settings
- Real-time performance matters (video calls, streaming, game assets)
Hybrid approaches exist: WebP, AVIF, and JPEG XL support both lossy and lossless modes in a single format. A PNG can become a lossless WebP for web delivery with no quality loss, or a lossy AVIF for maximum compression.
Formats That Support Both Modes
Several modern formats blur the line by offering both lossy and lossless compression in the same container:
| Format | Type | Lossy Codec | Lossless Codec | Notes |
|---|---|---|---|---|
| WebP | Image | VP8 | VP8L | Completely separate codecs sharing a container |
| AVIF | Image | AV1 | AV1 | Same codec, different config; lossless mode is slower |
| JPEG XL | Image | VarDCT | Modular | Can losslessly transcode existing JPEG files |
| HEIC/HEIF | Image | HEVC | HEVC | Apple's default; patent-encumbered |
| FLAC | Audio | — | Linear prediction + Rice | Lossless only |
| ALAC | Audio | — | Linear prediction | Apple's lossless; open-sourced 2011 |
| TIFF | Image | JPEG (inside TIFF) | LZW, ZIP, None | Container supports both, lossless is standard |
Lossy and lossless aren't competing approaches — they solve different problems. Lossy compression is one of the greatest engineering achievements of the 20th century, making streaming video, digital music, and web images practical within real bandwidth constraints. Lossless compression ensures that the data we choose to preserve stays bit-perfect across decades and transfers.
The practical takeaway: always start from the highest-quality source available, keep that source in a lossless format, and compress to lossy formats only at the final distribution step. If you follow that one rule, you'll never be the person re-compressing a JPEG for the fifth time and wondering why it looks like a watercolor painting.