Every second of raw 1080p video at 24fps produces roughly 150MB of data. A two-hour film would be 1.08TB uncompressed. Video compression is what makes streaming, downloads, and storage remotely practical — a well-compressed version of that same film fits in 2-4GB.
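The raw figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes 8-bit RGB (3 bytes per pixel); the function name is illustrative, not from any library.

```python
def raw_bytes_per_second(width, height, fps, bytes_per_pixel=3):
    """Uncompressed data rate, assuming 3 bytes per pixel (8-bit RGB)."""
    return width * height * bytes_per_pixel * fps


rate = raw_bytes_per_second(1920, 1080, 24)
film_bytes = rate * 2 * 60 * 60  # two hours of footage

print(f"{rate / 1e6:.1f} MB per second")            # 149.3 MB per second
print(f"{film_bytes / 1e12:.2f} TB for two hours")  # 1.07 TB for two hours
```

The exact results (149.3 MB/s and 1.07 TB) line up with the rounded 150MB and 1.08TB figures quoted above.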
But compression is not one thing. There's a stack of decisions: which codec encodes the pixels, which container wraps the streams, what bitrate balances quality against size, and whether your conversion actually re-encodes the data or just moves it between containers. Get these wrong and you either bloat your files or destroy your quality for nothing.
This guide explains how it all works, with real numbers and practical advice for anyone converting video files.
Containers vs Codecs: The Distinction Most People Miss
A codec (coder-decoder) is the algorithm that compresses and decompresses the actual video and audio data. H.264, H.265, VP9, and AV1 are codecs.
A container is the file format that packages one or more codec streams together — video, audio, subtitles, chapter markers — into a single file. MP4, MKV, WebM, and AVI are containers.
This matters because the same video data can live in different containers. An H.264 video with AAC audio can be packaged as an .mp4, an .mkv, or a .mov file. The actual compressed data inside is identical. The container just determines how it's organized on disk and what metadata it supports.
Here's why this is practical: converting MKV to MP4 usually means remuxing, that is, moving the streams from one container to another without touching the compressed data, as long as the streams use codecs MP4 supports. That's fast (seconds, not minutes) and completely lossless. But converting AVI to MP4 often requires re-encoding because AVI files frequently contain older codecs (DivX, Xvid) that most MP4 players and devices handle poorly or not at all.
Container Format Comparison
| Container | Extension | Video Codecs | Audio Codecs | Best For |
|---|---|---|---|---|
| MP4 | .mp4, .m4v | H.264, H.265, AV1 | AAC, MP3, AC3 | Universal playback, sharing, streaming |
| MKV | .mkv | Virtually all codecs | Virtually all codecs | Archival, multiple audio/subtitle tracks |
| WebM | .webm | VP8, VP9, AV1 | Vorbis, Opus | Web embedding, HTML5 video |
| AVI | .avi | DivX, Xvid, H.264 | MP3, PCM, AC3 | Legacy compatibility (mostly outdated) |
| MOV | .mov | H.264, H.265, ProRes | AAC, PCM, ALAC | Apple ecosystem, professional editing |
| FLV | .flv | H.264, VP6 | AAC, MP3 | Legacy Flash video (obsolete) |
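The table can be turned into a quick lookup that answers "can these streams move into that container without re-encoding?" The sketch below simply transcribes the table; the dictionary, the lowercase codec labels, and the `can_remux` helper are illustrative names, and MKV is modeled as accepting anything, which simplifies "virtually all codecs".

```python
# Codec support per container, transcribed from the comparison table.
CONTAINER_CODECS = {
    "mp4": {"video": {"h264", "h265", "av1"}, "audio": {"aac", "mp3", "ac3"}},
    "webm": {"video": {"vp8", "vp9", "av1"}, "audio": {"vorbis", "opus"}},
    "avi": {"video": {"divx", "xvid", "h264"}, "audio": {"mp3", "pcm", "ac3"}},
    "mov": {"video": {"h264", "h265", "prores"}, "audio": {"aac", "pcm", "alac"}},
    "flv": {"video": {"h264", "vp6"}, "audio": {"aac", "mp3"}},
}


def can_remux(video_codec, audio_codec, target):
    """True if both streams are listed as supported by the target container."""
    if target == "mkv":
        return True  # simplification: MKV holds virtually all codecs
    supported = CONTAINER_CODECS[target]
    return video_codec in supported["video"] and audio_codec in supported["audio"]


print(can_remux("h264", "aac", "mp4"))   # True
print(can_remux("xvid", "mp3", "mp4"))   # False
print(can_remux("h264", "aac", "webm"))  # False
```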
How Video Compression Actually Works
Video compression exploits two types of redundancy: what's similar within a single frame, and what's similar between consecutive frames.
Intra-Frame Compression (Spatial)
Each frame is divided into blocks (typically 16x16 or 8x8 pixels). The encoder predicts each block from its neighbors — a blue sky area is mostly the same color, so instead of storing every pixel, it stores "same as the block above, plus this tiny difference." This is conceptually similar to how JPEG compresses a photo. The result is an I-frame (intra-coded frame) that can be decoded on its own.
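To make the "same as the block above, plus this tiny difference" idea concrete, here is a toy sketch of vertical block prediction on a grayscale frame. It is heavily simplified: real encoders choose among many prediction directions and then transform and quantize the residual. The helper names and the 2x2 block size are assumptions made for illustration.

```python
def get_block(frame, top, left, size):
    """Extract a size x size block from a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def intra_residuals(frame, size):
    """Predict each block from the block directly above and keep the difference.

    Blocks in the top row have no neighbor above, so they are predicted
    from zeros (their residual is just the raw pixel values).
    """
    residuals = {}
    for top in range(0, len(frame), size):
        for left in range(0, len(frame[0]), size):
            block = get_block(frame, top, left, size)
            if top == 0:
                predicted = [[0] * size for _ in range(size)]
            else:
                predicted = get_block(frame, top - size, left, size)
            residuals[(top, left)] = [
                [b - p for b, p in zip(block_row, pred_row)]
                for block_row, pred_row in zip(block, predicted)
            ]
    return residuals


# A 4x4 patch of "blue sky": every pixel is 200 except one that is 201.
frame = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [200, 200, 200, 201],
    [200, 200, 200, 200],
]
residuals = intra_residuals(frame, 2)
print(residuals[(2, 0)])  # [[0, 0], [0, 0]]
print(residuals[(2, 2)])  # [[0, 1], [0, 0]]
```

The residuals for the lower blocks are almost entirely zeros, which is exactly the kind of data that compresses well.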
Inter-Frame Compression (Temporal)
This is where video compression gets its real power. In most video, 90%+ of pixels don't change between adjacent frames. Instead of re-encoding the entire image, the encoder stores only what changed.
P-frames (predicted frames) reference a previous frame and store only the differences — motion vectors plus residual data. A person walking across a static background means the background blocks are "copy from frame N-1" while only the moving person generates new data.
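A toy version of the P-frame idea: compare each block with the same block in the previous frame and store only the blocks that changed. This sketch deliberately skips motion search (every unchanged block is treated as "copy from frame N-1" with a zero motion vector) and stores changed blocks whole rather than as residuals. The function names are illustrative.

```python
def encode_p_frame(prev, cur, size):
    """Return {(top, left): block} for blocks that differ from the previous frame.

    Blocks that are missing from the result mean 'copy from frame N-1'.
    """
    changed = {}
    for top in range(0, len(cur), size):
        for left in range(0, len(cur[0]), size):
            cur_block = [row[left:left + size] for row in cur[top:top + size]]
            prev_block = [row[left:left + size] for row in prev[top:top + size]]
            if cur_block != prev_block:
                changed[(top, left)] = cur_block
    return changed


def decode_p_frame(prev, changed, size):
    """Rebuild the current frame from the previous frame plus the changed blocks."""
    frame = [row[:] for row in prev]
    for (top, left), block in changed.items():
        for dy, row in enumerate(block):
            frame[top + dy][left:left + size] = row
    return frame


prev = [[0] * 4 for _ in range(4)]   # static background
cur = [row[:] for row in prev]
cur[2][2] = 9                        # one "moving" pixel

changed = encode_p_frame(prev, cur, 2)
print(changed)                                  # {(2, 2): [[9, 0], [0, 0]]}
print(decode_p_frame(prev, changed, 2) == cur)  # True
```

Only one of the four blocks needs to be stored; the other three cost nothing beyond the implicit "copy" instruction.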
B-frames (bidirectional frames) reference both past and future frames, achieving even better compression. They're computationally expensive to encode but dramatically reduce file size. A typical encoding pattern (GOP — Group of Pictures) looks like: I B B P B B P B B P B B I
The ratio matters: more B-frames = smaller files but slower encoding and seeking. Most H.264 encoders allow only a few consecutive B-frames; x264, for example, defaults to a maximum of 3.
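The GOP pattern shown above can be generated from two numbers: the GOP length and the count of B-frames between reference frames. This is a sketch of a fixed pattern in display order; real encoders typically place frame types adaptively, and `gop_pattern` is a hypothetical helper, not an encoder API.

```python
def gop_pattern(gop_size=12, bframes=2):
    """Frame types for one GOP in display order, using a fixed repeating pattern."""
    types = []
    for i in range(gop_size):
        if i == 0:
            types.append("I")                # every GOP opens with an I-frame
        elif i % (bframes + 1) == 0:
            types.append("P")                # reference frame after each run of B-frames
        else:
            types.append("B")
    return " ".join(types)


print(gop_pattern(12, 2))  # I B B P B B P B B P B B
print(gop_pattern(4, 0))   # I P P P
```

The next GOP starts with another I-frame, which gives the "I B B P B B P B B P B B I" sequence from the text.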
Video Codecs Ranked: H.264, H.265, VP9, AV1
Each generation of codecs achieves roughly 30-50% better compression than the last, at the cost of more encoding time.
| Codec | Released | Compression vs H.264 | Encode Speed | Decode Support | Licensing |
|---|---|---|---|---|---|
| H.264 (AVC) | 2003 | Baseline | Fast | Everything | Licensed (MPEG LA), free for end users |
| H.265 (HEVC) | 2013 | ~50% smaller | 2-5x slower | Most modern devices | Patent mess (3 pools + independents) |
| VP9 | 2013 | ~35-45% smaller | 3-5x slower | Chrome, Firefox, Android, YouTube | Royalty-free (Google) |
| AV1 | 2018 | ~50-60% smaller | 10-20x slower | Growing (Chrome, Firefox, new hardware) | Royalty-free (Alliance for Open Media) |
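As a rough illustration of what the "Compression vs H.264" column means in practice, the sketch below converts an H.264 file size into an estimated size range for each codec. The percentages come straight from the table and are approximations; the dictionary and function names are illustrative.

```python
# Approximate size reduction vs H.264 in percent, as (low, high), from the table.
SAVINGS_VS_H264 = {
    "h264": (0, 0),
    "h265": (50, 50),
    "vp9": (35, 45),
    "av1": (50, 60),
}


def estimated_size_range_mb(h264_size_mb, codec):
    """Return (smallest, largest) estimated size in MB at comparable quality."""
    low_saving, high_saving = SAVINGS_VS_H264[codec]
    smallest = h264_size_mb * (100 - high_saving) / 100
    largest = h264_size_mb * (100 - low_saving) / 100
    return smallest, largest


print(estimated_size_range_mb(2000, "vp9"))  # (1100.0, 1300.0)
print(estimated_size_range_mb(2000, "av1"))  # (800.0, 1000.0)
```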
H.264 is still the right default for almost everything. It plays on every device, every browser, every TV, every phone manufactured in the last 15 years. Encoding is fast. Quality is good. If you're converting video and don't know what codec to pick: H.264 in an MP4 container. That's the answer for 90% of use cases.
H.265/HEVC delivers genuinely better compression — a 1080p file at equivalent quality is roughly half the size. The problem is licensing. Three separate patent pools plus independent licensors make it a legal minefield for software developers. Apple and Samsung devices support it natively. Firefox still doesn't decode it without OS support.
VP9 is YouTube's workhorse codec. Every video you watch on YouTube above 720p is likely VP9. Google developed it as a royalty-free HEVC alternative. Browser support is excellent in Chrome and Firefox, weaker on Safari/iOS.
AV1 is the future: it compresses better than everything above. Netflix, YouTube, and Meta are migrating to it. The catch: encoding a single video in AV1 at high quality can take 10-20x longer than H.264. Hardware AV1 encoders in newer GPUs (NVIDIA RTX 40-series and later, Intel Arc) are closing this gap fast. If you're encoding once and serving millions of times, AV1 is worth the wait.
Bitrate: The Quality-Size Dial
Bitrate is the amount of data used per second of video, measured in Mbps (megabits per second) or kbps (kilobits per second). For a given codec and resolution, higher bitrate = more data = better quality = larger file. It's the single most important quality control you have.
Practical Bitrate Benchmarks (H.264)
| Resolution | Low Quality | Good Quality | High Quality | Overkill |
|---|---|---|---|---|
| 720p | 1 Mbps | 2.5 Mbps | 5 Mbps | 8+ Mbps |
| 1080p | 2 Mbps | 5 Mbps | 8 Mbps | 15+ Mbps |
| 4K | 10 Mbps | 20 Mbps | 35 Mbps | 60+ Mbps |
YouTube streams 1080p at roughly 4-8 Mbps (VP9). Netflix streams 1080p at 3-6 Mbps (H.264/HEVC). Blu-ray discs use 20-40 Mbps. These numbers give you a reality check when choosing bitrates for conversion.
CBR vs VBR
CBR (Constant Bitrate) uses the same data rate for every second — dark static scenes and fast action sequences get the same budget. Predictable file sizes, but wastes bits on simple scenes and starves complex ones.
VBR (Variable Bitrate) allocates more bits to complex scenes and fewer to simple ones. A talking head on a plain background might use 1 Mbps while an explosion scene gets 15 Mbps. VBR produces better quality at the same average file size. Most modern encoders default to VBR, and you should too.
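The difference between the two modes can be sketched as a budgeting problem: both spend the same total number of bits, but VBR splits them in proportion to how complex each scene is. The complexity weights below are made up for illustration, and real rate control is far more sophisticated than a proportional split.

```python
def allocate_cbr(scene_seconds, avg_mbps):
    """CBR: every scene gets the same bitrate regardless of content."""
    return [avg_mbps for _ in scene_seconds]


def allocate_vbr(scene_seconds, complexity, avg_mbps):
    """VBR (toy model): same total bits, split in proportion to complexity."""
    total_megabits = avg_mbps * sum(scene_seconds)
    weighted_seconds = sum(s * c for s, c in zip(scene_seconds, complexity))
    return [total_megabits * c / weighted_seconds for c in complexity]


scene_seconds = [60, 60]   # a talking head, then an explosion scene
complexity = [1, 15]       # hypothetical relative complexity weights

print(allocate_cbr(scene_seconds, 8))              # [8, 8]
print(allocate_vbr(scene_seconds, complexity, 8))  # [1.0, 15.0]
```

Both allocations average 8 Mbps over the two minutes, but the toy VBR split gives the talking head 1 Mbps and the explosion 15 Mbps, mirroring the example in the text.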
CRF (Constant Rate Factor) is VBR's practical cousin. Instead of targeting a bitrate, you target a quality level on a scale where lower numbers mean higher quality (typically 18-28 for H.264). The encoder uses however many bits are needed to hit that quality. CRF 18 is generally considered visually lossless. CRF 23 is the default for FFmpeg's libx264 encoder. CRF 28 shows noticeable artifacts but is acceptable for casual viewing. This is the mode most desktop encoders use.
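For readers using FFmpeg, a CRF encode might look like the command built below. This is a sketch, not a recommended preset: the file names, the `medium` preset, and the 160 kbps AAC audio setting are assumptions chosen for illustration, and the helper only builds the argument list without running anything.

```python
def crf_encode_command(src, dst, crf=23, preset="medium", audio_kbps=160):
    """Build an FFmpeg argument list for an H.264 CRF encode with AAC audio."""
    if not 0 <= crf <= 51:
        raise ValueError("x264 CRF values range from 0 to 51")
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-crf", str(crf), "-preset", preset,  # quality-targeted video
        "-c:a", "aac", "-b:a", f"{audio_kbps}k",                 # bitrate-targeted audio
        dst,
    ]


cmd = crf_encode_command("input.mkv", "output.mp4", crf=20)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming an `ffmpeg` binary built with libx264 is on the PATH.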
Resolution vs Quality: Why 1080p at 2Mbps Looks Worse Than 720p at 3Mbps
Resolution (1920x1080, 1280x720) defines how many pixels are in each frame. Quality depends on how many bits each pixel gets. Pixel count grows with the square of the scale factor: doubling both dimensions quadruples the pixel count, and 1080p (1.5x the width and height of 720p) has 2.25x as many pixels.
If you keep the bitrate the same but increase resolution, each pixel gets fewer bits. The result: more pixels, but each one looks worse. Blocky, smeary 1080p is worse than clean, sharp 720p. This is why YouTube's 720p stream at 2.5 Mbps often looks better than a poorly encoded 1080p file at the same bitrate.
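The "bits per pixel" intuition is easy to put numbers on. The sketch below compares the two cases from the section title, assuming 24 fps for both (the frame rate is an assumption; the text does not specify one).

```python
def bits_per_pixel(bitrate_mbps, width, height, fps):
    """Average number of compressed bits available per pixel per frame."""
    return bitrate_mbps * 1_000_000 / (width * height * fps)


bpp_1080 = bits_per_pixel(2, 1920, 1080, 24)
bpp_720 = bits_per_pixel(3, 1280, 720, 24)

print(f"1080p at 2 Mbps: {bpp_1080:.3f} bits per pixel")  # 0.040
print(f"720p at 3 Mbps:  {bpp_720:.3f} bits per pixel")   # 0.136
```

Under these assumptions the 720p encode has more than three times as many bits to spend on each pixel, which is why it can look cleaner despite the lower resolution.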
The practical lesson: don't upscale. Converting a 720p source to 1080p adds pixels but no detail. You're just stretching the same image across more pixels, making the file bigger without improving quality. When reducing file size, lowering resolution combined with a reasonable bitrate often produces the best results.
Audio Codecs in Video Files
Audio quality in video files gets overlooked, but bad audio is more distracting than bad video. The audio track typically accounts for only 5-15% of total file size, so skimping on it saves almost nothing while ruining the experience.
AAC is the standard audio codec in MP4 files. At 128-256 kbps, it sounds excellent. It's the default for most video production and streaming.
Opus matches or beats AAC at most bitrates, and below 64 kbps it's dramatically better. It's the default in WebM files and used by Discord, Zoom, and most VoIP. Less universally supported than AAC in hardware players.
MP3 appears in older AVI and some MP4 files. At 192+ kbps it's perfectly fine. Below 128 kbps it sounds noticeably worse than AAC or Opus at the same bitrate.
AC3 (Dolby Digital) is common in MKV files ripped from Blu-rays. Supports 5.1 surround sound. If you convert an MKV with AC3 audio to MP4, many converters re-encode the audio to AAC stereo by default, losing the surround channels. If surround matters, check for an option to copy the AC3 track or to encode 5.1 AAC.
When extracting audio from a video, remember that re-encoding always introduces some generation loss. If the source audio is AAC at 128 kbps and you convert to MP3 at 128 kbps, you'll lose quality. Either keep the original codec or encode at a higher bitrate than the source.
Remuxing vs Re-encoding: When Conversion Loses Quality
Remuxing moves the compressed streams from one container to another without decompressing or recompressing anything. The data is bit-for-bit identical. This is what happens when you convert MKV to MP4 and both files contain H.264 + AAC. It takes seconds and produces zero quality loss.
Re-encoding decompresses the video, then recompresses it with a (possibly different) codec. This always loses quality because lossy compression is not reversible — you're compressing already-compressed data. Each re-encode compounds the loss (generation loss). A video re-encoded 5 times looks noticeably degraded.
Rules of thumb:
- Container change, same codecs → Remux (lossless, fast). Example: MKV→MP4 with H.264+AAC.
- Codec change required → Re-encode (lossy, slow). Example: AVI (DivX) → MP4 (H.264).
- Resolution or bitrate change → Re-encode (lossy, slow). Example: 4K → 1080p.
- Any quality adjustment → Re-encode. There's no way to "improve" quality by converting — you can only maintain or reduce it.
Always keep your original file as the master. Never re-encode from a re-encoded copy if you can avoid it.
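The rules of thumb reduce to a small decision, sketched below alongside the FFmpeg stream-copy command that performs a remux. The function names and file names are illustrative; `-c copy` is FFmpeg's option for copying streams without decoding or re-encoding them.

```python
def conversion_plan(codecs_supported_by_target, change_resolution=False,
                    change_bitrate=False):
    """Apply the rules of thumb: remux only when nothing about the streams changes."""
    if codecs_supported_by_target and not (change_resolution or change_bitrate):
        return "remux"
    return "re-encode"


def remux_command(src, dst):
    """Build an FFmpeg argument list that copies the streams into a new container."""
    return ["ffmpeg", "-i", src, "-c", "copy", dst]


print(conversion_plan(True))                          # remux      (MKV to MP4, H.264+AAC)
print(conversion_plan(False))                         # re-encode  (AVI/DivX to MP4/H.264)
print(conversion_plan(True, change_resolution=True))  # re-encode  (4K to 1080p)
print(" ".join(remux_command("input.mkv", "output.mp4")))
```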
Estimating File Sizes
Quick formula: File size (MB) = Bitrate (Mbps) x Duration (seconds) / 8
Practical examples for H.264 at good quality:
- 1 hour of 1080p at 5 Mbps = 5 x 3600 / 8 = 2,250 MB (about 2.25 GB)
- 1 hour of 720p at 2.5 Mbps = 2.5 x 3600 / 8 = 1,125 MB (about 1.1 GB)
- 1 hour of 4K at 20 Mbps = 20 x 3600 / 8 = 9,000 MB (9 GB)
- 5 min clip, 1080p at 8 Mbps = 8 x 300 / 8 = 300 MB
Add 5-15% for the audio track. These are rough numbers — VBR encoding varies significantly based on content complexity. Action movies with fast cuts use more data than static talking-head videos.
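The formula and the audio allowance fit in one small function. This is a sketch; `audio_overhead` is a hypothetical parameter expressing the 5-15% audio share as a fraction of the video size.

```python
def estimate_size_mb(video_mbps, seconds, audio_overhead=0.0):
    """File size in MB: bitrate (Mbps) x duration (s) / 8, plus an audio allowance."""
    video_mb = video_mbps * seconds / 8
    return video_mb * (1 + audio_overhead)


print(estimate_size_mb(5, 3600))    # 2250.0  (1 hour of 1080p)
print(estimate_size_mb(2.5, 3600))  # 1125.0  (1 hour of 720p)
print(estimate_size_mb(8, 300))     # 300.0   (5 min clip)
print(round(estimate_size_mb(5, 3600, audio_overhead=0.10)))  # 2475
```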
Video compression boils down to three decisions: which codec, which container, and how much bitrate. For most people, H.264 in MP4 at CRF 18-23 is the right answer. If you're converting between containers with compatible codecs, remuxing gives you zero quality loss. If you need to re-encode, use the highest quality your file size budget allows and avoid re-encoding the same file multiple times.