Raw, uncompressed 1080p video at 30fps generates about 186MB per second. That's 11GB per minute, or 670GB per hour. Without compression, a single two-hour movie would need well over a terabyte of storage. Video codecs solve this by reducing that data by 100-1000x while preserving enough visual quality that your eyes can't tell the difference.
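These figures follow directly from the frame geometry. A quick sanity check of the arithmetic, assuming 8-bit color at 3 bytes per pixel (uncompressed RGB; 4:2:0 subsampled video would be half that):

```python
# Raw data rate = width x height x bytes-per-pixel x frames-per-second.
# 3 bytes/pixel assumes 8-bit RGB; 4:2:0 subsampled video is 1.5 bytes/pixel.
def raw_rate_bytes_per_sec(width: int, height: int, fps: int,
                           bytes_per_pixel: float = 3) -> int:
    return int(width * height * bytes_per_pixel * fps)

per_sec = raw_rate_bytes_per_sec(1920, 1080, 30)
print(per_sec / 1e6)         # ~186.6 MB per second
print(per_sec * 60 / 1e9)    # ~11.2 GB per minute
print(per_sec * 3600 / 1e9)  # ~671.8 GB per hour
```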
But codecs aren't magic boxes. They make specific engineering tradeoffs: newer codecs compress harder but need more CPU time to encode. Different profiles enable different features at different complexity levels. Hardware encoders sacrifice compression efficiency for real-time speed. Understanding these tradeoffs is what lets you choose the right codec and settings for your specific use case, rather than guessing.
This guide explains how video codecs actually work, from the fundamental compression techniques to the practical encoding settings you'll use when converting files.
Codec vs Container: The First Distinction
A codec (coder-decoder) is an algorithm that compresses video frames into a fraction of their original size (encoding) and reconstructs them for playback (decoding). H.264, H.265, VP9, and AV1 are codecs.
A container (or wrapper) is a file format that packages one or more compressed streams (video, audio, subtitles) into a single file with synchronization and metadata. MP4, MKV, and WebM are containers.
The same codec can live in different containers. H.264 video can be in an MP4 file, an MKV file, or a MOV file — the compressed video data is identical in each case. The container only determines how the streams are organized on disk, what metadata is stored, and what additional tracks (subtitles, chapters) are supported.
This is why MKV-to-MP4 conversion is often instant and lossless: if both containers support the same codecs, you're just moving data from one wrapper to another (remuxing) without touching the compressed content.
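In FFmpeg terms, a remux is just stream copying. A minimal sketch in Python (the filenames are placeholders; `-c copy` is FFmpeg's flag for copying streams without re-encoding):

```python
import subprocess

def remux_cmd(src: str, dst: str) -> list[str]:
    # -c copy moves every stream bit-for-bit into the new container:
    # no decode, no encode, so the operation is fast and lossless.
    return ["ffmpeg", "-i", src, "-c", "copy", dst]

cmd = remux_cmd("input.mkv", "output.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually remux
print(" ".join(cmd))
```

Because no encoder runs, even a feature-length file remuxes in seconds; the only failures come from container limitations, such as a subtitle format the target container can't carry.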
How Video Compression Works
Every video codec exploits two types of redundancy to shrink file size:
Spatial Compression (Within a Frame)
A single video frame has enormous redundancy. A blue sky region is mostly the same color. A wooden desk has repeating grain patterns. Instead of storing every pixel individually, the codec divides the frame into blocks and predicts each block's content from its neighbors.
The process for a typical codec:
- Block partitioning: Divide the frame into blocks. H.264 uses 16x16 macroblocks (optionally split to 4x4). H.265 uses Coding Tree Units (CTUs) up to 64x64, recursively split. AV1 uses superblocks up to 128x128.
- Prediction: For each block, predict its content from adjacent decoded blocks (intra prediction). The codec tries multiple prediction modes (horizontal, vertical, diagonal, DC/flat) and picks the one with the smallest residual (difference between prediction and actual).
- Transform: Apply a DCT (Discrete Cosine Transform) or similar transform to the residual, converting spatial data to frequency data. Low-frequency components (gradual changes) get large coefficients; high-frequency components (sharp edges, fine detail) get small ones.
- Quantization: This is the lossy step. Divide the transform coefficients by a quantization parameter (QP). Small coefficients round to zero and are discarded. Higher QP = more zeros = smaller file = more quality loss.
- Entropy coding: Compress the quantized coefficients using arithmetic coding (CABAC in H.264/H.265, a multi-symbol arithmetic coder in AV1). This is lossless — it just represents the data more efficiently.
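The transform-and-quantize steps above can be sketched in miniature. This toy (a plain floating-point DCT-II with a single flat quantizer step, not any real codec's integer transform) shows why smooth blocks compress so well: nearly all coefficients quantize to zero.

```python
import math

def dct2(block):
    """2-D DCT-II of an n x n block (the transform family codecs build on)."""
    n = len(block)
    def c(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return [[c(u) * c(v) * sum(
                 block[x][y]
                 * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                 * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                 for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]

# A smooth 8x8 block: a gentle horizontal gradient (like a patch of sky).
block = [[2 * col for col in range(8)] for _ in range(8)]

qp = 10  # toy quantizer step; real codecs use per-frequency step sizes
quantized = [[round(coeff / qp) for coeff in row] for row in dct2(block)]
zeros = sum(row.count(0) for row in quantized)
# Only the DC term and one low-frequency coefficient survive quantization.
print(f"{zeros}/64 coefficients quantized to zero")
```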
Temporal Compression (Between Frames)
Consecutive video frames are usually very similar — most of the scene doesn't change between frames. Temporal compression exploits this by storing only the differences between frames.
The codec performs motion estimation: for each block in the current frame, it searches nearby positions in previously decoded reference frames for the best match. If a person walks to the right, the codec stores "copy the block from 10 pixels to the left in the previous frame" (a motion vector) rather than the actual pixels. Only the residual (what the motion vector didn't predict) gets encoded.
Block sizes for motion estimation have grown with each codec generation. H.264 searches with blocks down to 4x4. H.265 uses flexible block sizes up to 64x64. AV1 supports 128x128 and adds warped motion prediction (accounting for rotation and zoom, not just translation). Larger and more flexible block sizes = better predictions = smaller residuals = smaller files. But also = more searching = slower encoding.
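A toy version of motion estimation, assuming nothing beyond exhaustive SAD (sum of absolute differences) search over a small window; real encoders use hierarchical and predictive search strategies, but the principle is the same:

```python
def sad(a, b):  # sum of absolute differences: the standard matching cost
    return sum(abs(x - y) for x, y in zip(a, b))

def best_motion_vector(ref, cur, bx, by, bs=4, search=2):
    """Exhaustive search: find the offset in `ref` that best predicts the
    bs x bs block of `cur` at (bx, by). Returns ((dx, dy), cost)."""
    block = [cur[by + r][bx:bx + bs] for r in range(bs)]
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if not (0 <= x <= len(ref[0]) - bs and 0 <= y <= len(ref) - bs):
                continue  # candidate block falls outside the reference frame
            cand = [ref[y + r][x:x + bs] for r in range(bs)]
            cost = sum(sad(a, b) for a, b in zip(block, cand))
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

# Reference frame with a bright 4x4 "object"; in the current frame it has
# moved one pixel right, so the best predictor sits one pixel to the left.
ref = [[0] * 8 for _ in range(8)]
for r in range(2, 6):
    for c in range(2, 6):
        ref[r][c] = 200
cur = [row[-1:] + row[:-1] for row in ref]  # shift every row right by 1
mv, cost = best_motion_vector(ref, cur, 3, 2)
print(mv, cost)  # (-1, 0) with a perfect (zero-residual) match
```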
Frame Types: I, P, and B
Not all frames are compressed equally. Codecs use three frame types with different compression strategies:
I-frames (Intra-coded): Compressed using only spatial techniques — no reference to other frames. An I-frame is essentially a standalone compressed image (like a JPEG). It's the largest frame type but the only one that can be decoded independently. Every video starts with an I-frame, and they appear periodically as sync points for seeking.
P-frames (Predictive): Compressed using motion estimation from previous frames (forward prediction only). P-frames are significantly smaller than I-frames because they only store what changed. A P-frame references one or more previous I or P frames.
B-frames (Bidirectional): Compressed using motion estimation from both previous and future frames. B-frames achieve the best compression because they have more reference options. The tradeoff: B-frames must be stored out of display order (the encoder needs to encode the future reference frame before the B-frame), which adds complexity and latency.
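The out-of-order storage can be illustrated with a sketch. Assuming the common pattern of two B-frames between references, each I/P reference frame must be emitted before the B-frames that display ahead of it:

```python
def decode_order(display):
    """Reorder frames from display order to a valid decode order."""
    out, held = [], []
    for f in display:
        if f.startswith("B"):
            held.append(f)   # a B-frame waits for its future reference
        else:
            out.append(f)    # emit the I/P reference first...
            out.extend(held) # ...then the Bs that depend on it
            held.clear()
    return out + held

print(decode_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The gap between the two orders is exactly the extra latency B-frames introduce: P3 must be encoded and transmitted before B1 can even be decoded.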
Typical compression ratios by frame type (H.264, 1080p):
- I-frame: 50-200KB per frame
- P-frame: 10-50KB per frame
- B-frame: 5-25KB per frame
GOP (Group of Pictures) is the sequence of frames between I-frames. A typical GOP is 30-250 frames. Shorter GOPs = more I-frames = larger file = better seeking accuracy. Longer GOPs = fewer I-frames = smaller file = coarser seeking. The default in most encoders is GOP = 250 frames (about 8 seconds at 30fps).
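Plugging the illustrative frame sizes above into a back-of-the-envelope model (midpoints of the ranges, two B-frames per P-frame; both are assumptions, not measurements) shows how GOP length moves the average bitrate:

```python
# Midpoints of the H.264 1080p figures above: I~125KB, P~30KB, B~15KB.
def gop_bytes(gop_len: int, b_per_p: int = 2,
              i_kb: int = 125, p_kb: int = 30, b_kb: int = 15) -> int:
    """Bytes for one GOP: one I-frame, then P-frames with b_per_p Bs each."""
    inter = gop_len - 1                    # frames after the leading I-frame
    bs = inter * b_per_p // (b_per_p + 1)  # B-frames in the GOP
    ps = inter - bs                        # P-frames in the GOP
    return (i_kb + ps * p_kb + bs * b_kb) * 1000

def avg_kbps(gop_len: int, fps: int = 30) -> float:
    return gop_bytes(gop_len) * 8 / (gop_len / fps) / 1000

for g in (30, 90, 250):
    print(f"GOP {g:>3}: ~{avg_kbps(g):,.0f} kbps")
```

The absolute numbers are only as good as the assumed frame sizes, but the direction is the point: shortening the GOP from 250 to 30 frames raises the bitrate by roughly 15% here, purely from the extra I-frames.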
Codec Generations: The Evolution
| Codec | Standard | Year | Key Innovation | Efficiency vs Previous |
|---|---|---|---|---|
| MPEG-1 | ISO 11172 | 1993 | First practical video codec (VCD quality) | Baseline |
| MPEG-2 | ISO 13818 | 1995 | Interlaced video, DVD/broadcast quality | ~30% better than MPEG-1 |
| MPEG-4 ASP | ISO 14496-2 | 2001 | Object-based coding, quarter-pixel motion | ~30% better than MPEG-2 |
| H.264 (AVC) | ITU-T / ISO 14496-10 | 2003 | CABAC, flexible block sizes (4x4-16x16), multi-reference prediction | ~50% better than MPEG-2 |
| H.265 (HEVC) | ITU-T / ISO 23008-2 | 2013 | CTU up to 64x64, 35 intra prediction modes (33 angular plus DC and planar), SAO filter | ~50% better than H.264 |
| VP9 | Google/WebM | 2013 | Superblocks up to 64x64, 10 intra modes, adaptive reference frames | ~45% better than H.264 |
| AV1 | AOM | 2018 | Superblocks up to 128x128, 56+ intra modes, film grain synthesis, warped motion | ~65% better than H.264 |
Each generation roughly doubles compression efficiency over the one before it (or halves the required bitrate at the same quality). The cost is always encoding complexity: H.264 encoding is roughly 5x faster than H.265, which is roughly 5x faster than AV1 (software encoders).
Hardware vs Software Encoding
Software encoders (x264, x265, libaom, SVT-AV1, libvpx) run on CPU. They have access to the full codec feature set and can perform exhaustive optimization. Software encoding is slow but produces the best quality per bit.
Hardware encoders (NVIDIA NVENC, Intel Quick Sync Video, AMD VCE/AMF, Apple VideoToolbox) use dedicated silicon on GPUs and SoCs. They're dramatically faster (real-time or faster) but use simplified algorithms that produce files 20-40% larger at the same quality.
| Encoder | Type | Speed (1080p30) | Quality-per-bit |
|---|---|---|---|
| x264 (slow preset) | Software | ~40 fps | Reference quality |
| NVENC H.264 | Hardware | ~300 fps | ~75-85% of x264 |
| x265 (medium preset) | Software | ~15 fps | Reference quality |
| NVENC HEVC | Hardware | ~200 fps | ~75-85% of x265 |
| SVT-AV1 (preset 6) | Software | ~25 fps | ~90% of libaom |
| NVENC AV1 (RTX 40+) | Hardware | ~120 fps | ~80-85% of SVT-AV1 |
When to use hardware encoding: Live streaming, real-time recording (OBS, screen capture), quick exports where time matters more than file size.
When to use software encoding: Final delivery, archival, web hosting (encode once, serve many times), any situation where smaller files save bandwidth or storage cost over time.
Profiles and Levels
Codecs define profiles (which compression features are enabled) and levels (maximum resolution, bitrate, and decode complexity). This lets a single codec standard span everything from low-power IoT cameras to 8K cinema.
H.264 profiles:
- Baseline: No B-frames, no CABAC (uses CAVLC instead). For low-latency video conferencing and mobile devices. About 10-15% less efficient than Main.
- Main: Adds B-frames and CABAC. Good balance of efficiency and decode complexity.
- High: Adds 8x8 transforms, custom quantization matrices, monochrome support. Standard for Blu-ray and most modern encoding. About 10% more efficient than Main.
- High 10: Adds 10-bit color depth. Required for HDR content.
H.264 levels: Level 3.1 = max 720p 30fps. Level 4.0 = max 1080p 30fps. Level 4.1 = 1080p 30fps with more bitrate headroom (a common streaming target). Level 4.2 = max 1080p 60fps. Level 5.1 = max 4K 30fps. Level 5.2 = max 4K 60fps.
For most conversions: H.264 High Profile, Level 4.2 covers 1080p up to 60fps and is supported by virtually all modern devices; use Level 4.1 for 1080p30 when a target device lists 4.1 as its maximum. For 4K, use Level 5.1 or 5.2. Specify these in FFmpeg with -profile:v high -level 4.2.
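Level selection can be automated from the standard's limits. This sketch hard-codes a few MaxFS (frame size in macroblocks) and MaxMBPS (macroblocks per second) rows from the H.264 level table and picks the lowest listed level that fits:

```python
import math

# A few (level, MaxFS, MaxMBPS) rows from the H.264 spec's level table
# (Annex A); enough to cover 720p through 4K60.
LEVELS = [
    ("3.1", 3600, 108000),
    ("3.2", 5120, 216000),
    ("4.0", 8192, 245760),
    ("4.2", 8704, 522240),
    ("5.0", 22080, 589824),
    ("5.1", 36864, 983040),
    ("5.2", 36864, 2073600),
]

def min_level(width: int, height: int, fps: float) -> str:
    """Lowest listed level whose frame-size and throughput limits fit."""
    mbs = math.ceil(width / 16) * math.ceil(height / 16)  # 16x16 macroblocks
    for name, max_fs, max_mbps in LEVELS:
        if mbs <= max_fs and mbs * fps <= max_mbps:
            return name
    raise ValueError("beyond the listed levels")

print(min_level(1280, 720, 30))   # 3.1
print(min_level(1920, 1080, 30))  # 4.0
print(min_level(1920, 1080, 60))  # 4.2
print(min_level(3840, 2160, 60))  # 5.2
```

Note that 1080p60 lands on Level 4.2, not 4.1: Levels 4.0 and 4.1 share the same macroblocks-per-second ceiling, so 4.1 only buys extra bitrate, not extra frame rate.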
Encoding Modes: CRF, CBR, VBR, and CQ
How the encoder allocates bits across frames dramatically affects both quality and file size.
CRF (Constant Rate Factor): The encoder targets a constant perceptual quality. Easy frames get fewer bits; complex frames get more. File size varies depending on content complexity. This is the recommended mode for offline encoding. CRF 23 is the x264 default. Lower = better quality = larger files.
CBR (Constant Bitrate): Every second of video gets the same number of bits regardless of complexity. Simple scenes waste bits; complex scenes don't get enough. Used for streaming where the delivery channel has a fixed bandwidth (e.g., satellite broadcast). Avoid for file-based encoding.
VBR (Variable Bitrate): The encoder varies the bitrate within a specified range (min/max) while targeting an average. More intelligent bit allocation than CBR, but the target average bitrate means the encoder must predict complexity across the entire video. Two-pass VBR encoding improves this by analyzing the video in the first pass and allocating bits in the second.
CQ (Constant Quality): Similar to CRF but specific to hardware encoders. NVENC and QSV use CQ mode to approximate CRF behavior. The quality scale differs from CRF — NVENC CQ 19 roughly equals x264 CRF 19, but the exact mapping varies by content.
Recommendation: Use CRF for all offline encoding. CRF 18 for visually lossless, CRF 23 for good quality, CRF 28 for acceptable quality at smaller size. Only use CBR/VBR when a target file size or bitrate is required.
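As a concrete starting point, a CRF encode in FFmpeg built from Python (the filenames and CRF value are placeholders to adjust):

```python
import subprocess

def crf_encode_cmd(src: str, dst: str, crf: int = 23,
                   preset: str = "medium") -> list[str]:
    # x264 CRF mode: constant perceptual quality, variable file size.
    return ["ffmpeg", "-i", src,
            "-c:v", "libx264", "-crf", str(crf), "-preset", preset,
            "-c:a", "copy",  # leave the audio stream untouched
            dst]

cmd = crf_encode_cmd("input.mkv", "output.mp4", crf=18)  # visually lossless
# subprocess.run(cmd, check=True)  # uncomment to run the encode
print(" ".join(cmd))
```

The `-preset` flag trades encoding time for compression efficiency independently of CRF: a slower preset produces a smaller file at the same quality target.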
Video codecs are fundamentally about trading computation for compression. Newer codecs search harder for redundancy, try more prediction modes, and use more sophisticated transforms — all of which takes more CPU time but produces smaller files at the same quality.
For practical conversion tasks, the essential knowledge is: CRF mode for quality control, the right profile/level for your target devices, and knowing when a conversion is a remux (fast, lossless) versus a transcode (slow, potential quality change). The codec choice itself — H.264, H.265, or AV1 — depends on your audience's devices, your encoding time budget, and whether file size savings justify the computational cost.