MP4 is the file format the internet runs on. YouTube serves it. iPhones record it. Every browser plays it. Every social platform accepts it. When someone says "send me the video," what they mean is "send me the MP4."
But MP4 is not a single thing. It's a container format based on Apple's QuickTime specification, standardized as ISO/IEC 14496-14. The container itself doesn't compress anything — it organizes compressed video, audio, subtitles, and metadata into a structured file. The codec inside (H.264, H.265, AV1) does the actual compression work.
Understanding how MP4 works — and what codec combinations it supports — is the difference between a 100MB file that stutters on playback and a 20MB file that streams smoothly everywhere.
What MP4 Actually Is
MP4 is a container format, not a codec. Think of it as a box that holds multiple streams of data: one or more video tracks, one or more audio tracks, optional subtitle tracks, chapter markers, and metadata. The container defines how these streams are interleaved and synchronized, but the actual compression is handled by the codecs inside.
MP4 is formally known as MPEG-4 Part 14, and it's directly derived from Apple's QuickTime File Format (.mov). The two formats are so similar that many MP4 files can be renamed to .mov and play fine in QuickTime, and vice versa. The key difference is that MP4 restricts which codecs are officially supported, while MOV is more permissive.
Atom-Based File Structure
MP4 files are built from nested units called atoms in QuickTime terminology (the ISO spec calls them "boxes"). Each atom has a type, a size, and content that can include other atoms. The critical atoms are:
- ftyp — File type declaration. Appears first and identifies the file as MP4 (brand isom or mp42). This is how media players quickly identify the format without parsing the entire file.
- moov — Movie metadata. Contains the track list, codec parameters, timing information, and an index of where each frame lives in the file. Without moov, the file is unplayable.
- mdat — Media data. The actual compressed video and audio frames. This is typically 99%+ of the file size.
- moof — Movie fragment. Used in fragmented MP4 for streaming. Each fragment contains its own metadata and data, allowing playback to start before the entire file is downloaded.
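The box layout in the list above can be walked with a few lines of Python. This is a minimal sketch rather than a full parser: it assumes every box starts with a 32-bit big-endian size followed by a four-character type (plus the 64-bit and "runs to the end" size variants), and it reads a tiny synthetic byte string instead of a real video. The names iter_boxes and make_box are invented for this illustration.

```python
import struct

def iter_boxes(data, offset=0, end=None):
    """Yield (box_type, start_offset, size) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, raw_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > end:
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the enclosing range.
            size = end - offset
        if size < header or offset + size > end:
            break  # truncated or malformed box: stop instead of guessing
        yield raw_type.decode("latin-1"), offset, size
        offset += size

def make_box(box_type, payload=b""):
    """Build a box with a plain 32-bit size header (demo helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Synthetic file: ftyp (major brand isom), then mdat, then moov holding one trak.
sample = (
    make_box(b"ftyp", b"isom" + struct.pack(">I", 0) + b"isom" + b"mp42")
    + make_box(b"mdat", b"\x00" * 32)
    + make_box(b"moov", make_box(b"trak"))
)

for box_type, start, size in iter_boxes(sample):
    print(box_type, start, size)
# ftyp 0 24
# mdat 24 40
# moov 64 16
```

Nested boxes can be read with the same function by passing the payload range of the parent, for example iter_boxes(sample, 64 + 8, 64 + 16) for the children of moov in this sample.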
The moov atom position matters. If moov comes after mdat (which is the default for many encoders), the entire file must be downloaded before playback can start. Moving moov before mdat (called "fast start" or "web optimized") enables progressive playback. FFmpeg's -movflags +faststart flag does this.
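As a rough illustration of the ordering rule, the sketch below checks a list of top-level atom names and builds the FFmpeg arguments for a fast-start remux. It assumes the atom names have already been read from the file, and the file names are placeholders. The function names are made up for this example.

```python
def is_fast_start(box_types):
    """Return True when moov appears before the first mdat."""
    order = list(box_types)
    if "moov" not in order:
        return False  # no index at all, so nothing can start playing
    if "mdat" not in order:
        return True
    return order.index("moov") < order.index("mdat")

def faststart_remux_args(src, dst):
    """FFmpeg arguments that copy the streams and move moov to the front."""
    return ["ffmpeg", "-i", src, "-c", "copy", "-movflags", "+faststart", dst]

print(is_fast_start(["ftyp", "mdat", "moov"]))  # False: moov trails the media data
print(is_fast_start(["ftyp", "moov", "mdat"]))  # True: playable while downloading
print(" ".join(faststart_remux_args("input.mp4", "output.mp4")))
```

Because the streams are copied rather than re-encoded, this kind of remux changes only the file layout, not the picture or sound.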
Video Codecs Inside MP4
The container is just the wrapper. The codec determines quality, file size, and compatibility. Here are the three codecs you'll actually encounter in MP4 files:
H.264 (AVC) — The Universal Codec
H.264 has been the dominant video codec since roughly 2005. Virtually every phone, computer, and TV manufactured in the last decade has hardware H.264 decoding. Every major browser supports it, and every mainstream editing application handles it natively.
Profiles and Levels: H.264 defines profiles (Baseline, Main, High) that control which compression features are available, and levels (3.0, 4.0, 4.1, 5.1, etc.) that cap resolution, frame rate, and bitrate. Baseline profile is for low-power devices (no B-frames, no CABAC). Main profile adds B-frames and CABAC entropy coding. High profile adds 8x8 transforms and custom quantization scaling matrices. For general use, High profile at Level 4.1 covers 1080p at 30 fps with a bitrate ceiling of 62.5 Mbps (50 Mbps in Main profile).
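Profile and level usually surface in practice as a codec string such as avc1.640029, which browsers and streaming manifests use to describe an H.264 track. The sketch below decodes that string under the assumption that it follows the common avc1.PPCCLL layout (profile in hex, constraint flags, level in hex times ten). The codec string format is not covered elsewhere in this article, and the function name is invented for illustration. Special cases such as Level 1b are ignored.

```python
# profile_idc values for the three profiles discussed above
PROFILES = {66: "Baseline", 77: "Main", 100: "High"}

def parse_avc_codec_string(codec):
    """Return (profile_name, level) for a string like 'avc1.640029'."""
    prefix, _, hex_part = codec.partition(".")
    if prefix not in ("avc1", "avc3") or len(hex_part) != 6:
        raise ValueError("not an avc1.PPCCLL codec string: %r" % codec)
    profile_idc = int(hex_part[0:2], 16)  # PP: profile
    level_idc = int(hex_part[4:6], 16)    # LL: level multiplied by ten
    name = PROFILES.get(profile_idc, "profile_idc %d" % profile_idc)
    return name, level_idc / 10

print(parse_avc_codec_string("avc1.640029"))  # ('High', 4.1)
print(parse_avc_codec_string("avc1.42E01E"))  # ('Baseline', 3.0)
```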
Typical bitrates:
- 720p (1280x720): 3-5 Mbps for good quality
- 1080p (1920x1080): 5-10 Mbps for good quality
- 4K (3840x2160): 35-68 Mbps for good quality
H.264's compression efficiency is roughly half of H.265 — meaning you need about twice the bitrate for the same visual quality. But its universal hardware support makes encoding and decoding fast and energy-efficient everywhere.
H.265 (HEVC) — Better Compression, Complicated Licensing
H.265 achieves roughly 50% better compression than H.264 at the same quality. A 10Mbps H.264 file looks about the same as a 5Mbps H.265 file. This matters enormously for 4K content, where the bitrate savings translate to gigabytes per hour.
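To put numbers on "gigabytes per hour", here is a small worked calculation. It assumes a constant bitrate, decimal gigabytes, and the rough 50% saving quoted above. The 35 Mbps starting point is taken from the low end of the 4K H.264 range listed earlier, and the function name is illustrative.

```python
def gigabytes_per_hour(video_mbps, audio_kbps=0):
    """File size in decimal GB for one hour at a constant bitrate."""
    bits_per_second = video_mbps * 1_000_000 + audio_kbps * 1_000
    return bits_per_second * 3600 / 8 / 1_000_000_000

h264_4k = gigabytes_per_hour(35)        # 4K H.264 at 35 Mbps
h265_4k = gigabytes_per_hour(35 * 0.5)  # same quality at about half the bitrate

print(h264_4k)            # 15.75
print(h265_4k)            # 7.875
print(h264_4k - h265_4k)  # 7.875 GB saved per hour of footage
```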
The licensing problem: H.265 has a fragmented patent licensing landscape, with multiple patent pools (MPEG LA, HEVC Advance, Velos Media) making it expensive for companies to deploy. This is why browser support is patchy. Chrome has played H.265 since version 107 (late 2022), but only on platforms that provide a hardware decoder, because it does not ship a software H.265 decoder of its own. Safari has supported it since 2017 on macOS and iOS. Firefox's support depends on the platform.
Typical bitrates at equivalent H.264 quality:
- 1080p: 3-5 Mbps
- 4K: 15-30 Mbps
- 4K HDR: 20-40 Mbps
Use H.265 when you control the playback environment (your own device, a known player) or when file size reduction matters more than universal compatibility.
AV1 — The Royalty-Free Future
AV1 is the codec that was designed to kill H.265's licensing mess. Developed by the Alliance for Open Media (Google, Apple, Amazon, Netflix, Meta, Microsoft), it's royalty-free and achieves roughly 30% better compression than H.265.
The tradeoff is encoding speed. AV1 encoding is 10-50x slower than H.264 encoding at comparable settings. A 1-minute 1080p clip might take 30 seconds to encode as H.264 but 10-20 minutes as AV1. Hardware AV1 encoders (Intel Arc, NVIDIA RTX 40-series, AMD RDNA 3) have dramatically narrowed this gap, but software encoding remains slow.
Hardware AV1 decoding started appearing in chips around 2020: Intel 11th-gen+, NVIDIA RTX 30-series+, AMD RDNA 2+, Samsung Exynos 2100+, Qualcomm Snapdragon 8 Gen 2+, Apple A17 Pro+. Older devices can't play AV1 without software decoding, which drains battery on mobile.
Bottom line: AV1 in MP4 is the right choice for content you encode once and serve many times (YouTube, Netflix). For one-off exports or real-time encoding, H.264 is still faster and more compatible. Convert MP4 to AV1.
Audio Codecs in MP4
MP4 supports several audio codecs, but AAC dominates:
- AAC (Advanced Audio Coding) — The standard audio codec for MP4. AAC-LC (Low Complexity) at 128-192 kbps is transparent for most listeners. HE-AAC (High Efficiency) uses Spectral Band Replication to sound acceptable at 48-64 kbps, useful for voice and low-bandwidth streaming.
- MP3 — Technically supported in MP4 containers but rarely used. MP3 in MP4 exists mainly for backward compatibility.
- AC-3 (Dolby Digital) — 5.1 surround sound. Common in ripped DVDs and Blu-rays packaged as MP4. Not supported by all players.
- ALAC (Apple Lossless) — Supported in MP4 containers (technically .m4a for audio-only). Lossless but Apple-ecosystem-centric.
- Opus — Newer, excellent quality per bitrate, but MP4+Opus support in players is inconsistent. WebM is the preferred container for Opus.
For maximum compatibility: AAC-LC at 128-192 kbps stereo. You won't encounter complaints. Need to extract audio? MP4 to MP3 or MP4 to AAC.
Fragmented MP4 and Streaming
Standard MP4 files have a single moov atom that indexes the entire file. This works for downloads but not for live streaming — you can't write a moov that references data that hasn't been created yet.
Fragmented MP4 (fMP4) solves this by splitting the file into fragments, each made of a moof (movie fragment) and mdat (media data) pair. A short moov at the start carries the codec setup but no frame index, so once a player has it, each fragment can be decoded on its own and new fragments can be appended as they're created.
DASH and HLS: How Streaming Uses fMP4
DASH (Dynamic Adaptive Streaming over HTTP) uses fMP4 as its native segment format. The client downloads a manifest file (MPD) that lists available quality levels, then requests fMP4 segments at the appropriate bitrate based on network conditions.
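The quality switching described above can be sketched as a simple selection rule: pick the highest bitrate that fits inside the measured bandwidth, with some headroom. This is a deliberately simplified assumption. Real players also weigh buffer level, screen size, and recent throughput history, and the bitrate ladder and safety factor below are made-up values.

```python
def pick_rendition(renditions_kbps, measured_kbps, safety=0.8):
    """Choose the highest rendition whose bitrate fits the bandwidth budget."""
    budget = measured_kbps * safety  # leave headroom for throughput dips
    affordable = [r for r in sorted(renditions_kbps) if r <= budget]
    # Fall back to the lowest rendition when nothing fits.
    return affordable[-1] if affordable else min(renditions_kbps)

ladder = [800, 2500, 5000, 8000]  # hypothetical bitrate ladder from a manifest

print(pick_rendition(ladder, measured_kbps=7000))  # 5000
print(pick_rendition(ladder, measured_kbps=500))   # 800
```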
HLS (HTTP Live Streaming) originally used MPEG-TS (.ts) segments, but Apple added fMP4 support in HLS in 2016. Many modern HLS deployments use fMP4 segments because they carry less packaging overhead than TS and are compatible with DASH, allowing a single set of encoded files to serve both protocols.
CMAF (Common Media Application Format) is the formal standard for fMP4 segments that work with both DASH and HLS. Netflix, YouTube, Disney+, and most major streaming services use CMAF-compatible fMP4.
The practical implication: when you create an MP4 for streaming, you need -movflags +frag_keyframe+empty_moov in FFmpeg, not just +faststart. Faststart is for progressive download; fMP4 is for adaptive streaming.
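A short sketch of the fragmented case: one helper checks whether a list of top-level atom names has the fMP4 shape (a moov followed by moof fragments), and another builds the FFmpeg arguments named above. The helper names and file names are placeholders, and the check is a heuristic rather than a conformance test.

```python
def looks_fragmented(box_types):
    """Heuristic: an fMP4 has a moov followed by one or more moof boxes."""
    order = list(box_types)
    if "moov" not in order or "moof" not in order:
        return False
    return order.index("moov") < order.index("moof")

def fragmented_remux_args(src, dst):
    """FFmpeg arguments that copy the streams into a fragmented MP4."""
    return [
        "ffmpeg", "-i", src,
        "-c", "copy",                               # no re-encode
        "-movflags", "+frag_keyframe+empty_moov",   # new fragment at each keyframe
        dst,
    ]

print(looks_fragmented(["ftyp", "moov", "moof", "mdat", "moof", "mdat"]))  # True
print(looks_fragmented(["ftyp", "moov", "mdat"]))                          # False
print(" ".join(fragmented_remux_args("input.mp4", "fragmented.mp4")))
```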
Subtitles and Metadata in MP4
MP4 supports embedded subtitle tracks using the tx3g format (also called 3GPP Timed Text or MPEG-4 Timed Text). These are soft subtitles — the player can toggle them on/off and the text remains separate from the video pixels.
However, MP4's subtitle support is limited compared to MKV. You can't embed ASS/SSA styled subtitles (MKV can). You can't have multiple subtitle formats in one file (MKV can). For complex subtitle needs, MKV is the better container.
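For the soft-subtitle case, FFmpeg's mov_text encoder writes tx3g tracks. The sketch below builds the arguments for adding an external subtitle file to an existing MP4 without touching the video or audio. The file names and the default language tag are placeholders, and the function name is made up for this example.

```python
def mux_soft_subtitles_args(video, subtitles, dst, language="eng"):
    """FFmpeg arguments that add a subtitle file to an MP4 as a tx3g track."""
    return [
        "ffmpeg", "-i", video, "-i", subtitles,
        "-c:v", "copy", "-c:a", "copy",   # keep video and audio untouched
        "-c:s", "mov_text",               # convert the subtitles to tx3g
        "-metadata:s:s:0", "language=" + language,
        dst,
    ]

print(" ".join(mux_soft_subtitles_args("movie.mp4", "movie.srt", "movie_subbed.mp4")))
```

Styling that exists in formats like ASS is largely lost in this conversion, which is the limitation described above.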
Metadata atoms in MP4 store information like title, artist, album, year, genre, cover art, and description. The udta atom holds user data, and meta holds iTunes-style metadata. Video editing software, media players, and streaming services all read these fields differently. FFmpeg's -metadata title="My Video" writes to the standard location. Extract subtitles with MP4 to SRT.
When to Use MP4 (and When Not To)
Use MP4 when:
- Sharing with anyone — MP4+H.264+AAC plays everywhere. No questions asked.
- Uploading to social platforms — YouTube, Instagram, TikTok, Twitter, Facebook, LinkedIn all prefer MP4.
- Web embedding — The <video> tag with an MP4 source works in every browser.
- Streaming — fMP4 is the foundation of DASH, HLS, and CMAF.
- Mobile playback — Hardware H.264 decoders ship in virtually every smartphone made since 2010.
Consider alternatives when:
- You need multiple audio/subtitle tracks — Use MKV. MP4 technically supports multiple tracks, but player support is inconsistent. Convert MP4 to MKV.
- You need maximum compression — Use WebM+VP9 or WebM+AV1 for web delivery where you control the player. Convert MP4 to WebM.
- You need an open format with zero patent concerns — WebM is fully royalty-free. MP4 with H.264 has patent licensing (handled by OS/browser vendors, but it's there).
- You're archiving with chapter markers and metadata — MKV's chapter and tag system is more flexible.
Converting Files to and from MP4
Most video-to-MP4 conversions fall into two categories:
Remuxing (fast, lossless): If the source file contains H.264 or H.265 video with AAC audio (just in a different container), the conversion simply repackages the streams. MKV to MP4 and MOV to MP4 are almost always remux operations — they complete in seconds regardless of file size and produce identical quality.
Re-encoding (slow, potential quality change): If the source codec isn't one that MP4 players widely support (e.g., VP9 from WebM, or DivX from old AVI files), the video must be decoded and re-encoded as H.264/H.265 to get a file that plays everywhere. This takes minutes to hours depending on file size and settings. AVI to MP4, WMV to MP4, and FLV to MP4 typically require re-encoding.
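The remux-or-re-encode decision can be sketched as a per-stream choice between copying and transcoding. The example assumes the source codec names are already known (for instance "h264", "hevc", "aac", "vp9", "opus", as ffprobe reports them), and the fallback settings (libx264 at CRF 23, AAC at 128 kbps) are illustrative defaults rather than a recommendation for every source.

```python
MP4_FRIENDLY_VIDEO = {"h264", "hevc"}
MP4_FRIENDLY_AUDIO = {"aac"}

def to_mp4_args(src, dst, video_codec, audio_codec):
    """Copy streams that already suit MP4 and re-encode the ones that do not."""
    if video_codec in MP4_FRIENDLY_VIDEO:
        video = ["-c:v", "copy"]                      # remux: fast and lossless
    else:
        video = ["-c:v", "libx264", "-crf", "23"]     # re-encode: slow and lossy
    if audio_codec in MP4_FRIENDLY_AUDIO:
        audio = ["-c:a", "copy"]
    else:
        audio = ["-c:a", "aac", "-b:a", "128k"]
    return ["ffmpeg", "-i", src] + video + audio + ["-movflags", "+faststart", dst]

print(" ".join(to_mp4_args("clip.mkv", "clip.mp4", "h264", "aac")))   # pure remux
print(" ".join(to_mp4_args("clip.webm", "clip.mp4", "vp9", "opus")))  # full re-encode
```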
From MP4 to other formats:
- MP4 to GIF — Extracts frames and encodes as animated GIF. Expect 10-50x larger file sizes.
- MP4 to MP3 — Extracts and converts the audio track only.
- MP4 to WebM — Re-encodes to VP9/Opus for web delivery.
- MP4 to MKV — Remux, usually instant and lossless.
- MP4 to AVI — Legacy compatibility only. No reason to do this in 2026.
Recommended MP4 Encoding Settings
| Use Case | Video Codec | Audio | Resolution | Bitrate / CRF | Notes |
|---|---|---|---|---|---|
| General sharing | H.264 High | AAC 128kbps | 1080p | CRF 23 | Add -movflags +faststart |
| High quality archive | H.264 High | AAC 192kbps | Original | CRF 18 | Visually lossless, large files |
| Small file for email | H.264 Main | AAC 96kbps | 720p | CRF 28 | Targeting under 25MB |
| 4K delivery | H.265 Main | AAC 192kbps | 2160p | CRF 24 | 50% smaller than H.264 equivalent |
| Web streaming | AV1 | Opus 128kbps | Adaptive | CRF 30 | Encode once, stream to many |
CRF (Constant Rate Factor) is the quality control knob for modern encoders. Lower CRF = higher quality = larger file. On the x264 scale, CRF 0 is lossless and CRF 51 is garbage. The sweet spot for H.264 is 18-28, with 23 being the libx264 default in FFmpeg. For H.265, use CRF values about 4-6 higher than your H.264 target (H.265 CRF 28 ≈ H.264 CRF 23 quality).
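The H.264 and H.265 rows of the table can be turned into FFmpeg arguments with a small lookup. This sketch assumes the libx264 and libx265 encoders and FFmpeg's built-in AAC encoder, and leaves out the profile column and the AV1 row. The preset names, function names, and the default offset of 5 (the middle of the 4-6 range above) are choices made for this illustration.

```python
PRESETS = {
    "general": {"vcodec": "libx264", "crf": 23, "audio_kbps": 128, "height": 1080},
    "archive": {"vcodec": "libx264", "crf": 18, "audio_kbps": 192, "height": None},
    "email":   {"vcodec": "libx264", "crf": 28, "audio_kbps": 96,  "height": 720},
    "4k":      {"vcodec": "libx265", "crf": 24, "audio_kbps": 192, "height": 2160},
}

def encode_args(src, dst, preset):
    """Build FFmpeg arguments for one of the table rows above."""
    p = PRESETS[preset]
    args = ["ffmpeg", "-i", src, "-c:v", p["vcodec"], "-crf", str(p["crf"])]
    if p["height"] is not None:
        # -2 keeps the aspect ratio and rounds the width to an even number.
        # Note that this also upscales sources smaller than the target.
        args += ["-vf", "scale=-2:%d" % p["height"]]
    args += ["-c:a", "aac", "-b:a", "%dk" % p["audio_kbps"]]
    args += ["-movflags", "+faststart", dst]
    return args

def h265_crf_from_h264(h264_crf, offset=5):
    """Approximate H.265 CRF for a given H.264 CRF, capped at 51."""
    return min(51, h264_crf + offset)

print(" ".join(encode_args("input.mov", "share.mp4", "general")))
print(h265_crf_from_h264(23))  # 28
```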
MP4 earned its dominance by being good enough at everything. It's not the most flexible container (MKV is), not the most web-native (WebM is), and not the most efficient (AV1-in-WebM can be smaller). But it plays everywhere, streams well, and supports the codecs that matter.
For 90% of video tasks — sharing, uploading, embedding, archiving — MP4 with H.264 and AAC is the answer. When you need better compression, switch to H.265 or AV1 inside the same MP4 container. When you need more tracks and metadata flexibility, convert to MKV. The format is mature, universal, and not going anywhere.