Sample rate and bit depth are the two numbers that define digital audio resolution, and they're wildly misunderstood. The hi-res audio industry has a financial incentive to convince you that bigger numbers mean better sound. The physics doesn't agree.
Here's the uncomfortable truth: for finished, mastered music, 44.1 kHz / 16-bit (CD quality) captures everything human ears can hear with headroom to spare. The Nyquist-Shannon theorem proves this mathematically, and decades of double-blind listening tests confirm it empirically. "Hi-res" sample rates and bit depths have legitimate uses in production — but not the ones most people think.
This guide covers exactly what these numbers mean, when they matter, and when they're marketing. If you work with audio in any capacity — music production, podcasting, game development, video editing — understanding these fundamentals prevents wasted storage, unnecessary processing, and buying snake oil.
Sample Rate: How Fast We Measure the Wave
Analog sound is a continuous pressure wave. Digital audio captures it by measuring (sampling) the wave's amplitude at regular intervals. The sample rate is how many measurements per second:
- 44,100 Hz (44.1 kHz): 44,100 measurements per second — the CD standard
- 48,000 Hz (48 kHz): 48,000/sec — the video/broadcast standard
- 96,000 Hz (96 kHz): 96,000/sec — "hi-res" audio
- 192,000 Hz (192 kHz): 192,000/sec — maximum common hi-res
Each sample is a snapshot of the wave's position at one instant. Played back in sequence, these snapshots reconstruct the original wave. The question is: how many snapshots do you need?
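To make "snapshots" concrete, here is a minimal Python sketch that samples a sine wave the way an ADC would. The 440 Hz tone and 10 ms duration are arbitrary choices for illustration:

```python
import math

fs = 44_100          # samples per second (the CD rate)
f = 440.0            # test tone: 440 Hz (an arbitrary choice)
duration = 0.01      # capture 10 ms of audio

# One amplitude snapshot every 1/fs seconds:
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(int(fs * duration))]

print(len(samples))  # 441 snapshots for 10 ms at 44.1 kHz
```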
The Nyquist-Shannon Theorem: Why 44.1 kHz Is Enough
The Nyquist-Shannon sampling theorem (1949) states: a bandlimited signal can be perfectly reconstructed from its samples if the sample rate is at least twice the signal's highest frequency component.
Human hearing range: ~20 Hz to ~20,000 Hz (20 kHz). Most adults over 25 hear nothing above 16-17 kHz. The 20 kHz figure is the theoretical maximum for healthy young ears.
Nyquist frequency at 44.1 kHz: 44,100 / 2 = 22,050 Hz. This exceeds the 20 kHz hearing limit by 2,050 Hz — a comfortable guard band for the anti-aliasing filter to operate.
The key insight: this isn't an approximation. The theorem proves that the original continuous waveform is perfectly reconstructed from the samples — not approximately, not "close enough," but mathematically identical for all frequencies up to the Nyquist limit. The reconstructed output is not a staircase of sample points; the DAC (digital-to-analog converter) produces a smooth, continuous wave using interpolation.
This means 44.1 kHz captures literally everything you can hear. Not most of what you can hear. All of it.
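The reconstruction claim can be checked numerically. The sketch below samples a 1 kHz sine at 44.1 kHz, then uses Whittaker-Shannon (sinc) interpolation, the ideal reconstruction the theorem describes, to rebuild the wave at an instant between two samples. The sum is truncated to a finite window, so the result is near-exact rather than mathematically exact:

```python
import math

fs = 44_100.0        # sample rate
f = 1_000.0          # a 1 kHz tone, far below the 22,050 Hz Nyquist limit
N = 2_000            # samples kept on each side of the reconstruction point

# Sample the continuous wave x(t) = sin(2*pi*f*t) at t = n/fs:
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(-N, N + 1)]

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

# Whittaker-Shannon interpolation: rebuild the wave BETWEEN the samples.
t = 0.5 / fs  # an instant exactly halfway between two snapshots
recon = sum(samples[n + N] * sinc(fs * t - n) for n in range(-N, N + 1))
exact = math.sin(2 * math.pi * f * t)

print(abs(recon - exact))  # tiny: the off-grid value is recovered
```

The output is not a staircase; the interpolated value between the samples matches the original continuous wave to within the truncation error of the finite sum.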
Why 44.1 kHz Specifically (And Why 48 kHz for Video)
44.1 kHz: Chosen for the CD standard (Red Book, 1980). The number comes from early digital recording history — the first digital recorders stored audio on video tape using a modified NTSC or PAL signal. The sample rate was constrained by the video format's line rate: 3 samples per line × 245 active lines × 60 fields/second = 44,100 Hz for NTSC. It stuck.
48 kHz: Chosen for professional video and broadcast (DAT tape, DVD, Blu-ray, broadcast TV). It divides evenly into common video frame rates: 48,000 / 24 fps = 2,000 samples per frame. 48,000 / 30 fps = 1,600 samples per frame. Clean integer ratios simplify sync. Film and TV audio is virtually always 48 kHz.
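The frame-rate arithmetic is easy to verify, and it also shows where 44.1 kHz falls short for film work:

```python
# Clean integer sample counts per frame at 48 kHz simplify audio/video sync:
per_frame_48k = {fps: 48_000 / fps for fps in (24, 25, 30)}
print(per_frame_48k)   # {24: 2000.0, 25: 1920.0, 30: 1600.0}

# 44.1 kHz does not divide evenly into 24 fps film:
print(44_100 / 24)     # 1837.5 -- not a whole number of samples per frame
```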
Practical impact: If you're producing music only, 44.1 kHz is fine — it avoids an unnecessary sample rate conversion when mastering for CD or streaming. If you're working with video in any capacity, use 48 kHz — it's the standard and avoids sample rate conversion at the video editing stage.
Converting between 44.1 and 48 kHz: Not lossless. Sample rate conversion (SRC) is an interpolation process that introduces tiny artifacts. High-quality SRC (like SoX or iZotope) makes these artifacts negligible, but they exist. Choose your sample rate at the start of a project and stick with it.
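One way to see why the conversion is non-trivial: 48,000/44,100 reduces to 160/147, so an ideal rational resampler conceptually upsamples by 160, lowpass-filters, and downsamples by 147, and the filter is where the interpolation (and its tiny artifacts) lives:

```python
from fractions import Fraction

# The reduced integer ratio behind 44.1 kHz <-> 48 kHz conversion:
ratio = Fraction(48_000, 44_100)
print(ratio)  # 160/147: upsample by 160, lowpass-filter, downsample by 147
```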
96 kHz and 192 kHz: Where the Marketing Begins
96 kHz captures frequencies up to 48 kHz. 192 kHz captures up to 96 kHz. Humans can't hear above ~20 kHz. So what's the point?
The arguments for high sample rates:
- "Better anti-aliasing filter behavior": With more space between the audible range (20 kHz) and the Nyquist frequency (48 or 96 kHz), the anti-aliasing filter can be gentler, causing less phase distortion near the audible boundary. This is technically true but the effect is inaudible with modern filter designs used at 44.1/48 kHz.
- "Headroom for processing": When applying heavy DSP (time-stretching, pitch-shifting, nonlinear effects), higher sample rates can reduce aliasing artifacts that fold back into the audible range. This is the one legitimate use case for 96 kHz in production.
- "Some people can feel ultrasonic frequencies": The "hypersonic effect" hypothesis claims that ultrasonic frequencies affect brain activity. The research is contested and no consensus exists. Even if real, the effect doesn't apply to transducer playback — most tweeters roll off sharply above 20 kHz.
The evidence against: In controlled double-blind ABX tests, listeners cannot reliably distinguish 44.1 kHz from 96 kHz masters, even on reference-quality playback systems. Meyer & Moran (2007) found no discrimination between hi-res and CD-quality playback; Reiss's 2016 meta-analysis found at most a small effect, and only in trained listeners under test conditions, nothing that translates to ordinary listening. The sample rate difference is measurable with instruments but, for practical purposes, inaudible to humans.
Verdict: 96 kHz is defensible for production (processing headroom). 192 kHz has no practical benefit even in production. For final delivery and playback, 44.1 or 48 kHz is the correct choice.
Bit Depth: Dynamic Range, Not "Resolution"
Bit depth determines the number of possible amplitude values for each sample. More bits = more values = a wider gap between the quietest possible sound and the loudest.
| Bit Depth | Amplitude Values | Theoretical Dynamic Range | Real-World Equivalent |
|---|---|---|---|
| 8-bit | 256 | 48 dB | Noisy vintage game audio |
| 16-bit | 65,536 | 96 dB | Quiet room to loud concert |
| 24-bit | 16,777,216 | 144 dB | Beyond the threshold of pain |
| 32-bit float | ~4 billion (but floating point) | ~1,528 dB (theoretical) | Mathematically limitless headroom |
Dynamic range formula: roughly 6.02 dB per bit (20 × log10(2) ≈ 6.02), so 16-bit gives 6.02 × 16 ≈ 96 dB. The 98.1 dB figure you sometimes see adds the 1.76 dB term from the full-scale-sine SNR formula (6.02 × N + 1.76); both describe the same quantization noise floor from slightly different reference points.
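Both figures can be computed directly, which also reproduces the table above:

```python
import math

def dynamic_range_db(bits):
    # 20 * log10(2) ≈ 6.02 dB per bit
    return 20 * math.log10(2) * bits

def sine_sqnr_db(bits):
    # Full-scale sine vs quantization noise: 6.02*N + 1.76 dB
    return dynamic_range_db(bits) + 1.76

for bits in (8, 16, 24):
    print(bits, round(dynamic_range_db(bits), 1), round(sine_sqnr_db(bits), 1))
# 8-bit ≈ 48.2 dB, 16-bit ≈ 96.3 dB, 24-bit ≈ 144.5 dB
```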
16-bit = 96 dB dynamic range: The quietest sound is 96 dB below the loudest. For context: a quiet bedroom is ~30 dB, normal conversation is ~60 dB, a rock concert is ~110 dB, pain threshold is ~130 dB. 96 dB covers the entire useful dynamic range of music with room to spare.
24-bit = 144 dB dynamic range: Extends the quiet end by 48 dB below 16-bit's noise floor. This means incredibly quiet sounds can be captured without quantization noise. In practice, no microphone, preamp, or room is quiet enough to benefit from 24-bit's full range — even the best recording chains have a noise floor around -120 dB.
Why 24-bit Matters in Production (But Not Playback)
24-bit is the standard recording and production format, not because the final listener needs 144 dB of dynamic range, but because it provides headroom during the recording and mixing process:
- Recording safety margin: With 24-bit, you can record at conservative levels (-18 to -12 dBFS) without worrying about the noise floor. The signal still has 120+ dB of clean range. With 16-bit at the same levels, the 12-18 dB of unused headroom comes straight out of your 96 dB range, leaving as little as 78 dB above the noise floor.
- Processing headroom: EQ boosts, compression, reverb, and other effects can temporarily push levels beyond 0 dBFS internally. 32-bit float processing handles this without clipping, and 24-bit provides enough precision for the results.
- Gain staging flexibility: Mixing 40 tracks means adjusting levels extensively. 24-bit's extra precision means small level adjustments don't lose significant resolution.
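The safety-margin arithmetic above, using the dynamic-range figures quoted earlier and a conservative -18 dBFS recording peak:

```python
peak = -18                      # dBFS, conservative recording peak level
range_16, range_24 = 96, 144    # dB of dynamic range at 16- and 24-bit

# Clean range left between the signal peak and the quantization noise floor:
clean_16 = range_16 + peak      # 78 dB at 16-bit
clean_24 = range_24 + peak      # 126 dB at 24-bit
print(clean_16, clean_24)
```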
For the final delivered file (the master that listeners hear), 16-bit is sufficient. The mastering process compresses the dynamic range, applies limiting, and fits the audio neatly into 16-bit's 96 dB range. No commercially released music uses more than ~70 dB of dynamic range in practice (classical occasionally approaches 60 dB; pop/rock often uses less than 20 dB after modern loudness processing).
Dithering: The Crucial Step When Converting 24-bit to 16-bit
When you reduce bit depth (24-bit to 16-bit for CD/streaming release), you're truncating the bottom 8 bits of each sample. Without treatment, this creates correlated quantization distortion — audible as a faint, harsh graininess on quiet passages.
Dithering solves this by adding a tiny amount of random noise before truncating. This converts the correlated distortion into uncorrelated noise — a faint, benign hiss that's far less objectionable than the distortion it replaces. The noise level is extremely low (around -93 dBFS) and inaudible in any normal listening environment.
Noise shaping goes further: it reshapes the dither noise spectrum, pushing it into frequencies where human hearing is least sensitive (above 15 kHz). This effectively makes the noise inaudible even at high playback volumes. Common noise-shaped dithers include POW-R (types 1-3) and MBIT+; plain triangular dither (TPDF) is the unshaped baseline, with a flat noise spectrum.
Critical rule: Dither once, at the final bit-depth reduction step. Do not dither multiple times in a signal chain — each dithering pass adds noise. Your mastering chain should: process at 24-bit (or 32-bit float) → apply dither → truncate to 16-bit → done. Most DAWs apply dither automatically during export when the target bit depth is lower than the session's.
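A small simulation shows what dither buys. Take a signal sitting at 0.3 LSB, below one quantization step: plain rounding erases it entirely, while TPDF dither preserves it as the average of a noisy output. This is a toy sketch in LSB units, not a full audio pipeline:

```python
import random

random.seed(0)
level = 0.3   # signal amplitude of 0.3 LSB -- below one quantization step

# Plain rounding destroys sub-LSB information:
print(round(level))  # 0 -- the signal vanishes completely

# TPDF dither (sum of two uniform ±0.5 LSB noises) before rounding turns the
# quantization error into signal-independent noise, so the value survives:
n = 100_000
total = 0
for _ in range(n):
    dither = random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)
    total += round(level + dither)
print(total / n)  # ≈ 0.3 -- the sub-LSB signal is preserved, traded for hiss
```

The same trade happens at the 24-bit to 16-bit step: correlated distortion on quiet passages becomes benign, constant-level noise.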
What to Use: A Decision Table
| Stage | Sample Rate | Bit Depth | Why |
|---|---|---|---|
| Recording (music only) | 44.1 or 48 kHz | 24-bit | 24-bit for headroom; 48 kHz if any chance of video use |
| Recording (with video) | 48 kHz | 24-bit | 48 kHz matches video standard, 24-bit for headroom |
| Mixing/editing | Match recording | 24 or 32-bit float | Avoid sample rate conversion; float for processing headroom |
| Mastering | Match recording | 24 or 32-bit float internal → dither to target | Dither at the final step only |
| CD release | 44.1 kHz | 16-bit (dithered) | Red Book CD standard |
| Streaming (lossy) | 44.1 or 48 kHz | 16-bit input to encoder | Lossy codecs have their own noise floor; 24-bit is wasted |
| Hi-res store | Up to 96 kHz | 24-bit | Matching the master resolution |
| Podcast | 48 kHz | 16-bit (or 24-bit recording → 16-bit export) | 48 kHz is broadcast standard; 16-bit is more than sufficient for speech |
Record at 48 kHz / 24-bit. Mix and process at the same rate. Deliver at 44.1 or 48 kHz / 16-bit (with dither) for the final release. That's the workflow that maximizes quality while avoiding wasted storage and processing. "Hi-res" playback files at 96 or 192 kHz won't sound better than 44.1 kHz through your speakers or headphones — the physics of human hearing guarantees it.
Need to change formats? Convert WAV to MP3, WAV to FLAC, or FLAC to WAV — free at ChangeThisFile.