You converted a 2GB video file and the output looks fine — but is it actually complete? You downloaded a software installer and it runs — but is it the genuine, unmodified file? You transferred 10,000 photos to a new drive and the count matches — but are all the files intact?

Checksums answer the integrity part of these questions. A checksum is a fixed-length value computed from the contents of a file using a hash function. Change one bit in a 10GB file and the checksum changes completely. Checksums don't tell you what changed, but they tell you, with overwhelming probability, whether anything changed. This guide covers the major checksum algorithms, when to use each, and how to verify file integrity in practice.

How Checksums Work

A checksum algorithm reads every byte of a file and produces a fixed-size output (the "digest" or "hash"). The same input always produces the same output. Different inputs produce different outputs (with extremely high probability). The output is typically represented as a hexadecimal string.

Example: the SHA-256 hash of the string "hello" is 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824. Change "hello" to "Hello" and the hash becomes 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969 — completely different. This avalanche effect is by design: even a 1-bit change in the input flips roughly half the bits in the output.
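The avalanche effect is easy to observe directly. The sketch below uses Python's standard hashlib module; the helper names sha256_hex and differing_bits are illustrative, not part of any library. Note that "h" (0x68) and "H" (0x48) differ by exactly one bit.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


lower = sha256_hex("hello")
upper = sha256_hex("Hello")
print(lower)
print(upper)
# A one-bit input change should flip roughly half of the 256 output bits.
print(differing_bits(lower, upper), "of 256 bits differ")
```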

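File checksums are computed the same way, just on larger inputs. A SHA-256 hash of a 10GB video file is still a 64-character hex string. At the 200-500 MB/s that SHA-256 typically reaches in software on modern CPUs, computing it takes roughly 20 to 50 seconds; CPUs with hardware SHA instructions are faster, and in practice disk read speed is often the bottleneck.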
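Hashing a large file does not require loading it into memory. A minimal sketch, assuming Python's standard hashlib; the function name sha256_file and the 1 MiB chunk size are arbitrary choices for illustration:

```python
import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 of a file as lowercase hex, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Feeding the data in pieces gives the same digest as hashing it all at once, so memory use stays constant regardless of file size.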

Checksum Algorithms Compared

| Algorithm | Output Size | Speed | Collision Resistance | Use Case |
|---|---|---|---|---|
| CRC32 | 32 bits (8 hex chars) | Very fast | Low (collisions expected for large datasets) | Error detection in archives (ZIP, PNG, Ethernet) |
| MD5 | 128 bits (32 hex chars) | Fast | Broken (collisions can be crafted in seconds) | Legacy checksums, non-security file verification |
| SHA-1 | 160 bits (40 hex chars) | Fast | Broken (Google/CWI produced collision in 2017) | Git (still), legacy systems |
| SHA-256 | 256 bits (64 hex chars) | Moderate | Strong (no known practical attacks) | Standard for integrity verification, software distribution |
| SHA-512 | 512 bits (128 hex chars) | Faster than SHA-256 on 64-bit CPUs | Strong | High-security applications, slightly faster on modern hardware |
| BLAKE3 | 256 bits (default) | Very fast (parallelizable) | Strong | Modern alternative, 5-10x faster than SHA-256 |
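The output sizes in the table can be checked with a few lines of Python. This is a sketch using only the standard library (zlib for CRC32, hashlib for the rest); BLAKE3 is left out because it is not in the standard library and would need a third-party package. The function name hex_lengths is illustrative.

```python
import hashlib
import zlib


def hex_lengths(data: bytes) -> dict:
    """Map each algorithm name to the length of its hex digest for `data`."""
    # zlib.crc32 returns an integer; format it as 8 zero-padded hex digits.
    lengths = {"crc32": len(format(zlib.crc32(data) & 0xFFFFFFFF, "08x"))}
    for name in ("md5", "sha1", "sha256", "sha512"):
        lengths[name] = len(hashlib.new(name, data).hexdigest())
    return lengths


for name, chars in hex_lengths(b"hello").items():
    # Each hex character encodes 4 bits.
    print(f"{name:8} {chars * 4:4} bits  {chars:4} hex chars")
```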

Which Algorithm to Use

For file integrity verification (conversions, transfers, backups): SHA-256. It's the standard, universally supported, and secure. No practical attacks against it are known.

For speed-critical bulk operations: BLAKE3 if your tools support it (5-10x faster than SHA-256). Otherwise, MD5 is adequate for non-security integrity checks — its collision weakness only matters when an adversary is trying to create a fake file with the same hash, not for detecting accidental corruption.

For archive integrity: CRC32 is already built into ZIP, RAR, 7Z, and PNG. You don't choose it — it's automatic. It catches transmission errors and bit flips but isn't suitable for detecting deliberate tampering.

Never use for security purposes in new projects: MD5 and SHA-1. Both have demonstrated collision attacks, and MD5 collisions can be generated in seconds on a laptop.

Verifying File Integrity After Conversion

Checksums are most useful for lossless operations where the output should be deterministic:

  • Archive conversion: ZIP to 7Z should extract to identical files. Compute checksums on the contents of both archives — every file should match.
  • Container remux: MKV to MP4 with stream copy should produce bitwise-identical audio/video streams. Extract streams with FFmpeg and compare checksums.
  • Lossless format change: WAV to FLAC and back to WAV should reproduce the original audio samples exactly. Compare checksums of the decoded audio data; a SHA-256 of the whole WAV files matches only if the encoder also preserved the header and any metadata chunks.
  • Text format conversion: JSON to YAML and back to JSON should round-trip. Compare normalized outputs (ignoring differences in whitespace and key order).
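For the text round-trip case, "normalized" can mean parsing both documents and re-serializing them in one canonical form before hashing. A minimal sketch with Python's standard json module; the function name canonical_json_sha256 and the sample documents are made up for illustration, and the YAML step itself is omitted because it would need a third-party parser:

```python
import hashlib
import json


def canonical_json_sha256(text: str) -> str:
    """Hash JSON by content: parse, re-serialize with sorted keys, then SHA-256."""
    obj = json.loads(text)
    # Sorted keys and fixed separators remove key-order and whitespace noise.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = '{"name": "clip", "tags": ["a", "b"], "size": 10}'
round_tripped = '{\n  "size": 10,\n  "tags": ["a", "b"],\n  "name": "clip"\n}'

# Same content, different layout and key order: the canonical hashes agree.
print(canonical_json_sha256(original) == canonical_json_sha256(round_tripped))  # True
```

Array order is still significant here, which is usually what you want: a reordered list is a real change in the data.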

For lossy conversions (PNG to JPG, WAV to MP3), checksums can't verify "quality" — the output is intentionally different from the input. Instead, verify that the output file isn't corrupted by checking that it opens correctly and has a reasonable file size.

Verifying Downloads

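Software distributors publish checksums alongside download links. The process: download the file, compute the checksum locally, compare with the published value. If they match, the file is the one the publisher hashed and was not corrupted in transit. That only proves authenticity if the published checksum itself is trustworthy: an attacker who can replace the file on a compromised server can usually replace a checksum hosted next to it, so get the value over HTTPS from the official site or, where offered, check the publisher's cryptographic signature.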

On macOS/Linux:

shasum -a 256 downloaded_file.iso

On Windows (PowerShell):

Get-FileHash downloaded_file.iso -Algorithm SHA256

On Windows (cmd):

certutil -hashfile downloaded_file.iso SHA256

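Compare the output string with the published hash. Every hex digit must match; a single differing digit means the file is different. Letter case does not matter: Get-FileHash prints uppercase hex, while most other tools print lowercase.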
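Comparing 64-character strings by eye is error-prone, and tools differ in letter case and spacing. A small sketch of a tolerant comparison; the helper names normalize_hex and hashes_match are hypothetical, and the assumption is that whitespace and case are the only formatting differences worth ignoring:

```python
def normalize_hex(value: str) -> str:
    """Remove all whitespace and lowercase, so only the hex digits remain."""
    return "".join(value.split()).lower()


def hashes_match(computed: str, published: str) -> bool:
    """True if both strings contain the same non-empty hex digest."""
    a = normalize_hex(computed)
    b = normalize_hex(published)
    return bool(a) and a == b


published = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
print(hashes_match(published.upper() + "\n", published))
```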

Checksum Files: .md5, .sha256, .sfv

Checksum files store hash values alongside filenames for bulk verification:

.md5 file format:

d41d8cd98f00b204e9800998ecf8427e  file1.mp4
7d793037a0760186574b0282f2f435e7  file2.mp4

.sha256 file format: Same layout, longer hashes.

.sfv (Simple File Verification): Uses CRC32, commonly used for Usenet downloads and game ROM sets.

file1.mp4 3A4B5C6D
file2.mp4 1E2F3A4B
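An .sfv entry can be produced with the CRC32 function in Python's standard zlib module. This is a sketch; the function names crc32_file and sfv_line are illustrative, and the output follows the "filename, space, 8 uppercase hex digits" layout shown above.

```python
import os
import zlib


def crc32_file(path, chunk_size=1024 * 1024):
    """Return the CRC32 of a file as 8 uppercase hex digits."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Passing the running value continues the CRC across chunks.
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08X")


def sfv_line(path):
    """Format one .sfv entry using the file's base name."""
    return f"{os.path.basename(path)} {crc32_file(path)}"
```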

Generate checksum files:

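# Generate SHA-256 checksums for every file in the current directory
# (subdirectories are not descended into; on macOS, shasum -a 256 takes the same arguments).
# Remove any previous list first so the * glob does not pick it up.
rm -f checksums.sha256
sha256sum * > checksums.sha256

# Verify all files against the checksum file
sha256sum -c checksums.sha256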

The -c (check) flag reads the checksum file and verifies each listed file, printing OK or FAILED for each. This is the standard way to verify a batch transfer or backup.
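Where sha256sum is not available, the same check can be approximated in a few lines of Python. This is a sketch, not a reimplementation of the tool: it assumes plain lines of 64 hex digits, a space, a mode character, and the filename; it does not handle escaped filenames; and it resolves names relative to the checksum file's directory, which is a choice made here for convenience. The names sha256_file and verify_manifest are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path):
    """Check every listed file; return {filename: "OK" | "FAILED" | "MISSING"}."""
    manifest = Path(manifest_path)
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if len(line) < 67:
            continue  # blank or malformed line
        # Columns 0-63: hash, 64: space, 65: mode character, 66 onward: name.
        expected, name = line[:64].lower(), line[66:]
        target = manifest.parent / name
        if not target.is_file():
            results[name] = "MISSING"
        elif sha256_file(target) == expected:
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

Reporting missing files separately from failed ones helps with backups, where an absent file and a corrupted file usually call for different fixes.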

Practical Checksum Use Cases

  1. Backup verification: After copying files to a new drive, generate checksums on the source, copy the checksum file, and verify on the destination. This catches bit-rot, copy errors, and filesystem issues that a simple file count misses.
  2. Batch conversion validation: After converting 1,000 WAV files to FLAC, decode a sample of the FLAC files back to WAV and compare checksums of the audio data with the originals to verify the round-trip. (Note: this only works for lossless conversions; a lossy round-trip such as FLAC to MP3 and back will never match the original.)
  3. Deduplication: Two files with the same SHA-256 hash are identical with virtual certainty. Hash all files in a directory tree and group by hash to find duplicates — regardless of filename.
  4. Transfer verification: When sending large files over unreliable connections, compute the checksum before sending and after receiving. If they match, the transfer was error-free.
  5. Archive integrity monitoring: Generate checksums for your file archive once. Periodically re-verify to detect bit-rot (silent data corruption on aging storage media). Any mismatch means a file has been corrupted since the last check.
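The deduplication case above can be sketched as follows: hash every file under a directory and group the paths by digest. This uses only the Python standard library; the function names sha256_file and find_duplicates are illustrative.

```python
import hashlib
import os
from collections import defaultdict


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks and return the digest as lowercase hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under `root` whose contents are identical."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[sha256_file(path)].append(path)
    # Keep only digests shared by two or more files.
    return sorted(sorted(paths) for paths in by_hash.values() if len(paths) > 1)
```

On large trees, grouping files by size first and hashing only those that share a size avoids reading most files at all, since files of different sizes cannot be identical.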

Checksums are the simplest and most reliable tool for answering "is this file the same as that file?" A 64-character string tells you more about a file's integrity than any amount of visual inspection. Use them after batch conversions, after large transfers, and as part of any backup strategy. The few seconds it takes to compute a SHA-256 hash (up to a minute or so for multi-gigabyte files) can save hours of debugging a corrupted file.