ZIP is the format everyone knows and nobody thinks about. Windows, macOS, Linux, iOS, and Android can all open a ZIP file without installing anything. That ubiquity is the entire point. Phil Katz created the format in 1989, and 37 years later it remains the default way to bundle and compress files for sharing.
But ZIP is more complex than most people realize. Behind the "right-click, extract" simplicity lies a format with two encryption methods (one broken, one strong), two address modes (one with a 4GB limit, one without), per-file compression that trades efficiency for random access, and modern extensions that most tools don't support yet. This guide covers everything.
If you need to convert ZIP to another format, ChangeThisFile supports ZIP to 7Z, ZIP to TAR, ZIP to TAR.GZ, ZIP to TAR.BZ2, and ZIP to TAR.XZ — all processed server-side with 7-Zip.
Phil Katz, PKZIP, and the Birth of ZIP
Phil Katz created the ZIP format in 1989 as the file format for his PKZIP utility. The format was a response to System Enhancement Associates' ARC format — Katz had previously created a faster ARC-compatible tool (PKARC), got sued, and responded by designing an entirely new format that was technically superior and, critically, placed in the public domain.
That public domain decision is what made ZIP ubiquitous. Anyone could implement ZIP support without licensing fees. By the mid-1990s, WinZip had made ZIP the standard archive format on Windows, and the format's open specification meant competing tools could appear freely. When Microsoft built ZIP support directly into Windows XP in 2001, the format's dominance became permanent.
The name "zip" (slang for moving at high speed) was suggested by Katz's friend Robert Mahoney, to signal that PKZIP was faster than ARC and the other compressors of the day. The APPNOTE.TXT specification, now maintained by PKWARE Inc., has been revised many times since 1989, adding features like ZIP64, AES encryption, and Unicode filename support.
The Deflate Algorithm: How ZIP Compresses Data
ZIP's default compression method is Deflate, a combination of two classical algorithms: LZ77 (dictionary-based deduplication) and Huffman coding (entropy encoding). Deflate was also designed by Phil Katz, and its dual approach made it effective enough to become one of the most widely deployed compression algorithms in history. It also powers GZIP, PNG, and HTTP content compression.
LZ77: Finding Repeated Patterns
The first stage of Deflate uses a sliding window (up to 32KB) to find repeated byte sequences. When it encounters bytes that match something earlier in the window, it replaces them with a back-reference: "go back 1,240 bytes and copy 38 bytes." This is why text files compress well — natural language has enormous repetition. A 32KB window means Deflate can spot duplicates within about 32,000 characters of each other.
That 32KB window is also Deflate's fundamental limitation. Modern algorithms like LZMA (used by 7Z) use dictionaries up to 1.5GB, roughly 49,000 times larger (1,536 MiB / 32 KiB = 49,152). They can find patterns separated by millions of bytes. That is a large part of why 7Z often produces 30-70% smaller archives on large, text-heavy inputs: its dictionary is big enough to capture long-range redundancy that Deflate's window misses entirely.
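The window limit is easy to observe with Python's standard library. This sketch is illustrative, not a benchmark: it builds a random (incompressible) block followed by an exact copy, so the only redundancy sits exactly `gap` bytes back. It then compresses the result with `zlib` (Deflate, 32KB window) and with `lzma` (LZMA at its default preset, which uses an 8 MiB dictionary rather than the 1.5GB maximum). The helper name `compressed_sizes` is made up for this example.

```python
import lzma
import os
import zlib


def compressed_sizes(gap: int) -> tuple[int, int]:
    """Compress a random block followed by an exact copy of itself.

    The copy starts `gap` bytes after the original, so a compressor can only
    exploit the repetition if its window/dictionary reaches back that far.
    """
    block = os.urandom(gap)                     # random bytes: incompressible on their own
    data = block + block                        # second half repeats the first, `gap` bytes back
    deflate_size = len(zlib.compress(data, 9))  # Deflate, 32 KiB window
    lzma_size = len(lzma.compress(data))        # LZMA, default preset (8 MiB dictionary)
    return deflate_size, lzma_size


for gap in (10_000, 100_000):
    d, x = compressed_sizes(gap)
    print(f"repeat distance {gap:>7,}: deflate={d:>7,} bytes  lzma={x:>7,} bytes")
```

With a 10,000-byte gap, both compressors roughly halve the input. With a 100,000-byte gap the copy is out of Deflate's reach, so its output stays near the full 200,000 bytes, while LZMA still lands near 100,000.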
Huffman Coding: Optimal Bit Assignment
After LZ77 finds repeating patterns, Huffman coding assigns shorter bit sequences to more frequent symbols and longer ones to rare symbols. In Deflate, the symbols are literal bytes plus the length and distance codes produced by LZ77. If the letter 'e' appears 1,000 times and the letter 'q' appears 3 times, 'e' gets a very short code (maybe 3 bits) and 'q' gets a longer one (maybe 12 bits). The most common data uses the fewest bits, which is optimal among codes that give each symbol a whole number of bits (Deflate additionally caps code lengths at 15 bits).
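The core of Huffman's construction can be sketched in a few lines. The procedure repeatedly merges the two least frequent subtrees, and every symbol inside a merged subtree moves one level deeper, which adds one bit to its code. This is plain Huffman coding over characters with a made-up helper name (`huffman_code_lengths`). Deflate's real encoder works on literal/length and distance symbols and uses canonical, length-limited codes, which this sketch leaves out.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length (in bits) for each symbol in `text`."""
    freq = Counter(text)
    if len(freq) == 1:                   # a lone symbol still needs a 1-bit code
        return {sym: 1 for sym in freq}
    # Heap entries: (subtree frequency, unique tie-breaker, {symbol: depth so far})
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two least frequent subtrees...
        f2, _, right = heapq.heappop(heap)
        # ...become children of a new node, so each of their symbols gains one bit
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


sample = "e" * 1000 + "t" * 500 + "a" * 400 + "q" * 3
lengths = huffman_code_lengths(sample)
print(lengths)
```

On this sample, 'e' ends up with a 1-bit code, 't' with 2 bits, and both 'a' and 'q' with 3 bits. The rare 'q' never gets a shorter code than a frequent symbol.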
Deflate splits its output into blocks. Each compressed block can carry its own Huffman code tables (a "dynamic" block) or use a predefined "fixed" code, and a third block type stores data uncompressed (up to 65,535 bytes per block). The encoder decides where blocks end, so the encoding can adapt to the local character distribution. This helps with files that change character, like a file with English text in one section and base64-encoded data in another.
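Every Deflate block begins with a 3-bit header, packed least-significant bit first: 1 bit BFINAL (is this the last block?) and 2 bits BTYPE (0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman). The sketch below asks zlib for a raw Deflate stream (negative `wbits` drops the zlib wrapper) and decodes the first block's header. The helper name `first_block_header` is illustrative. Which block type zlib picks is an encoder heuristic, not part of the format.

```python
import zlib

BLOCK_TYPES = {0: "stored", 1: "fixed Huffman", 2: "dynamic Huffman", 3: "reserved (invalid)"}


def first_block_header(data: bytes, level: int) -> tuple[int, str]:
    """Compress `data` as raw Deflate and decode the first block's 3-bit header."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)  # wbits=-15: raw Deflate, no zlib header
    raw = comp.compress(data) + comp.flush()
    first = raw[0]
    bfinal = first & 0b1          # bit 0: last block in the stream?
    btype = (first >> 1) & 0b11   # bits 1-2: how this block is encoded
    return bfinal, BLOCK_TYPES[btype]


print(first_block_header(b"hello", 0))  # level 0 never compresses: a stored block
print(first_block_header(b"hello", 6))  # tiny input: building a custom table would cost more than it saves
```

For five bytes of input, zlib typically chooses the fixed code because transmitting a dynamic code table would cost more than it saves. Longer, varied inputs usually get dynamic blocks.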
Per-File Compression: ZIP's Key Design Choice
ZIP compresses each file independently. File A is compressed. File B is compressed. File C is compressed. Then all three compressed streams are concatenated into the archive with a central directory at the end that lists every file's offset and size.
This per-file approach has a major advantage: random access. You can extract any single file from a 50GB ZIP archive without decompressing anything else. The tool reads the central directory, jumps to the file's offset, decompresses that one stream, and you're done. 7Z with solid compression can't do this: extracting file #500 from a solid block requires decompressing every file stored before it in that block.
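Python's standard `zipfile` module makes the layout visible. This sketch builds a three-entry archive in memory. It then reads back the central directory, where each entry's `header_offset` points at its local header, and extracts one member. The file names and contents are arbitrary placeholders.

```python
import io
import zipfile


def build_archive() -> bytes:
    """Create a small ZIP in memory: three entries, each Deflate-compressed on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in ("a.txt", "b.txt", "c.txt"):
            zf.writestr(name, (name * 1000).encode())
    return buf.getvalue()


data = build_archive()
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    # Opening the archive reads only the central directory at the end of the file:
    # names, local-header offsets, and sizes for every entry.
    for info in zf.infolist():
        print(info.filename, info.header_offset, info.compress_size, info.file_size)
    # Reading one member seeks to its offset and inflates only that stream.
    c_text = zf.read("c.txt")
```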
The tradeoff is compression efficiency. If your archive contains 1,000 similar configuration files, ZIP compresses each independently and misses the massive cross-file redundancy. 7Z's solid compression would treat all 1,000 as one continuous stream and exploit the repetition between them, achieving dramatically smaller output. On a set of 1,000 similar XML files, 7Z solid mode can produce an archive 60-80% smaller than ZIP.
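The cross-file effect can be approximated without 7-Zip. This sketch generates 1,000 hypothetical, nearly identical XML config files. It compresses each one separately (ZIP-style) and also compresses their concatenation as a single stream (a stand-in for solid mode). Both sides use the same Deflate compressor, so the gap comes purely from per-file versus single-stream compression. Real 7Z solid archives use LZMA with a far larger dictionary, and real ZIPs also add per-entry headers, so actual numbers will differ.

```python
import zlib

# Hypothetical data set: 1,000 small config files that differ in only a few fields.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<service>
  <name>worker-{i}</name>
  <port>{port}</port>
  <retries>3</retries>
  <timeout unit="s">30</timeout>
  <log level="info" path="/var/log/worker-{i}.log"/>
</service>
"""
files = [TEMPLATE.format(i=i, port=8000 + i).encode() for i in range(1000)]

# ZIP-style: every file is its own Deflate stream, each starting with an empty window.
per_file = sum(len(zlib.compress(f, 9)) for f in files)

# Solid-style stand-in: one stream over the concatenation, so each file can
# back-reference the (nearly identical) file just before it.
single_stream = len(zlib.compress(b"".join(files), 9))

print(f"per-file: {per_file:,} bytes, single stream: {single_stream:,} bytes")
```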
For most real-world use — sending a dozen files to a colleague, distributing a software package — per-file compression is the right tradeoff. The compression ratio difference is moderate on heterogeneous content, and the random access capability is genuinely useful.
ZIP32 vs ZIP64: Breaking the 4GB Barrier
The original ZIP specification (ZIP32) stores file sizes and offsets as 32-bit unsigned integers and the entry count as a 16-bit integer. This creates two hard limits: no file (and no offset into the archive) can exceed 4,294,967,295 bytes, one byte short of 4 GiB, and the archive cannot contain more than 65,535 entries. In 1989, these were absurdly large numbers. By the 2000s, a single video file could blow past 4GB.
ZIP64, added to the specification in 2001 (APPNOTE version 4.5), extends these fields to 64 bits. The theoretical maximum file size becomes 16 exbibytes (2^64 bytes, about 18 million terabytes), and there's no practical limit on the number of files. Most ZIP tools released since about 2005 switch to ZIP64 automatically when they detect a file exceeding 4GB or an archive exceeding 65,535 entries.
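The ZIP32 limits follow directly from the field widths. This sketch encodes them as constants and uses a deliberately conservative check: it treats the sum of uncompressed sizes as a worst-case bound on offsets, as if nothing compressed. The function name `may_need_zip64` is hypothetical. Python's own `zipfile` makes this decision internally, with `allowZip64=True` by default.

```python
# ZIP32 field widths: sizes and offsets are 32-bit, the entry count is 16-bit.
ZIP32_MAX_SIZE = 2**32 - 1     # 4,294,967,295 bytes, one byte short of 4 GiB
ZIP32_MAX_ENTRIES = 2**16 - 1  # 65,535 entries


def may_need_zip64(file_sizes: list[int]) -> bool:
    """Conservative check: could these uncompressed sizes overflow ZIP32 fields?

    Assumes the worst case (data stored uncompressed), so the running total of
    sizes stands in for the largest local-header offset. Header bytes are ignored.
    """
    too_many_entries = len(file_sizes) > ZIP32_MAX_ENTRIES
    file_too_big = any(size > ZIP32_MAX_SIZE for size in file_sizes)
    offsets_may_overflow = sum(file_sizes) > ZIP32_MAX_SIZE
    return too_many_entries or file_too_big or offsets_may_overflow
```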
The compatibility concern is mostly historical. Software from before 2005 may reject ZIP64 archives with generic "corrupt archive" errors. Java's java.util.zip didn't support ZIP64 until Java 7 (2011). If you're targeting extremely old systems or embedded devices, stick to ZIP32 limits. For everything else, ZIP64 just works.
How to Tell If a ZIP Uses ZIP64
Open the ZIP with 7z l -slt archive.zip and look for "Is64", or check the header's "version needed to extract" field: entries that use ZIP64 require version 4.5. Any archive with a file over 4GB or more than 65,535 entries must be ZIP64. Below those limits, most tools write plain ZIP32 headers, but this isn't guaranteed: writers that stream data without knowing sizes in advance may add ZIP64 fields even to small archives.
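You can also inspect the bytes directly. Every ZIP ends with an end-of-central-directory (EOCD) record, signature `PK\x05\x06`, which may be followed by a comment of up to 65,535 bytes. When an archive needs ZIP64 end records, a 20-byte ZIP64 locator (signature `PK\x06\x07`) sits immediately before the EOCD. This is a sketch with a made-up function name. It can be fooled by a signature inside the comment, and it does not see ZIP64 extra fields on individual entries, so a `False` result is not conclusive.

```python
import io
import zipfile

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # ZIP64 end-of-central-directory locator (20 bytes long)


def has_zip64_end_records(data: bytes) -> bool:
    """Return True if a ZIP64 locator sits right before the EOCD record."""
    # The EOCD is 22 bytes plus an optional comment of up to 65,535 bytes.
    eocd = data.rfind(EOCD_SIG, max(0, len(data) - 22 - 0xFFFF))
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    return eocd >= 20 and data[eocd - 20 : eocd - 16] == ZIP64_LOCATOR_SIG


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("small.txt", b"hello")  # tiny archive: no ZIP64 end records expected
print(has_zip64_end_records(buf.getvalue()))
```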
ZIP Encryption: ZipCrypto vs AES-256
ZIP supports two encryption methods, and choosing the wrong one is a genuine security risk.
ZipCrypto: Broken, Avoid
ZipCrypto (also called Traditional PKZIP Encryption) is the original encryption method from 1989. It uses a stream cipher driven by three 32-bit keys, which was considered adequate in the era of 386 processors. It is not adequate now. Known-plaintext attacks published by Eli Biham and Paul Kocher can crack ZipCrypto in minutes to hours if the attacker knows about a dozen bytes of plaintext from any encrypted file in the archive, and many archives contain predictable content (XML headers, license files, readme templates).
Worse, many tools still default to ZipCrypto when you select "encrypt" or "password protect." 7-Zip's Add to Archive dialog selects ZipCrypto for .zip files unless you switch it to AES-256, and the command-line zip -e on Linux and macOS only supports ZipCrypto. Windows' built-in ZIP handler has never been able to create AES-encrypted ZIPs. Do not use ZipCrypto for anything sensitive.
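ZipCrypto is small enough to write out in full. This sketch follows the key-update and keystream steps described in PKWARE's APPNOTE: three keys seeded with fixed constants, updated through CRC-32 and a linear congruential step for every plaintext byte. It omits the 12-byte random encryption header that real ZIP entries prepend, and the class name `ZipCryptoKeys` is made up. The last lines show why known plaintext is so damaging: XORing ciphertext with guessed bytes hands the attacker keystream bytes directly, and the Biham-Kocher attack works back from those to the internal keys.

```python
import zlib


def _crc32_step(crc: int, byte: int) -> int:
    """One raw CRC-32 table step (no pre/post inversion), like APPNOTE's crc32()."""
    return zlib.crc32(bytes([byte]), crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF


class ZipCryptoKeys:
    """Traditional PKWARE cipher state: three 32-bit keys (illustrative sketch)."""

    def __init__(self, password: bytes):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for b in password:                 # the password is mixed in byte by byte
            self._update(b)

    def _update(self, plain_byte: int) -> None:
        self.k0 = _crc32_step(self.k0, plain_byte)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = _crc32_step(self.k2, self.k1 >> 24)

    def _keystream_byte(self) -> int:
        temp = (self.k2 | 2) & 0xFFFF
        return ((temp * (temp ^ 1)) >> 8) & 0xFF

    def encrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for p in data:
            out.append(p ^ self._keystream_byte())
            self._update(p)                # keys evolve with the plaintext
        return bytes(out)

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            p = c ^ self._keystream_byte()
            self._update(p)
            out.append(p)
        return bytes(out)


plaintext = b'<?xml version="1.0" encoding="UTF-8"?><config/>'
ciphertext = ZipCryptoKeys(b"hunter2").encrypt(plaintext)
assert ZipCryptoKeys(b"hunter2").decrypt(ciphertext) == plaintext

# An attacker who guesses the opening bytes recovers keystream bytes for free.
guess = b'<?xml version="1.0"'
keystream = bytes(c ^ p for c, p in zip(ciphertext, guess))
```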
AES-256: The Secure Option
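WinZip introduced AES-256 encryption for ZIP in 2003 (the AE-1 and AE-2 extensions to the ZIP specification). AES-256 is the same encryption standard used by governments and financial institutions. The cipher is not the weak point; the password is. WinZip's scheme derives its keys with PBKDF2-HMAC-SHA1 at only 1,000 iterations, so a short or dictionary password can still be guessed, while with a long, random password (12+ characters, mixed) AES-256 encrypted ZIPs are effectively uncrackable.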
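Here is the key-derivation step, as a sketch based on WinZip's published AE-x description. For AES-256, a 16-byte random salt is stored in the clear ahead of the entry's data. PBKDF2-HMAC-SHA1 with 1,000 iterations stretches the password into 66 bytes, which split into the AES key, an HMAC-SHA1 authentication key, and a 2-byte password verifier. The actual encryption (AES in CTR mode, with a 10-byte truncated HMAC appended) needs a third-party crypto library and is left out. The function name is hypothetical.

```python
import hashlib
import os

KEY_LEN = 32       # AES-256 key size in bytes
SALT_LEN = 16      # salt length used with AES-256 in AE-x
ITERATIONS = 1000  # fixed PBKDF2 iteration count in AE-x


def derive_winzip_aes256_keys(password: bytes, salt: bytes) -> tuple[bytes, bytes, bytes]:
    """Split PBKDF2-HMAC-SHA1 output into (AES key, HMAC key, 2-byte verifier)."""
    material = hashlib.pbkdf2_hmac("sha1", password, salt, ITERATIONS, dklen=2 * KEY_LEN + 2)
    aes_key = material[:KEY_LEN]
    hmac_key = material[KEY_LEN : 2 * KEY_LEN]
    verifier = material[2 * KEY_LEN :]  # stored in the archive to reject wrong passwords quickly
    return aes_key, hmac_key, verifier


salt = os.urandom(SALT_LEN)
aes_key, hmac_key, verifier = derive_winzip_aes256_keys(b"correct horse battery staple", salt)
```

The 2-byte verifier lets a tool reject most wrong passwords before decrypting anything. With only 1,000 PBKDF2 iterations, each password guess is cheap for an attacker, which is why password length matters more than the cipher.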
The catch: not all tools support AES-256 ZIP encryption. 7-Zip, WinRAR, WinZip, and Keka (macOS) all support it, but Info-ZIP's zip and unzip, the default command-line tools on most Linux systems, handle only ZipCrypto; on Linux, use 7-Zip instead. Windows' built-in ZIP handler added extraction support in Windows 11 23H2, but older Windows versions can't open AES-256 encrypted ZIPs without third-party software. If your recipient is on Windows 10, send 7Z (which always uses AES-256) or verify they have 7-Zip installed.
Critical detail: ZIP's AES-256 encryption protects file contents but not the central directory. Anyone can see what files are in the archive (names, sizes, timestamps) without the password. 7Z can optionally encrypt filenames (RAR offers the same through header encryption). If the file names themselves are sensitive, use 7Z instead.
Unicode Filename Support and the Encoding Mess
The original ZIP specification encoded filenames in IBM Code Page 437 — a DOS-era character set that doesn't support Chinese, Japanese, Korean, Arabic, or most non-Western-European characters. Archives created on a Japanese Windows system would use Shift-JIS filenames; extracting on an English system produced garbled names.
APPNOTE version 6.3.0 (2006) added the Language Encoding Flag (EFS, general-purpose bit 11), which signals that filenames are encoded in UTF-8. Modern tools (7-Zip, WinRAR, Info-ZIP 3.0+, macOS Archive Utility) set this flag automatically. But older tools and some platform-specific tools still create archives with locale-specific encodings.
If you receive a ZIP with corrupted filenames, the archive was likely created with a non-UTF-8 encoding. 7-Zip lets you specify the codepage during extraction (7z x -mcp=65001 archive.zip for UTF-8, -mcp=932 for Shift-JIS). This is one of those problems that's almost solved — but "almost" means you'll still encounter it.
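The same repair can be done in Python. When bit 11 is not set, `zipfile` decodes names as CP437 by default. Because CP437 maps all 256 byte values, encoding the garbled name back to CP437 recovers the original bytes, which can then be decoded with the code page the archive was really created in. The helper `repair_name` is illustrative. You have to supply the right code page (for example "shift_jis", "gbk", or "cp1251"), because ZIP never records which one was used.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: "filename is UTF-8" (EFS)


def repair_name(info: zipfile.ZipInfo, codepage: str) -> str:
    """Re-decode a filename stored in a legacy code page without bit 11 set."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename                 # already decoded correctly as UTF-8
    return info.filename.encode("cp437").decode(codepage)


# Python writes non-ASCII names as UTF-8 and sets bit 11.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("データ.txt", b"")
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    print(hex(info.flag_bits & UTF8_FLAG), repair_name(info, "shift_jis"))

# Simulate an entry from a Japanese-locale tool: Shift-JIS bytes, no bit 11,
# which a CP437 decoder turns into mojibake.
legacy = zipfile.ZipInfo("データ.txt".encode("shift_jis").decode("cp437"))
print(legacy.filename, "->", repair_name(legacy, "shift_jis"))
```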
The macOS __MACOSX Folder Problem
When macOS's built-in Archive Utility creates a ZIP, it includes a __MACOSX folder containing resource fork data and extended attributes for the files that have them (stored as ._filename "AppleDouble" files). These are invisible on macOS but appear as clutter when the ZIP is opened on Windows or Linux.
The __MACOSX folder is harmless: it's metadata, not malware. But it looks unprofessional when sharing ZIPs with Windows users. To create clean ZIPs on macOS, use the command line: zip -r -X archive.zip folder/ -x "*.DS_Store". The command-line zip never writes __MACOSX entries; -X additionally drops extra file attributes such as Unix UID/GID, and -x skips Finder's .DS_Store files. Or use Keka, which omits Apple metadata by default.
If you receive a ZIP with __MACOSX clutter, delete the __MACOSX folder after extracting, and before re-archiving, or it will be packed right back in. Or use zip -d archive.zip "__MACOSX/*" to delete the metadata entries in place.
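Where Info-ZIP's zip isn't available, a script can do the cleanup. Python's `zipfile` cannot delete entries in place, so this sketch copies every entry except macOS metadata into a new archive. The function names and the exact set of patterns it treats as clutter (the `__MACOSX/` tree, `._` AppleDouble files, and `.DS_Store`) are assumptions you may want to adjust.

```python
import io
import zipfile


def is_mac_metadata(name: str) -> bool:
    """True for Archive Utility / Finder clutter: __MACOSX/, ._* AppleDouble files, .DS_Store."""
    base = name.rstrip("/").rsplit("/", 1)[-1]
    return name.startswith("__MACOSX/") or base.startswith("._") or base == ".DS_Store"


def strip_mac_metadata(src: bytes) -> bytes:
    """Return a copy of the archive without macOS metadata entries."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin, \
            zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if not is_mac_metadata(info.filename):
                # Reusing the ZipInfo keeps the name, timestamp, and compression method.
                zout.writestr(info, zin.read(info))
    return out.getvalue()
```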
Self-Extracting ZIP Archives
A self-extracting ZIP (SFX) is an executable (.exe on Windows) that contains a ZIP archive plus a small extraction program. The recipient double-clicks the .exe, and files are extracted without needing any archive software. This was valuable in the era before OS-native ZIP support; now it's mostly used for installers and distributing to non-technical users.
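The trick works because ZIP readers locate the end-of-central-directory record by scanning from the end of the file, so extra bytes in front of the archive don't stop them from finding it. This sketch prepends a fake 512-byte "stub" (starting with `MZ`, the DOS/Windows executable signature, but otherwise just zeros, not a real program) to an ordinary ZIP and opens the result with Python's `zipfile`, which adjusts for the prepended bytes.

```python
import io
import zipfile

# Build an ordinary ZIP in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("readme.txt", b"extracted by the stub")

# Hypothetical stand-in for the extractor program; a real SFX stub is a compiled executable.
stub = b"MZ" + b"\x00" * 510
sfx = stub + buf.getvalue()

# The EOCD record is found from the end, so the prepended stub is simply skipped.
with zipfile.ZipFile(io.BytesIO(sfx)) as zf:
    print(zf.namelist(), zf.read("readme.txt"))
```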
7-Zip can create self-extracting archives with a custom configuration (installer title, extraction path, post-extraction commands), though its SFX modules work with 7z archives rather than ZIP. WinRAR's SFX module is more polished with GUI options. The downside: antivirus software frequently flags self-extracting archives because they're executables with embedded compressed payloads, which is exactly what malware looks like. Many email providers block .exe attachments outright. In 2026, the security paranoia around SFX files often outweighs the convenience.
Modern ZIP Extensions: Zstandard and Beyond
The ZIP specification continues to evolve. The most significant recent addition is Zstandard (ZSTD) compression inside ZIP archives, added as compression method 93 in APPNOTE version 6.3.7. At its higher levels, Zstandard approaches LZMA's compression ratios while compressing and decompressing several times faster, which arguably makes it the best per-file compressor available in 2026.
The problem is support. As of early 2026, only 7-Zip 24.x and a few command-line tools support ZSTD-compressed ZIPs. Windows' built-in handler doesn't. macOS's Archive Utility doesn't. If you create a ZIP with ZSTD compression, the recipient needs 7-Zip to open it — which defeats the universal compatibility that is ZIP's entire value proposition.
For now, use Deflate for ZIPs you'll share with others. Use ZSTD inside ZIP only for internal pipelines where you control both ends. If you need ZSTD compression for general use, TAR.ZST is a more widely supported option on Linux.
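Before sharing an archive produced by a pipeline, you can check which compression methods it actually uses. This sketch lists entries whose method is anything other than stored (0) or Deflate (8). It reads only the central directory, so it works even on entries Python itself cannot decompress. The function name and the choice of "widely supported" methods are assumptions. The method IDs in the lookup table (12 bzip2, 14 LZMA, 93 Zstandard, 99 AE-x encryption) come from APPNOTE.

```python
import io
import zipfile

# Methods essentially every ZIP reader handles: 0 = stored, 8 = Deflate.
WIDELY_SUPPORTED = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
# Other APPNOTE method IDs you may run into.
METHOD_NAMES = {12: "bzip2", 14: "LZMA", 93: "Zstandard", 99: "AES-encrypted (AE-x)"}


def risky_entries(data: bytes) -> list[tuple[str, str]]:
    """Return (name, method) for entries a built-in OS unzipper may not open."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            (info.filename, METHOD_NAMES.get(info.compress_type, f"method {info.compress_type}"))
            for info in zf.infolist()
            if info.compress_type not in WIDELY_SUPPORTED
        ]


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"aaa", compress_type=zipfile.ZIP_DEFLATED)
    zf.writestr("b.txt", b"bbb", compress_type=zipfile.ZIP_LZMA)
print(risky_entries(buf.getvalue()))
```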
When to Use ZIP (and When Not To)
Use ZIP when:
- Sharing files with anyone who might not have 7-Zip installed
- Attaching archives to email (universally recognized)
- Distributing files to non-technical users
- You need random access to individual files in a large archive
- Cross-platform compatibility is essential (Windows + macOS + Linux)
Don't use ZIP when:
- Maximum compression matters — convert to 7Z for 30-70% smaller archives on text-heavy content
- You need Unix permissions and symlinks preserved reliably across tools — convert to TAR.GZ
- You need encrypted filenames — use 7Z instead
- You're archiving thousands of similar files — solid compression (7Z) is dramatically more efficient
ZIP's strength has never been compression ratio, encryption, or Unix metadata preservation. Its strength is that it works everywhere, for everyone, without explanation. That's worth more than a 30% smaller file size in almost every human-to-human file sharing scenario.
When you need more compression, convert ZIP to 7Z. When you need Unix permissions preserved, convert ZIP to TAR.GZ. When someone sends you an archive you can't open, convert it to ZIP: 7Z to ZIP, RAR to ZIP, TAR.GZ to ZIP. The universal format remains universal for a reason.