TAR doesn't compress anything. That statement confuses people who think of TAR.GZ as a single format, but it's the fundamental design fact: TAR is a file bundler, not a compressor. It takes files and directories, preserves their metadata (permissions, ownership, timestamps, symlinks), and writes them sequentially into a single stream. A separate program — GZIP, BZIP2, XZ, or Zstandard — then compresses that stream.
This two-tool separation is deliberate Unix philosophy: do one thing well. TAR bundles. GZIP compresses. The result is a pipeline where you can swap compression algorithms without changing the archive format. A .tar.gz, .tar.xz, and .tar.zst all contain the same TAR stream — only the compression wrapper changes.
TAR was designed in 1979 for writing to magnetic tape drives. Its sequential, streaming nature made perfect sense for tape. Nearly half a century later, it's still the foundation of Linux package distribution, source code tarballs, Docker image layers, and system backups. Here's why it endures and when to use it over ZIP or 7z.
From Tape Drives to Linux Kernels
TAR — Tape ARchive — was created at Bell Labs for Unix Version 7 in 1979. It replaced the older tp utility and was designed to write file trees sequentially to magnetic tape. Tape drives read and write sequentially (no random access), so TAR's stream format was the natural fit: write file header, write file data, write next file header, write next file data, repeat.
The original TAR format stored filenames up to 100 characters and encoded file sizes in a 12-digit octal field, capping individual files at 8 GiB. Two major extensions appeared: POSIX.1-1988 ("ustar") raised the filename limit to 255 characters via a split name/prefix field and added textual user/group names. POSIX.1-2001 ("pax") added extended headers with effectively unlimited filename lengths, large file support, and UTF-8 encoding. GNU tar added its own extensions (long filenames, incremental backups, sparse file support) that predate and partially overlap with pax.
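You can see the ustar/pax layout directly: the first 100 bytes of each 512-byte header block hold the member name, and the magic string "ustar" sits at byte offset 257. A quick sketch (filenames are illustrative; assumes GNU tar and standard coreutils):

```shell
# Create a tiny archive and inspect its first 512-byte header block.
echo "hello" > demo.txt
tar -cf demo.tar demo.txt

# Bytes 0-99 of the header hold the member name, NUL-padded.
head -c 100 demo.tar

# The "ustar" magic lives at offset 257 in every ustar/pax header.
dd if=demo.tar bs=1 skip=257 count=5 2>/dev/null
```

The same five magic bytes appear whether tar writes GNU, ustar, or pax format, which is why all three remain mutually readable.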
Today, virtually every Linux distribution distributes its kernel source as a tarball (linux-6.x.tar.xz), every software project on GitHub auto-generates .tar.gz release archives, and system administrators use TAR for daily backups. The format's longevity comes from its simplicity and its perfect preservation of Unix filesystem semantics.
TAR Is Not Compression (And That's a Feature)
A raw .tar file is the size of its contents plus a small, fixed overhead: each member gets a 512-byte header, file data is padded to 512-byte boundaries, and the archive ends with at least two 512-byte zero blocks. There is zero compression. The file is a simple concatenation of headers and data blocks.
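A minimal sketch of that overhead (filenames are illustrative): archive exactly 1 MiB of incompressible data and compare sizes.

```shell
# 1 MiB of random data, archived without compression.
dd if=/dev/urandom of=payload.bin bs=1024 count=1024 2>/dev/null
tar -cf payload.tar payload.bin

# payload.tar is payload.bin plus a 512-byte header and end-of-archive
# zero blocks, rounded up to tar's default 10 KiB blocking factor.
ls -l payload.bin payload.tar
```

The archive comes out a few kilobytes larger than the payload, never smaller.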
This separation of bundling and compression is a feature, not a limitation:
- Algorithm independence: You choose the compression algorithm separately. Today's best choice (Zstandard) didn't exist when TAR was created. Tomorrow's best choice hasn't been invented yet. The TAR format doesn't care — it's just a stream that any compressor can wrap.
- Streaming-friendly: You can pipe TAR output directly to a compressor: tar cf - directory/ | zstd > archive.tar.zst. No intermediate file needed. The pipeline processes data incrementally, so you can archive terabyte-scale directories without needing terabytes of temporary space.
- Simplicity: The format specification is small enough to implement from scratch in a weekend. This makes TAR the most widely supported archive format across programming languages and platforms.
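The pipe pattern works with any compressor; a sketch with gzip (the directory name is illustrative, and zstd or xz slots in the same way):

```shell
mkdir -p project/src
echo 'int main(void){return 0;}' > project/src/main.c

# tar writes to stdout (f -), gzip reads from stdin: no temp file.
tar -cf - project/ | gzip -9 > project.tar.gz

# The reverse pipeline streams extraction/listing the same way.
gzip -dc project.tar.gz | tar -tf -
```

Because both ends stream, memory use stays constant regardless of archive size.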
Preserving Unix Permissions, Ownership, and Symlinks
This is TAR's killer feature and the primary reason it's used instead of ZIP on Unix systems. TAR stores:
- File permissions: The full permission bitmask (rwxr-xr-x, setuid, setgid, sticky bit). When you extract a TAR archive, executable scripts remain executable. Server configuration files retain their restrictive permissions.
- User and group ownership: Both numeric UID/GID and textual username/groupname. When restoring a backup, files are reassigned to the correct owner (requires root privileges for extraction).
- Symbolic links: Stored as metadata pointing to the target path, not as copies of the target file. A 50MB shared library symlinked from three locations is stored once with three symlink entries, not four 50MB copies.
- Hard links: TAR detects when multiple directory entries point to the same inode and stores the data once with hard link references. This preserves the filesystem's hard link structure on extraction.
- Timestamps: Modification time (mtime), and with pax extensions, access time (atime) and inode change time (ctime) at nanosecond resolution. Note that ctime records metadata changes, not file creation.
- Extended attributes: pax TAR format supports xattrs, SELinux labels, and ACLs — critical for system backups where security contexts matter.
ZIP stores some of this information in its "external attributes" field, but handling is inconsistent across tools and platforms. Extract a ZIP containing symlinks and executables on Linux, and you may find broken symlinks and non-executable scripts. TAR gets this right because it was designed for exactly this filesystem.
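A quick demonstration of that round trip (paths are illustrative; run as a regular user, so ownership restoration is skipped):

```shell
mkdir -p app/bin
printf '#!/bin/sh\necho ok\n' > app/bin/run.sh
chmod 755 app/bin/run.sh
ln -s bin/run.sh app/start          # relative symlink, not a copy

tar -cf app.tar app/
mkdir restore && tar -xf app.tar -C restore

# The executable bit survives the round trip...
test -x restore/app/bin/run.sh && echo "still executable"
# ...and the symlink is restored as a link pointing at the same target.
test -L restore/app/start && readlink restore/app/start
```

Try the same round trip through a ZIP tool and you will often get a plain-file copy where the symlink was, and a non-executable script.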
Pairing TAR with Compression: GZ, BZ2, XZ, ZST
TAR itself is just bytes. The compression wrapper determines the file size, speed, and tool requirements:
| Extension | Compressor | tar flag | Ratio (1GB text) | Compress Speed | Decompress Speed | Support |
|---|---|---|---|---|---|---|
| .tar.gz / .tgz | GZIP | -z | ~90MB | ~120 MB/s | ~350 MB/s | Universal |
| .tar.bz2 / .tbz2 | BZIP2 | -j | ~75MB | ~25 MB/s | ~80 MB/s | Universal |
| .tar.xz / .txz | XZ | -J | ~55MB | ~15 MB/s | ~200 MB/s | Universal on Linux, needs xz-utils elsewhere |
| .tar.zst | Zstandard | --zstd | ~58MB | ~400 MB/s | ~700 MB/s | Linux (modern), needs zstd elsewhere |
Recommendation: Use TAR.GZ for maximum compatibility. Use TAR.XZ when the smallest possible file matters (distribution packages, release tarballs). Use TAR.ZST for the best balance of speed and ratio — it compresses 25x faster than XZ at a similar ratio. Use TAR.BZ2 only if a recipient specifically requires it; XZ is better in every dimension.
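Benchmark numbers like the table above vary a lot with the data, so it is worth measuring on your own files. A sketch that compares whichever compressors are actually installed (corpus contents are illustrative):

```shell
# Build a highly compressible test archive from repeated text.
yes "the quick brown fox jumps over the lazy dog" | head -n 20000 > corpus.txt
tar -cf corpus.tar corpus.txt

# Compress the same TAR stream with each available tool.
for tool in gzip bzip2 xz zstd; do
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" -c corpus.tar > "corpus.tar.$tool"
    printf '%-6s %s bytes\n' "$tool" "$(wc -c < "corpus.tar.$tool")"
  fi
done
```

Wrap each compressor invocation in `time` to capture speed as well as ratio.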
Because only the wrapper differs, converting between these formats is lossless: decompress the old wrapper and recompress the unchanged TAR stream with the new one (TAR.GZ to TAR.XZ, TAR.GZ to TAR.BZ2, TAR.BZ2 to TAR.GZ, TAR.XZ to TAR.GZ).
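Conversion is just a recompression pipeline; sketched here with gzip to xz (file names are illustrative, and any compressor pair works the same way):

```shell
mkdir -p data && echo "payload" > data/file.txt
tar -czf data.tar.gz data/

# .tar.gz -> .tar.xz without touching the original source files:
# decompress the old wrapper, recompress the identical TAR stream.
if command -v xz >/dev/null 2>&1; then
  gzip -dc data.tar.gz | xz -9 > data.tar.xz
fi

# The member list is unchanged, because the TAR bytes never changed.
gzip -dc data.tar.gz | tar -tf -
```

No metadata is lost in the conversion, since permissions, ownership, and links all live inside the TAR stream, not the compression layer.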
Sequential Access: TAR's Biggest Limitation
TAR is a stream format with no central directory. To find a specific file, you read from the beginning until you reach it. To list all files, you read the entire archive. Combined with compression (which requires sequential decompression), this means:
- Listing files in a .tar.gz requires decompressing the entire file — proportional to uncompressed size, not compressed size.
- Extracting one file from the middle requires decompressing everything before it.
- Appending files requires reading to the end of the archive (and with compression, rewriting the entire compressed stream).
ZIP has a central directory at the end of the file, enabling O(1) access to any file by its offset. For archives where you frequently extract individual files, ZIP is structurally superior. TAR is best for create-once, extract-all workloads: backups, software distribution, and data transfer.
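Single-member extraction still works; tar simply decompresses and scans the stream until it finds the requested name. A sketch (member paths are illustrative):

```shell
mkdir -p docs
echo "first"  > docs/a.txt
echo "second" > docs/b.txt
tar -czf docs.tar.gz docs/

mkdir only-b
# tar reads sequentially through the stream to locate docs/b.txt.
tar -xzf docs.tar.gz -C only-b docs/b.txt

ls only-b/docs
```

With a large archive, asking for the last member costs nearly as much as a full extraction, which is exactly the structural difference from ZIP's central directory.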
GNU tar vs BSD tar: Platform Differences
Linux uses GNU tar. macOS uses BSD tar (libarchive-based since macOS 10.6). They're mostly compatible but have notable differences:
- Long filename handling: GNU tar stores long filenames with its own extension headers. BSD tar uses pax extended headers. Both can read each other's output, but edge cases exist with very long paths (>256 characters).
- macOS resource forks: BSD tar on macOS stores extended attributes and resource forks using AppleDouble format (._filename entries). GNU tar on Linux sees these as regular files. This is analogous to ZIP's __MACOSX folder problem — extracting a macOS-made tar on Linux litters the tree with ._* files.
- Flag syntax: GNU tar uses --xz and --zstd for compression. BSD tar auto-detects compression on extraction but may not support all algorithms. On macOS, tar -xJf (XZ) requires xz-utils to be installed via Homebrew.
- Sparse file support: GNU tar has built-in sparse file detection (--sparse). BSD tar handles sparse files differently. If you archive a 100GB virtual disk image with 5GB of actual data, GNU tar with --sparse creates a 5GB archive. Without it, you get 100GB.
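The sparse-file effect is easy to reproduce at a smaller scale (sizes are illustrative; assumes GNU tar and coreutils truncate, on a filesystem that supports holes):

```shell
# A 20 MiB file with no data blocks actually allocated.
truncate -s 20M disk.img

tar -cf dense.tar disk.img            # stores all 20 MiB of zeros
tar --sparse -cf sparse.tar disk.img  # stores only a map of the holes

ls -l dense.tar sparse.tar
```

The sparse archive is a few kilobytes; the dense one is the full 20 MiB plus header overhead.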
Essential TAR Commands
TAR's command syntax is notoriously quirky (it predates GNU getopt conventions), but these six commands cover 95% of use cases:
```shell
# Create a .tar.gz archive
tar -czf archive.tar.gz directory/

# Create a .tar.xz archive (best compression)
tar -cJf archive.tar.xz directory/

# Create with Zstandard (fastest good compression)
tar --zstd -cf archive.tar.zst directory/

# Extract any compressed tar (auto-detected)
tar -xf archive.tar.gz
tar -xf archive.tar.xz
tar -xf archive.tar.zst

# List contents without extracting
tar -tf archive.tar.gz

# Extract a single file
tar -xf archive.tar.gz path/to/specific/file.txt
```
The flags: c = create, x = extract, t = list, f = file (next argument is the filename), z = gzip, j = bzip2, J = xz. Modern GNU tar auto-detects compression on extraction, so tar -xf works for any compressed tarball.
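Two more flags worth knowing are -C (change into a directory before archiving or extracting) and --exclude. A sketch (directory names are illustrative):

```shell
mkdir -p site/node_modules site/src
echo "app" > site/src/app.js
echo "dep" > site/node_modules/dep.js

# Archive relative to site/ and skip node_modules entirely.
tar --exclude='node_modules' -czf site.tar.gz -C site .

tar -tzf site.tar.gz
```

Using -C keeps absolute paths and leading directory names out of the archive, which makes extraction on another machine predictable.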
TAR exists because Unix needed a format that preserves the filesystem faithfully. Permissions, ownership, symlinks, hard links, sparse files, extended attributes — TAR captures all of it. ZIP, 7z, and RAR were designed for a simpler problem (compress and bundle files for transfer) and don't handle Unix filesystem semantics reliably.
If you're working on Linux or macOS, TAR is the native choice. If you need to share with Windows users, convert TAR.GZ to ZIP or TAR.GZ to 7Z. For Windows users who received a tarball, convert TAR to ZIP here — no software installation needed.