TAR doesn't compress anything. That statement confuses people who think of TAR.GZ as a single format, but it's the fundamental design fact: TAR is a file bundler, not a compressor. It takes files and directories, preserves their metadata (permissions, ownership, timestamps, symlinks), and writes them sequentially into a single stream. A separate program — GZIP, BZIP2, XZ, or Zstandard — then compresses that stream.
This two-tool separation is deliberate Unix philosophy: do one thing well. TAR bundles. GZIP compresses. The result is a pipeline where you can swap compression algorithms without changing the archive format. A .tar.gz, .tar.xz, and .tar.zst all contain the same TAR stream — only the compression wrapper changes.
TAR was designed in 1979 for writing to magnetic tape drives. Its sequential, streaming nature made perfect sense for tape. Nearly half a century later, it's still the foundation of Linux package distribution, source code tarballs, Docker image layers, and system backups. Here's why it endures and when to use it over ZIP or 7z.
From Tape Drives to Linux Kernels
TAR — Tape ARchive — was created at Bell Labs for Unix Version 7 in 1979. It replaced the older tp utility and was designed to write file trees sequentially to magnetic tape. Tape drives read and write sequentially (no random access), so TAR's stream format was the natural fit: write file header, write file data, write next file header, write next file data, repeat.
The original TAR format stored filenames up to 100 characters and encoded file sizes in a 12-digit octal field, capping individual files at 8 GiB. Two major extensions appeared: POSIX.1-1988 ("ustar") raised the filename limit to 255 characters via a split name/prefix field and added textual user/group names. POSIX.1-2001 ("pax") added extended headers with effectively unlimited filename lengths, large file support, and UTF-8 encoding. GNU tar added its own extensions (long filenames, incremental backups, sparse file support) that predate and partially overlap with pax.
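You can see the ustar/pax layout directly: the first 100 bytes of each 512-byte header block hold the member name, and the magic string "ustar" sits at byte offset 257. A quick sketch (filenames are illustrative; assumes GNU tar and standard coreutils):

```shell
# Create a tiny archive and inspect its first 512-byte header block.
echo "hello" > demo.txt
tar -cf demo.tar demo.txt

# Bytes 0-99 of the header hold the member name, NUL-padded.
head -c 100 demo.tar

# The "ustar" magic lives at offset 257 in every ustar/pax header.
dd if=demo.tar bs=1 skip=257 count=5 2>/dev/null
```

The same five magic bytes appear whether tar writes GNU, ustar, or pax format, which is why all three remain mutually readable.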
Today, virtually every Linux distribution distributes its kernel source as a tarball (linux-6.x.tar.xz), every software project on GitHub auto-generates .tar.gz release archives, and system administrators use TAR for daily backups. The format's longevity comes from its simplicity and its perfect preservation of Unix filesystem semantics.
TAR Is Not Compression (And That's a Feature)
A raw .tar file is the size of its contents plus a small, fixed overhead: each member gets a 512-byte header, file data is padded to 512-byte boundaries, and the archive ends with at least two 512-byte zero blocks. There is zero compression. The file is a simple concatenation of headers and data blocks.
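A minimal sketch of that overhead (filenames are illustrative): archive exactly 1 MiB of incompressible data and compare sizes.

```shell
# 1 MiB of random data, archived without compression.
dd if=/dev/urandom of=payload.bin bs=1024 count=1024 2>/dev/null
tar -cf payload.tar payload.bin

# payload.tar is payload.bin plus a 512-byte header and end-of-archive
# zero blocks, rounded up to tar's default 10 KiB blocking factor.
ls -l payload.bin payload.tar
```

The archive comes out a few kilobytes larger than the payload, never smaller.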
This separation of bundling and compression is a feature, not a limitation:
- Algorithm independence: You choose the compression algorithm separately. Today's best choice (Zstandard) didn't exist when TAR was created. Tomorrow's best choice hasn't been invented yet. The TAR format doesn't care — it's just a stream that any compressor can wrap.
- Streaming-friendly: You can pipe TAR output directly to a compressor: tar cf - directory/ | zstd > archive.tar.zst. No intermediate file needed. The pipeline processes data incrementally, so you can archive terabyte-scale directories without needing terabytes of temporary space.
- Simplicity: The format specification is small enough to implement from scratch in a weekend. This makes TAR the most widely supported archive format across programming languages and platforms.
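The pipe pattern works with any compressor; a sketch with gzip (the directory name is illustrative, and zstd or xz slots in the same way):

```shell
mkdir -p project/src
echo 'int main(void){return 0;}' > project/src/main.c

# tar writes to stdout (f -), gzip reads from stdin: no temp file.
tar -cf - project/ | gzip -9 > project.tar.gz

# The reverse pipeline streams extraction/listing the same way.
gzip -dc project.tar.gz | tar -tf -
```

Because both ends stream, memory use stays constant regardless of archive size.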
Preserving Unix Permissions, Ownership, and Symlinks
This is TAR's killer feature and the primary reason it's used instead of ZIP on Unix systems. TAR stores:
- File permissions: The full permission bitmask (rwxr-xr-x, setuid, setgid, sticky bit). When you extract a TAR archive, executable scripts remain executable. Server configuration files retain their restrictive permissions.
- User and group ownership: Both numeric UID/GID and textual username/groupname. When restoring a backup, files are reassigned to the correct owner (requires root privileges for extraction).
- Symbolic links: Stored as metadata pointing to the target path, not as copies of the target file. A 50MB shared library symlinked from three locations is stored once with three symlink entries, not four 50MB copies.
- Hard links: TAR detects when multiple directory entries point to the same inode and stores the data once with hard link references. This preserves the filesystem's hard link structure on extraction.
- Timestamps: Modification time (mtime), and with pax extensions, access time (atime) and inode change time (ctime) at nanosecond resolution. Note that ctime records metadata changes, not file creation.
- Extended attributes: pax TAR format supports xattrs, SELinux labels, and ACLs — critical for system backups where security contexts matter.
ZIP stores some of this information in its "external attributes" field, but handling is inconsistent across tools and platforms. Extract a ZIP containing symlinks and executables on Linux, and you may find broken symlinks and non-executable scripts. TAR gets this right because it was designed for exactly this filesystem.
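A quick demonstration of that round trip (paths are illustrative; run as a regular user, so ownership restoration is skipped):

```shell
mkdir -p app/bin
printf '#!/bin/sh\necho ok\n' > app/bin/run.sh
chmod 755 app/bin/run.sh
ln -s bin/run.sh app/start          # relative symlink, not a copy

tar -cf app.tar app/
mkdir restore && tar -xf app.tar -C restore

# The executable bit survives the round trip...
test -x restore/app/bin/run.sh && echo "still executable"
# ...and the symlink is restored as a link pointing at the same target.
test -L restore/app/start && readlink restore/app/start
```

Try the same round trip through a ZIP tool and you will often get a plain-file copy where the symlink was, and a non-executable script.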
Pairing TAR with Compression: GZ, BZ2, XZ, ZST
TAR itself is just bytes. The compression wrapper determines the file size, speed, and tool requirements:
| Extension | Compressor | tar flag | Ratio (1GB text) | Compress Speed | Decompress Speed | Support |
|---|---|---|---|---|---|---|
| .tar.gz / .tgz | GZIP | -z | ~90MB | ~120 MB/s | ~350 MB/s | Universal |
| .tar.bz2 / .tbz2 | BZIP2 | -j | ~75MB | ~25 MB/s | ~80 MB/s | Universal |
| .tar.xz / .txz | XZ | -J | ~55MB | ~15 MB/s | ~200 MB/s | Universal on Linux, needs xz-utils elsewhere |
| .tar.zst | Zstandard | --zstd | ~58MB | ~400 MB/s | ~700 MB/s | Linux (modern), needs zstd elsewhere |
Recommendation: Use TAR.GZ for maximum compatibility. Use TAR.XZ when the smallest possible file matters (distribution packages, release tarballs). Use TAR.ZST for the best balance of speed and ratio — it compresses 25x faster than XZ at a similar ratio. Use TAR.BZ2 only if a recipient specifically requires it; XZ is better in every dimension.
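Benchmark numbers like the table above vary a lot with the data, so it is worth measuring on your own files. A sketch that compares whichever compressors are actually installed (corpus contents are illustrative):

```shell
# Build a highly compressible test archive from repeated text.
yes "the quick brown fox jumps over the lazy dog" | head -n 20000 > corpus.txt
tar -cf corpus.tar corpus.txt

# Compress the same TAR stream with each available tool.
for tool in gzip bzip2 xz zstd; do
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" -c corpus.tar > "corpus.tar.$tool"
    printf '%-6s %s bytes\n' "$tool" "$(wc -c < "corpus.tar.$tool")"
  fi
done
```

Wrap each compressor invocation in `time` to capture speed as well as ratio.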
Because only the wrapper differs, converting between these formats is lossless: decompress the old wrapper and recompress the unchanged TAR stream with the new one (TAR.GZ to TAR.XZ, TAR.GZ to TAR.BZ2, TAR.BZ2 to TAR.GZ, TAR.XZ to TAR.GZ).
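Conversion is just a recompression pipeline; sketched here with gzip to xz (file names are illustrative, and any compressor pair works the same way):

```shell
mkdir -p data && echo "payload" > data/file.txt
tar -czf data.tar.gz data/

# .tar.gz -> .tar.xz without touching the original source files:
# decompress the old wrapper, recompress the identical TAR stream.
if command -v xz >/dev/null 2>&1; then
  gzip -dc data.tar.gz | xz -9 > data.tar.xz
fi

# The member list is unchanged, because the TAR bytes never changed.
gzip -dc data.tar.gz | tar -tf -
```

No metadata is lost in the conversion, since permissions, ownership, and links all live inside the TAR stream, not the compression layer.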
Sequential Access: TAR's Biggest Limitation
TAR is a stream format with no central directory. To find a specific file, you read from the beginning until you reach it. To list all files, you read the entire archive. Combined with compression (which requires sequential decompression), this means:
- Listing files in a .tar.gz requires decompressing the entire file — proportional to uncompressed size, not compressed size.
- Extracting one file from the middle requires decompressing everything before it.
- Appending files requires reading to the end of the archive (and with compression, rewriting the entire compressed stream).
ZIP has a central directory at the end of the file, enabling O(1) access to any file by its offset. For archives where you frequently extract individual files, ZIP is structurally superior. TAR is best for create-once, extract-all workloads: backups, software distribution, and data transfer.
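Single-member extraction still works; tar simply decompresses and scans the stream until it finds the requested name. A sketch (member paths are illustrative):

```shell
mkdir -p docs
echo "first"  > docs/a.txt
echo "second" > docs/b.txt
tar -czf docs.tar.gz docs/

mkdir only-b
# tar reads sequentially through the stream to locate docs/b.txt.
tar -xzf docs.tar.gz -C only-b docs/b.txt

ls only-b/docs
```

With a large archive, asking for the last member costs nearly as much as a full extraction, which is exactly the structural difference from ZIP's central directory.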
GNU tar vs BSD tar: Platform Differences
Linux uses GNU tar. macOS uses BSD tar (libarchive-based since macOS 10.6). They're mostly compatible but have notable differences:
- Long filename handling: GNU tar stores long filenames with its own extension headers. BSD tar uses pax extended headers. Both can read each other's output, but edge cases exist with very long paths (>256 characters).
- macOS resource forks: BSD tar on macOS stores extended attributes and resource forks using AppleDouble format (._filename entries). GNU tar on Linux sees these as regular files. This is analogous to ZIP's __MACOSX folder problem — extracting a macOS-made tar on Linux litters the tree with ._* files.
- Flag syntax: GNU tar uses --xz and --zstd for compression. BSD tar auto-detects compression on extraction but may not support all algorithms. On macOS, tar -xJf (XZ) requires xz-utils to be installed via Homebrew.
- Sparse file support: GNU tar has built-in sparse file detection (--sparse). BSD tar handles sparse files differently. If you archive a 100GB virtual disk image with 5GB of actual data, GNU tar with --sparse creates a 5GB archive. Without it, you get 100GB.
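The sparse-file effect is easy to reproduce at a smaller scale (sizes are illustrative; assumes GNU tar and coreutils truncate, on a filesystem that supports holes):

```shell
# A 20 MiB file with no data blocks actually allocated.
truncate -s 20M disk.img

tar -cf dense.tar disk.img            # stores all 20 MiB of zeros
tar --sparse -cf sparse.tar disk.img  # stores only a map of the holes

ls -l dense.tar sparse.tar
```

The sparse archive is a few kilobytes; the dense one is the full 20 MiB plus header overhead.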
Essential TAR Commands
TAR's command syntax is notoriously quirky (it predates GNU getopt conventions), but these six commands cover 95% of use cases:
```shell
# Create a .tar.gz archive
tar -czf archive.tar.gz directory/

# Create a .tar.xz archive (best compression)
tar -cJf archive.tar.xz directory/

# Create with Zstandard (fastest good compression)
tar --zstd -cf archive.tar.zst directory/

# Extract any compressed tar (auto-detected)
tar -xf archive.tar.gz
tar -xf archive.tar.xz
tar -xf archive.tar.zst

# List contents without extracting
tar -tf archive.tar.gz

# Extract a single file
tar -xf archive.tar.gz path/to/specific/file.txt
```
The flags: c = create, x = extract, t = list, f = file (next argument is the filename), z = gzip, j = bzip2, J = xz. Modern GNU tar auto-detects compression on extraction, so tar -xf works for any compressed tarball.
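Two more flags worth knowing are -C (change into a directory before archiving or extracting) and --exclude. A sketch (directory names are illustrative):

```shell
mkdir -p site/node_modules site/src
echo "app" > site/src/app.js
echo "dep" > site/node_modules/dep.js

# Archive relative to site/ and skip node_modules entirely.
tar --exclude='node_modules' -czf site.tar.gz -C site .

tar -tzf site.tar.gz
```

Using -C keeps absolute paths and leading directory names out of the archive, which makes extraction on another machine predictable.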
TAR exists because Unix needed a format that preserves the filesystem faithfully. Permissions, ownership, symlinks, hard links, sparse files, extended attributes — TAR captures all of it. ZIP, 7z, and RAR were designed for a simpler problem (compress and bundle files for transfer) and don't handle Unix filesystem semantics reliably.
If you're working on Linux or macOS, TAR is the native choice. If you need to share with Windows users, convert TAR.GZ to ZIP or TAR.GZ to 7Z. For Windows users who received a tarball, convert TAR to ZIP here — no software installation needed.