ZIP-to-TAR is a re-archiving operation, not a true conversion. The contents are identical; only the wrapper format changes. Python's standard library covers this completely — no third-party install needed. The interesting parts are streaming for big files, choosing TAR compression (none, gzip, bzip2, xz), and preserving file metadata.

Method 1: zipfile + tarfile (stdlib, no dependencies)

Both modules ship with Python. The pattern: extract to a temp dir, repack as TAR. For small archives, this is fine.

import zipfile
import tarfile
import tempfile
from pathlib import Path

def zip_to_tar(in_path: str, out_path: str, compression: str = "gz") -> None:
    """
    compression: '' (uncompressed .tar), 'gz' (.tar.gz), 'bz2' (.tar.bz2), 'xz' (.tar.xz)
    """
    mode_map = {"": "w", "gz": "w:gz", "bz2": "w:bz2", "xz": "w:xz"}
    mode = mode_map[compression]

    with tempfile.TemporaryDirectory() as tmp_dir:
        # Extract ZIP
        with zipfile.ZipFile(in_path) as zf:
            zf.extractall(tmp_dir)
        # Repack as TAR
        tmp_path = Path(tmp_dir)
        with tarfile.open(out_path, mode) as tf:
            for item in tmp_path.rglob("*"):
                arcname = str(item.relative_to(tmp_path))
                # recursive=False: rglob already yields every path, so letting
                # tf.add recurse into directories would add each file twice
                tf.add(item, arcname=arcname, recursive=False)

zip_to_tar("archive.zip", "archive.tar.gz")
zip_to_tar("archive.zip", "archive.tar.xz", compression="xz")  # smaller, slower
zip_to_tar("archive.zip", "archive.tar", compression="")    # uncompressed

Compression choices:

  • uncompressed (.tar) — fastest, biggest. Good for already-compressed contents (mp4, jpg, mp3).
  • gzip (.tar.gz / .tgz) — universal, fast, decent compression. Default choice.
  • bzip2 (.tar.bz2) — better compression than gzip, ~3x slower.
  • xz (.tar.xz) — best compression, ~10x slower than gzip. Good for distribution archives where size matters more than speed.
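These ratios vary with content, but they're easy to measure directly. A minimal sketch (the payload and filenames are made up for illustration) that packs the same data with each tarfile mode and prints the resulting sizes:

```python
import os
import tarfile
import tempfile

# One compressible payload, tarred with each compression mode.
payload = b"the quick brown fox jumps over the lazy dog\n" * 5000
sizes = {}

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.txt")
    with open(src, "wb") as f:
        f.write(payload)

    for ext, mode in [("tar", "w"), ("tar.gz", "w:gz"),
                      ("tar.bz2", "w:bz2"), ("tar.xz", "w:xz")]:
        out = os.path.join(tmp, f"sample.{ext}")
        with tarfile.open(out, mode) as tf:
            tf.add(src, arcname="data.txt")
        sizes[ext] = os.path.getsize(out)

for ext, size in sizes.items():
    print(f"{ext:8} {size:>8} bytes")
```

Highly repetitive text like this compresses dramatically; real-world ratios will be less flattering, so benchmark with your own data before committing to xz.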

Method 2: streaming for large archives (memory-safe)

The temp-dir approach loads everything to disk. For multi-gigabyte ZIPs, that may not fit. Stream entries directly from ZIP to TAR:

import zipfile
import tarfile
import time

def zip_to_tar_streaming(in_path: str, out_path: str, compression: str = "gz") -> None:
    mode_map = {"": "w", "gz": "w:gz", "bz2": "w:bz2", "xz": "w:xz"}

    with zipfile.ZipFile(in_path) as zf, tarfile.open(out_path, mode_map[compression]) as tf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as src:
                tarinfo = tarfile.TarInfo(name=info.filename)
                tarinfo.size = info.file_size
                # ZIP stores a 6-tuple local time; pad to the 9-tuple mktime wants
                tarinfo.mtime = time.mktime(info.date_time + (0, 0, -1))
                tarinfo.mode = 0o644  # ZIP permission bits are unreliable; see pitfalls
                # addfile copies src into the tar in chunks -- no whole-file buffering
                tf.addfile(tarinfo, src)

zip_to_tar_streaming("big_archive.zip", "big_archive.tar.gz")

This streams each entry from the ZIP straight into the TAR in fixed-size chunks, so peak memory stays small no matter how large the archive is. Use it when the extracted contents wouldn't fit in your temp directory (on many Linux systems /tmp is a RAM-backed tmpfs, so the temp-dir approach can eat RAM too). One limitation: directory entries are skipped, so empty directories won't appear in the output.
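To build confidence in the converter, it helps to round-trip a sample and compare member lists on both sides. A self-contained sketch (it repeats the streaming logic from above so it runs on its own; the sample filenames are invented):

```python
import os
import tarfile
import tempfile
import time
import zipfile

def zip_to_tar_streaming(in_path, out_path, compression="gz"):
    # Same logic as the streaming converter above, repeated here
    # so this check is self-contained.
    mode_map = {"": "w", "gz": "w:gz", "bz2": "w:bz2", "xz": "w:xz"}
    with zipfile.ZipFile(in_path) as zf, \
         tarfile.open(out_path, mode_map[compression]) as tf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as src:
                ti = tarfile.TarInfo(name=info.filename)
                ti.size = info.file_size
                ti.mtime = time.mktime(info.date_time + (0, 0, -1))
                ti.mode = 0o644
                tf.addfile(ti, src)

with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "sample.zip")
    tar_path = os.path.join(tmp, "sample.tar.gz")

    # Build a tiny ZIP, convert it, then compare member lists.
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("a.txt", "hello")
        zf.writestr("sub/b.txt", "world")

    zip_to_tar_streaming(zip_path, tar_path)

    with zipfile.ZipFile(zip_path) as zf, tarfile.open(tar_path) as tf:
        zip_names = sorted(zf.namelist())
        tar_names = sorted(tf.getnames())

assert zip_names == tar_names, (zip_names, tar_names)
```

For a stronger check you could also compare per-file sizes or checksums, but matching name lists catches the most common failure (dropped or duplicated entries).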

Method 3: ChangeThisFile API (handles edge cases)

For ZIPs that might be password-protected, ZIP64 (>4GB), or contain unusual entries (symlinks, special characters in names), the API uses 7-Zip server-side which handles edge cases the stdlib doesn't. Get a free API key.

import requests

API_KEY = "ctf_sk_your_key_here"

def zip_to_tar(in_path: str, out_path: str) -> None:
    with open(in_path, "rb") as f:
        response = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"source": "zip", "target": "tar"},
            timeout=120,
        )
    response.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(response.content)

zip_to_tar("archive.zip", "archive.tar")

For tar.gz output, set target=tar.gz. The API also supports tar.bz2 and tar.xz.

When to use each

Approach                   | Best for                                         | Tradeoff
stdlib (extract + repack)  | Default for small/medium archives, no deps       | Loads everything to disk; slow on huge archives
Streaming                  | Large archives, memory-constrained environments  | More verbose; loses some metadata
ChangeThisFile API         | Password-protected ZIPs, ZIP64, edge cases       | File size limit (25MB free), per-call cost

CLI alternative: 7-Zip or unzip + tar

For shell pipelines:

# 7-Zip (handles every archive format):
apt install p7zip-full
mkdir tmp_extract
7z x archive.zip -otmp_extract
tar -czf archive.tar.gz -C tmp_extract .
rm -rf tmp_extract

# Or with unzip + tar:
mkdir tmp_extract && cd tmp_extract
unzip ../archive.zip
tar -czf ../archive.tar.gz .
cd .. && rm -rf tmp_extract

7-Zip handles password-protected archives (-p flag), ZIP64, and unusual entry types. unzip is the simplest tool but chokes on many ZIP variants. Use 7-Zip when in doubt.

Common pitfalls

  • ZIP timestamps lose precision. ZIP stores mtime to 2-second granularity. Conversion preserves only what's in the source — don't expect TAR's microsecond precision to be filled in.
  • Symlinks don't survive. ZIP doesn't natively store symlinks (some implementations use special metadata, but it's non-standard). The stdlib will materialize symlinks as regular files; the API preserves them when present.
  • Password-protected ZIPs raise at extraction time. zipfile.ZipFile.extractall raises a RuntimeError on encrypted entries. Pass pwd=b'password' if you know it; otherwise switch to the API or 7-Zip CLI.
  • Filename encoding. Old Windows ZIPs store non-ASCII filenames in cp437 or the creator's local codepage. On Python 3.11+, pass metadata_encoding='cp437' (or the source machine's codepage) to ZipFile to override how names are decoded.
  • Permissions. ZIP doesn't store Unix permissions reliably. The streaming method sets a default 0o644 for all files; for executables, you'd need to detect and override.
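That last pitfall is partly recoverable: ZIPs created by Unix tools stash the original st_mode in the high 16 bits of ZipInfo.external_attr. A hedged sketch of a helper (not part of the code above), assuming the source archive actually recorded those bits:

```python
import zipfile

def unix_mode(info: zipfile.ZipInfo, default: int = 0o644) -> int:
    """Recover Unix permission bits from a ZipInfo, if the archive stored them.

    ZIPs created on Unix keep st_mode in the high 16 bits of external_attr;
    archives from Windows tools usually leave those bits zero, so fall back
    to a sensible default.
    """
    mode = (info.external_attr >> 16) & 0o7777
    return mode if mode else default

# In the streaming converter, `tarinfo.mode = unix_mode(info)` would
# replace the hardcoded `tarinfo.mode = 0o644`.
```

This keeps executables executable when the information exists, without guessing when it doesn't.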

For most jobs, the stdlib pattern is clean and fast. Stream for huge archives. Use the API when ZIPs come from unpredictable sources. Free tier is 100 conversions/month, no card.