ZIP-to-TAR is a re-archiving operation, not a true conversion. The contents are identical; only the wrapper format changes. Python's standard library covers this completely — no third-party install needed. The interesting parts are streaming for big files, choosing TAR compression (none, gzip, bzip2, xz), and preserving file metadata.

Method 1: zipfile + tarfile (stdlib, no dependencies)

Both modules ship with Python. The pattern: extract to a temp dir, repack as TAR. For small archives, this is fine.

import zipfile
import tarfile
import tempfile
from pathlib import Path

def zip_to_tar(in_path: str, out_path: str, compression: str = "gz") -> None:
    """
    compression: '' (uncompressed .tar), 'gz' (.tar.gz), 'bz2' (.tar.bz2), 'xz' (.tar.xz)
    """
    mode_map = {"": "w", "gz": "w:gz", "bz2": "w:bz2", "xz": "w:xz"}
    mode = mode_map[compression]

    with tempfile.TemporaryDirectory() as tmp_dir:
        # Extract ZIP
        with zipfile.ZipFile(in_path) as zf:
            zf.extractall(tmp_dir)
        # Repack as TAR
        tmp_path = Path(tmp_dir)
        with tarfile.open(out_path, mode) as tf:
            for item in tmp_path.rglob("*"):
                arcname = str(item.relative_to(tmp_path))
                # recursive=False: rglob already yields every path, so letting
                # tf.add recurse into directories would add each file twice
                tf.add(item, arcname=arcname, recursive=False)

zip_to_tar("archive.zip", "archive.tar.gz")
zip_to_tar("archive.zip", "archive.tar.xz", compression="xz")  # smaller, slower
zip_to_tar("archive.zip", "archive.tar", compression="")    # uncompressed

Compression choices:

  • uncompressed (.tar) — fastest, biggest. Good for already-compressed contents (mp4, jpg, mp3).
  • gzip (.tar.gz / .tgz) — universal, fast, decent compression. Default choice.
  • bzip2 (.tar.bz2) — better compression than gzip, ~3x slower.
  • xz (.tar.xz) — best compression, ~10x slower than gzip. Good for distribution archives where size matters more than speed.
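These ratios vary with content, but they're easy to measure directly. A minimal sketch (the payload and filenames are made up for illustration) that packs the same data with each tarfile mode and prints the resulting sizes:

```python
import os
import tarfile
import tempfile

# One compressible payload, tarred with each compression mode.
payload = b"the quick brown fox jumps over the lazy dog\n" * 5000
sizes = {}

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.txt")
    with open(src, "wb") as f:
        f.write(payload)

    for ext, mode in [("tar", "w"), ("tar.gz", "w:gz"),
                      ("tar.bz2", "w:bz2"), ("tar.xz", "w:xz")]:
        out = os.path.join(tmp, f"sample.{ext}")
        with tarfile.open(out, mode) as tf:
            tf.add(src, arcname="data.txt")
        sizes[ext] = os.path.getsize(out)

for ext, size in sizes.items():
    print(f"{ext:8} {size:>8} bytes")
```

Highly repetitive text like this compresses dramatically; real-world ratios will be less flattering, so benchmark with your own data before committing to xz.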

Method 2: streaming for large archives (memory-safe)

The temp-dir approach loads everything to disk. For multi-gigabyte ZIPs, that may not fit. Stream entries directly from ZIP to TAR:

import zipfile
import tarfile
import time

def zip_to_tar_streaming(in_path: str, out_path: str, compression: str = "gz") -> None:
    mode_map = {"": "w", "gz": "w:gz", "bz2": "w:bz2", "xz": "w:xz"}

    with zipfile.ZipFile(in_path) as zf, tarfile.open(out_path, mode_map[compression]) as tf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as src:
                tarinfo = tarfile.TarInfo(name=info.filename)
                tarinfo.size = info.file_size
                # ZIP stores a 6-tuple local time; pad to the 9-tuple mktime wants
                tarinfo.mtime = time.mktime(info.date_time + (0, 0, -1))
                tarinfo.mode = 0o644  # ZIP permission bits are unreliable; see pitfalls
                # addfile copies src into the tar in chunks -- no whole-file buffering
                tf.addfile(tarinfo, src)

zip_to_tar_streaming("big_archive.zip", "big_archive.tar.gz")

This streams each entry from the ZIP straight into the TAR in fixed-size chunks, so peak memory stays small no matter how large the archive is. Use it when the extracted contents wouldn't fit in your temp directory (on many Linux systems /tmp is a RAM-backed tmpfs, so the temp-dir approach can eat RAM too). One limitation: directory entries are skipped, so empty directories won't appear in the output.
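To build confidence in the converter, it helps to round-trip a sample and compare member lists on both sides. A self-contained sketch (it repeats the streaming logic from above so it runs on its own; the sample filenames are invented):

```python
import os
import tarfile
import tempfile
import time
import zipfile

def zip_to_tar_streaming(in_path, out_path, compression="gz"):
    # Same logic as the streaming converter above, repeated here
    # so this check is self-contained.
    mode_map = {"": "w", "gz": "w:gz", "bz2": "w:bz2", "xz": "w:xz"}
    with zipfile.ZipFile(in_path) as zf, \
         tarfile.open(out_path, mode_map[compression]) as tf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as src:
                ti = tarfile.TarInfo(name=info.filename)
                ti.size = info.file_size
                ti.mtime = time.mktime(info.date_time + (0, 0, -1))
                ti.mode = 0o644
                tf.addfile(ti, src)

with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "sample.zip")
    tar_path = os.path.join(tmp, "sample.tar.gz")

    # Build a tiny ZIP, convert it, then compare member lists.
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("a.txt", "hello")
        zf.writestr("sub/b.txt", "world")

    zip_to_tar_streaming(zip_path, tar_path)

    with zipfile.ZipFile(zip_path) as zf, tarfile.open(tar_path) as tf:
        zip_names = sorted(zf.namelist())
        tar_names = sorted(tf.getnames())

assert zip_names == tar_names, (zip_names, tar_names)
```

For a stronger check you could also compare per-file sizes or checksums, but matching name lists catches the most common failure (dropped or duplicated entries).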

Method 3: ChangeThisFile API (handles edge cases)

For ZIPs that might be password-protected, ZIP64 (>4GB), or contain unusual entries (symlinks, special characters in names), the API uses 7-Zip server-side which handles edge cases the stdlib doesn't. Get a free API key.

import requests

API_KEY = "ctf_sk_your_key_here"

def zip_to_tar(in_path: str, out_path: str) -> None:
    with open(in_path, "rb") as f:
        response = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"source": "zip", "target": "tar"},
            timeout=120,
        )
    response.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(response.content)

zip_to_tar("archive.zip", "archive.tar")

For tar.gz output, set target=tar.gz. The API also supports tar.bz2 and tar.xz.

When to use each

Approach                   | Best for                                         | Tradeoff
stdlib (extract + repack)  | Default for small/medium archives, no deps       | Loads everything to disk; slow on huge archives
Streaming                  | Large archives, memory-constrained environments  | More verbose; loses some metadata
ChangeThisFile API         | Password-protected ZIPs, ZIP64, edge cases       | File size limit (25MB free), per-call cost

CLI alternative: 7-Zip or unzip + tar

For shell pipelines:

# 7-Zip (handles every archive format):
apt install p7zip-full
mkdir tmp_extract
7z x archive.zip -otmp_extract
tar -czf archive.tar.gz -C tmp_extract .
rm -rf tmp_extract

# Or with unzip + tar:
mkdir tmp_extract && cd tmp_extract
unzip ../archive.zip
tar -czf ../archive.tar.gz .
cd .. && rm -rf tmp_extract

7-Zip handles password-protected archives (-p flag), ZIP64, and unusual entry types. unzip is the simplest tool but chokes on many ZIP variants. Use 7-Zip when in doubt.

Common pitfalls

  • ZIP timestamps lose precision. ZIP stores mtime to 2-second granularity. Conversion preserves only what's in the source — don't expect TAR's microsecond precision to be filled in.
  • Symlinks don't survive. ZIP doesn't natively store symlinks (some implementations use special metadata, but it's non-standard). The stdlib will materialize symlinks as regular files; the API preserves them when present.
  • Password-protected ZIPs raise at extraction time. zipfile.ZipFile.extractall raises a RuntimeError on encrypted entries. Pass pwd=b'password' if you know it; otherwise switch to the API or 7-Zip CLI.
  • Filename encoding. Old Windows ZIPs store non-ASCII filenames in cp437 or the creator's local codepage. On Python 3.11+, pass metadata_encoding='cp437' (or the source machine's codepage) to ZipFile to override how names are decoded.
  • Permissions. ZIP doesn't store Unix permissions reliably. The streaming method sets a default 0o644 for all files; for executables, you'd need to detect and override.
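That last pitfall is partly recoverable: ZIPs created by Unix tools stash the original st_mode in the high 16 bits of ZipInfo.external_attr. A hedged sketch of a helper (not part of the code above), assuming the source archive actually recorded those bits:

```python
import zipfile

def unix_mode(info: zipfile.ZipInfo, default: int = 0o644) -> int:
    """Recover Unix permission bits from a ZipInfo, if the archive stored them.

    ZIPs created on Unix keep st_mode in the high 16 bits of external_attr;
    archives from Windows tools usually leave those bits zero, so fall back
    to a sensible default.
    """
    mode = (info.external_attr >> 16) & 0o7777
    return mode if mode else default

# In the streaming converter, `tarinfo.mode = unix_mode(info)` would
# replace the hardcoded `tarinfo.mode = 0o644`.
```

This keeps executables executable when the information exists, without guessing when it doesn't.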

For most jobs, the stdlib pattern is clean and fast. Stream for huge archives. Use the API when ZIPs come from unpredictable sources. Free tier is 100 conversions/month, no card.