TSV is CSV with one character changed (tab instead of comma), and that single change eliminates most of CSV's parsing headaches. Commas appear everywhere in data: addresses ("123 Main St, Apt 4"), descriptions ("fast, reliable, affordable"), decimal numbers in many European locales ("3,14"). Every comma inside a value forces the field to be quoted, and any quotes inside that field to be escaped. Tabs almost never appear in natural text, so TSV data rarely needs quoting at all.

This simplicity makes TSV the preferred format in specific domains where CSV's quirks cause real problems: bioinformatics (where gene annotations contain commas and semicolons), data exchange via clipboard (copy cells from Excel, paste into a text editor — that's TSV), and Unix pipelines (where cut -f2 extracts the second column cleanly without a CSV parser).

TSV vs CSV: The Practical Differences

| Property | CSV | TSV |
| --- | --- | --- |
| Delimiter | Comma (,) | Tab (\t) |
| Quoting needed | Frequently (commas in data, newlines) | Rarely (tabs are uncommon in data) |
| Locale issues | European locales use semicolons as delimiter | Tabs are locale-independent |
| Standard | RFC 4180 (loosely followed) | IANA text/tab-separated-values |
| File extension | .csv | .tsv or .tab |
| MIME type | text/csv | text/tab-separated-values |
| Clipboard format | Not standard | Default when copying from spreadsheets |
| Excel behavior | Auto-opens, auto-detects types (dangerous) | Opens with Text Import Wizard (slightly safer) |
| Unix tools | Requires CSV parser for quoted fields | cut, awk, sort work natively |

The key insight: CSV's quoting mechanism (double quotes around fields containing delimiters) is a source of endless bugs. Unmatched quotes corrupt entire rows. Escaped quotes inside quoted fields ("He said ""hello""") confuse parsers. Different producers use different quoting conventions. TSV sidesteps all of this by using a delimiter that almost never appears in data.
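
To make the quoting difference concrete, here is a minimal sketch using Python's standard csv module. The sample row is made up for illustration. The same three fields are written once with a comma delimiter and once with a tab delimiter, using the module's default "quote only when necessary" behavior.

```python
import csv
import io

# Illustrative row: two of the three fields contain commas.
row = ["Acme Corp", "123 Main St, Apt 4", "fast, reliable, affordable"]


def serialize(fields, delimiter):
    """Write one row with the csv module and return it as a string."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buf.getvalue()


csv_line = serialize(row, ",")   # comma fields must be wrapped in quotes
tsv_line = serialize(row, "\t")  # no field contains a tab, so nothing is quoted

print(csv_line, end="")  # Acme Corp,"123 Main St, Apt 4","fast, reliable, affordable"
print(tsv_line, end="")  # the same three fields, separated by tabs, without quotes
```

With the tab delimiter the writer never has to add quotes for this row, which is why simple line-and-split tools can read the result.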

TSV in Bioinformatics: The Domain Standard

Bioinformatics standardized on tab-delimited formats decades ago, and the reasons illuminate why TSV is technically superior for data interchange:

  • BED format (Browser Extensible Data): Genome region annotations. Tab-separated, no required header, minimal: chr1\t1000\t5000\tgene_name.
  • VCF format (Variant Call Format): Genetic variants. Tab-separated with a ## header section. The INFO field contains semicolon-separated key-value pairs, and multi-valued entries within it are comma-separated, so in CSV this field would need quoting on nearly every row.
  • GFF/GTF format (General Feature Format): Gene annotations. Tab-separated with semicolon-separated attributes in the 9th column.
  • BLAST output: Sequence alignment results. The widely used tabular output format 6 (-outfmt 6) is tab-separated: query_id\tsubject_id\tpct_identity\t...
  • SAM/BAM format: Sequence alignment. The SAM text format is tab-separated with complex fields containing colons and commas.

Notice the pattern: bioinformatics data frequently contains commas and semicolons within field values. Using commas as both delimiters and data would require complex quoting that breaks simple Unix tools. Tabs avoid the collision entirely.
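
As a rough illustration of why the nesting works, the sketch below parses one VCF-style data line using nothing but string splitting: tabs for columns, semicolons for INFO entries, commas for multi-valued entries. The line and its values are invented for the example, and parse_info is a hypothetical helper, not part of any VCF library. Real VCF files have further rules that a dedicated parser handles.

```python
# A single VCF-style data line (values are illustrative, not from a real file).
# Columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
line = "chr1\t12345\trs001\tA\tG,T\t50\tPASS\tDP=100;AF=0.25,0.10;DB"


def parse_info(info):
    """Split an INFO string into a dict; flags without '=' map to True."""
    result = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value.split(",") if sep else True
    return result


fields = line.rstrip("\n").split("\t")          # level 1: tabs separate columns
chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

record = {
    "chrom": chrom,
    "pos": int(pos),
    "alt": alt.split(","),                      # level 3: commas inside a column
    "info": parse_info(info),                   # level 2: semicolons inside INFO
}
print(record)
```

Because each level uses a different separator, no quoting or escaping is needed at any level.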

When working with bioinformatics data in spreadsheet tools, convert TSV to XLSX for viewing in Excel or convert TSV to CSV for tools that specifically require comma delimiting.

Clipboard Copy: Spreadsheets Speak TSV

When you select cells in Excel, Google Sheets, or LibreOffice and press Ctrl+C, the plain-text version placed on the clipboard is TSV, not CSV. Paste into a text editor and you'll see tab-separated values. Paste into another spreadsheet and the tabs align data into columns automatically.

This is why the copy-paste workflow between spreadsheets and text editors is so smooth: TSV is the native interchange format. When you paste tabular data from a web page or email into Excel, Excel splits on tabs to populate columns. When you paste cells from Excel into a web form's text field, you get tab-separated text.

This has practical implications:

  • Quick data entry: Type tab-separated data in a text editor, select all, paste into Excel. Columns align automatically.
  • Quick data extraction: Select cells in Excel, Ctrl+C, paste into a Python script as a multi-line string. Parse with line.split('\t').
  • Cross-application transfer: Copy from Excel, paste into Google Sheets (or vice versa). The tabs ensure column alignment regardless of the application.
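
The quick-extraction workflow can be sketched as follows. The pasted text is made-up sample data, written here with \t escapes so the example survives editors that convert tabs to spaces. In a real paste the tabs would be literal characters.

```python
# Text as it might look after copying a 3x3 cell range from a spreadsheet.
pasted = """name\tcity\tamount
Alice\tSpringfield, IL\t12.50
Bob\tPortland, OR\t7.25
"""

lines = pasted.splitlines()
header = lines[0].split("\t")

# One dict per data row, keyed by the header cells.
rows = [dict(zip(header, line.split("\t"))) for line in lines[1:] if line]

total = sum(float(r["amount"]) for r in rows)
print(rows[0]["city"])  # Springfield, IL
print(total)            # 19.75
```

This assumes simple cells. Cells that themselves contain line breaks or tabs may be copied in a quoted form by some spreadsheet applications, which a plain split does not handle.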

Unix Pipeline Friendliness

TSV is the most Unix-friendly tabular format because standard Unix tools handle tab delimiters natively:

# Extract the second column
cut -f2 data.tsv

# Sort by third column only (numeric); -k3,3 stops the key at field 3
sort -t$'\t' -k3,3n data.tsv

# Filter rows where column 2 equals "active"
awk -F'\t' '$2 == "active"' data.tsv

# Join two TSV files on first column (both inputs sorted on that column)
join -t$'\t' <(sort -t$'\t' -k1,1 file1.tsv) <(sort -t$'\t' -k1,1 file2.tsv)

# Count unique values in column 3
cut -f3 data.tsv | sort | uniq -c | sort -rn

Try doing any of these with a CSV that has quoted fields containing commas. cut -d, -f2 splits on every comma, including those inside quoted fields, so it returns the wrong text on any row where the first or second field contains one. You'd need a CSV-aware tool like csvtool or Miller. With TSV, the standard tools work out of the box.

This matters for data engineering pipelines that process files with shell scripts, GNU coreutils, and awk. TSV files flow through these tools without any special handling. CSV files require dedicated parsers, adding dependency and complexity.
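
The same failure mode is easy to reproduce in Python. In this small sketch (sample data invented for illustration), a naive split on commas stands in for roughly what cut -d, does, and a split on tabs stands in for cut -f.

```python
import csv

csv_data_line = '1,"123 Main St, Apt 4",active'
tsv_data_line = "1\t123 Main St, Apt 4\tactive"

# Naive delimiter split: unaware of quotes, so the address is cut in two.
naive_csv = csv_data_line.split(",")

# Quote-aware parse with the csv module recovers the three intended fields.
parsed_csv = next(csv.reader([csv_data_line]))

# The same naive approach on TSV already gives the right answer.
naive_tsv = tsv_data_line.split("\t")

print(naive_csv)   # ['1', '"123 Main St', ' Apt 4"', 'active']
print(parsed_csv)  # ['1', '123 Main St, Apt 4', 'active']
print(naive_tsv)   # ['1', '123 Main St, Apt 4', 'active']
```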

When TSV Falls Short

TSV isn't perfect. Its limitations are the mirror of its strengths:

  • Tabs in data. If your data contains literal tab characters (rare but possible in free-form text, code snippets, or imported data), TSV breaks just like CSV breaks on commas. The fix: escape or remove tabs before writing. In practice, this is far less common than commas in data.
  • No standard quoting mechanism. CSV has a well-defined (if imperfect) quoting standard: double quotes around fields, doubled double quotes for literal quotes. TSV has no equivalent standard. Some tools support backslash escaping (\t for literal tab, \n for literal newline). Others don't. This ambiguity is TSV's biggest weakness compared to CSV.
  • Invisible delimiter. Tab characters are invisible in most text editors and terminals. You can't tell by looking whether a file is TSV (tabs) or space-aligned (spaces). In CSV, the commas are visible. Use cat -A data.tsv (GNU coreutils) or cat -t data.tsv (macOS/BSD) to see tabs as ^I characters.
  • Less universal recognition. Double-clicking a .tsv file may not open in your spreadsheet application. .csv files are universally associated with spreadsheet apps. You may need to rename .tsv to .csv or use File > Open and specify the delimiter.
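
For the "tabs in data" case, one possible fix is a backslash-escaping scheme. The sketch below is an assumption about how such a scheme could look, not a standard: escape_field, unescape_field, write_row, and read_row are hypothetical helpers, and both the producer and the consumer of the file would have to agree on the convention.

```python
# Two-character escapes for the characters that would break a TSV row.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def escape_field(value):
    """Replace backslash, tab, newline, and carriage return with escapes."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(value):
    """Reverse escape_field; unknown escape sequences are kept as written."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt is None:
            out.append("\\")          # trailing lone backslash
        elif nxt in _UNESCAPES:
            out.append(_UNESCAPES[nxt])
        else:
            out.append("\\" + nxt)    # unknown escape, keep both characters
    return "".join(out)


def write_row(fields):
    """Join escaped fields with real tab characters."""
    return "\t".join(escape_field(f) for f in fields)


def read_row(line):
    """Split on real tabs first, then unescape each field."""
    return [unescape_field(f) for f in line.split("\t")]


original = ["def f():\n\treturn 1", "C:\\temp", "plain"]
line = write_row(original)
print(read_row(line) == original)  # True
```

Escaping the backslash itself is what makes the round trip unambiguous. Without it, a field that already contains the two characters \ and t could not be told apart from an escaped tab.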

Converting Between TSV and Other Formats

TSV conversion is straightforward because the only difference from CSV is the delimiter character:

  • TSV to CSV: Replace tabs with commas, add quoting for fields that contain commas. Convert TSV to CSV handles this automatically. In Python: csv.writer(out, delimiter=',') to write, csv.reader(inp, delimiter='\t') to read.
  • CSV to TSV: Replace commas with tabs and drop the quoting that is no longer needed (fields containing quotes, tabs, or newlines still need care). Convert CSV to TSV or on the command line: python3 -c "import csv,sys; csv.writer(sys.stdout, delimiter='\t', lineterminator='\n').writerows(csv.reader(sys.stdin))" < data.csv > data.tsv
  • TSV to XLSX: Convert TSV to XLSX for sharing with spreadsheet users. Data types are preserved as well as they would be from CSV (which is to say: not well, because TSV also has no type information).
  • TSV to JSON: Convert TSV to JSON for programmatic consumption. Each row becomes a JSON object with column headers as keys.
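
The TSV-to-JSON mapping described in the last item can be sketched with the standard library. tsv_to_json is a hypothetical helper name, and the sample record is illustrative. The sketch assumes the first line is a header and that fields are unquoted, so every value comes out as a JSON string because TSV carries no type information.

```python
import csv
import io
import json


def tsv_to_json(tsv_text):
    """Convert TSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary data
    )
    return json.dumps(list(reader), indent=2)


sample = "gene\tchrom\tnote\nBRCA1\tchr17\ttumor suppressor, DNA repair\n"
print(tsv_to_json(sample))
```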

For one-off conversions, the Unix command tr '\t' ',' < data.tsv > data.csv works for simple data but doesn't handle quoting. For production use, always use a proper CSV library that handles edge cases.
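
To show the difference between the two approaches, the sketch below converts the same made-up TSV text twice: once through the csv module, and once with a plain character replacement that mimics the tr command. tsv_to_csv is a hypothetical helper, and it assumes the input TSV uses no quoting of its own.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Re-serialize TSV text as CSV, quoting fields only where CSV requires it."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    csv.writer(out, delimiter=",", lineterminator="\n").writerows(reader)
    return out.getvalue()


sample = "name\taddress\nSmith, Jr.\t123 Main St, Apt 4\n"

converted = tsv_to_csv(sample)
naive = sample.replace("\t", ",")  # same effect as tr '\t' ','

print(converted, end="")
# name,address
# "Smith, Jr.","123 Main St, Apt 4"

# The naive output has no quotes, so a CSV parser sees four fields, not two.
print(next(csv.reader([naive.splitlines()[1]])))
# ['Smith', ' Jr.', '123 Main St', ' Apt 4']
```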

When to Choose TSV Over CSV

Use TSV when:

  • Your data contains commas in field values. Addresses, descriptions, names with suffixes ("Smith, Jr."), product lists. TSV avoids the quoting circus.
  • You're working in bioinformatics. Most genomics tools expect TSV input and produce TSV output. Using CSV introduces unnecessary conversion steps.
  • Your pipeline uses Unix tools. cut, sort, awk, join work natively with TSV. CSV requires specialized tools.
  • You're exchanging data between spreadsheet applications. Copy-paste uses TSV natively. Saving as TSV preserves the clipboard format.
  • You need locale independence. TSV uses the same delimiter worldwide. CSV's delimiter varies by locale (comma in US/UK, semicolon in France/Germany).

Use CSV when:

  • The recipient expects CSV specifically. Many import tools, SaaS products, and databases have CSV import features but no TSV option.
  • Maximum compatibility is the goal. .csv is more universally recognized than .tsv.
  • You're publishing data for broad consumption. CSV is the default expectation for open data portals, Kaggle datasets, and API exports.

TSV is the engineer's choice for tabular data. It solves CSV's most annoying problem (commas in data requiring quoting) by using a delimiter that rarely appears in natural text. The trade-off — less universal recognition and no standard quoting mechanism — is minor for technical workflows where the data is processed by scripts and tools rather than opened by double-clicking in Excel.

The recommendation is simple: if your data contains free-form text (addresses, descriptions, annotations), use TSV. If your data is purely numeric or contains only simple strings without commas, CSV and TSV are equivalent — use whichever your downstream tools expect. And if you're not sure, convert CSV to TSV and see how much simpler your parsing becomes.