Excel workbooks are messy. They can have multiple sheets, formulas that reference other cells, formatting that affects displayed values, hidden rows, and embedded charts. CSV is flat — one sheet, one delimiter, no formatting. The conversion has to make decisions about which sheet, whether to evaluate formulas, and how to render dates and numbers.
Method 1: pandas (the default)
pandas handles XLSX via openpyxl (or xlrd for old .xls). One line for simple cases, parameters for complex ones.
pip install pandas openpyxl
import pandas as pd
def xlsx_to_csv(xlsx_path: str, csv_path: str, sheet_name: str | int = 0) -> None:
df = pd.read_excel(xlsx_path, sheet_name=sheet_name)
df.to_csv(csv_path, index=False, encoding="utf-8")
xlsx_to_csv("sales.xlsx", "sales.csv")
For multi-sheet workbooks, get all sheets and write each to a separate CSV:
def xlsx_to_csvs(xlsx_path: str, out_dir: str) -> None:
sheets = pd.read_excel(xlsx_path, sheet_name=None) # dict of {name: df}
for name, df in sheets.items():
safe_name = name.replace("/", "_").replace(" ", "_")
df.to_csv(f"{out_dir}/{safe_name}.csv", index=False)
xlsx_to_csvs("workbook.xlsx", "sheets/")
Three things to watch for:
- index=False — without this, pandas writes the row index as the first column, which is rarely what you want.
- Date formatting — Excel dates come through as datetime64. df.to_csv handles them but the output format is ISO 8601. Pass date_format=... to customize.
- Formulas — pandas reads cached values, not formula text. If the workbook has not been opened/saved in Excel recently, cached values may be stale.
Method 2: openpyxl read-only mode (for huge files)
For workbooks that don't fit in memory, openpyxl's read_only mode iterates rows without loading the whole sheet.
pip install openpyxl
import csv
from openpyxl import load_workbook
def xlsx_to_csv(xlsx_path: str, csv_path: str, sheet_name: str | None = None) -> None:
wb = load_workbook(xlsx_path, read_only=True, data_only=True)
sheet = wb[sheet_name] if sheet_name else wb.active
with open(csv_path, "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
for row in sheet.iter_rows(values_only=True):
writer.writerow(row)
wb.close()
xlsx_to_csv("big_workbook.xlsx", "big_workbook.csv")
data_only=True tells openpyxl to read cached cell values instead of formula text. read_only=True uses the streaming reader — much lower memory footprint than the default.
Method 3: ChangeThisFile API (handles messy input)
For user-uploaded XLSX where you can't predict the structure, the API normalizes it. Get a free API key.
import requests
API_KEY = "sk_test_your_key_here"
def xlsx_to_csv(xlsx_path: str, csv_path: str) -> None:
with open(xlsx_path, "rb") as f:
response = requests.post(
"https://changethisfile.com/v1/convert",
headers={"Authorization": f"Bearer {API_KEY}"},
files={"file": f},
data={"source": "xlsx", "target": "csv"},
timeout=60,
)
response.raise_for_status()
with open(csv_path, "wb") as out:
out.write(response.content)
xlsx_to_csv("user_upload.xlsx", "normalized.csv")
The API converts the active (first) sheet by default. For multi-sheet handling, use target=zip and you get a ZIP of one CSV per sheet.
When to use each
| Approach | Best for | Tradeoff |
|---|---|---|
| pandas | Data analysis, type-aware output, multi-sheet | Loads workbook into memory |
| openpyxl read_only | Workbooks too big for RAM | More boilerplate, no type inference |
| ChangeThisFile API | User uploads, varied workbooks, no infra | Per-call cost |
Common pitfalls
- Numbers shown as #####. That's a column-width display issue in Excel. The cell value itself is fine. CSV output gets the actual value.
- Dates show as numbers (44927). Excel stores dates as days since 1900-01-01. pandas converts them properly. openpyxl read_only with data_only=True returns datetime objects when the cell has a date format applied; raw integers otherwise.
- Merged cells. Only the top-left cell of a merged range has a value. Other cells in the merge return None / empty. CSV has no merge concept — you may want to forward-fill (df.ffill()) for repeated header values.
- Sheet name encoding. Sheet names can contain unicode and special characters. Sanitize for filenames if you're writing per-sheet output.
For your own controlled XLSX files, pandas is the right default. For multi-GB workbooks, openpyxl streaming. For user uploads where the structure varies, the API absorbs the variability. Free tier gives 1,000 conversions/month.