Converting XLSX to CSV in Java with Apache POI is straightforward for typical workbooks. The main decisions are: (1) how to format cell values (use DataFormatter to get display-equivalent strings instead of raw doubles), (2) which sheet to export (the first sheet by default), and (3) whether to evaluate formulas before exporting. For very large files, POI's streaming XSSF reader processes the ZIP-based XLSX row by row without loading the full document into heap. For environments where adding dependencies is not an option, the ChangeThisFile API handles the conversion with a single HttpClient POST.
Method 1: Apache POI (XSSFWorkbook + DataFormatter)
Apache POI reads XLSX and exports cells as display-equivalent strings via DataFormatter.
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.11.0</version>
</dependency>
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.poi.ss.usermodel.*;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class XlsxToCsv {

    /**
     * Convert the first sheet of an XLSX file to CSV.
     * Uses DataFormatter to get display values (respects number formats, dates).
     */
    public static void convert(Path xlsxPath, Path csvPath) throws IOException {
        // true = read-only: the source file is never written back on close
        try (Workbook wb = WorkbookFactory.create(xlsxPath.toFile(), null, true)) {
            FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
            writeSheet(wb.getSheetAt(0), csvPath, new DataFormatter(), evaluator);
        }
    }

    /** Convert all sheets to separate CSV files named after each sheet: Sheet1.csv, Sheet2.csv, etc. */
    public static void convertAllSheets(Path xlsxPath, Path outDir) throws IOException {
        DataFormatter formatter = new DataFormatter();
        try (Workbook wb = WorkbookFactory.create(xlsxPath.toFile(), null, true)) {
            FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
            for (int i = 0; i < wb.getNumberOfSheets(); i++) {
                Sheet sheet = wb.getSheetAt(i);
                // Replace characters that are unsafe in file names
                String safeName = sheet.getSheetName().replaceAll("[^a-zA-Z0-9_-]", "_");
                Path csvPath = outDir.resolve(safeName + ".csv");
                writeSheet(sheet, csvPath, formatter, evaluator);
                System.out.println("Wrote " + csvPath.getFileName());
            }
        }
    }

    /** Write one sheet as UTF-8 CSV, one record per row that exists in the sheet. */
    private static void writeSheet(Sheet sheet, Path csvPath, DataFormatter formatter,
                                   FormulaEvaluator evaluator) throws IOException {
        try (CSVPrinter printer = new CSVPrinter(
                Files.newBufferedWriter(csvPath, StandardCharsets.UTF_8), CSVFormat.DEFAULT)) {
            for (Row row : sheet) {
                List<String> values = new ArrayList<>();
                // Iterate up to the row's last cell, filling gaps with empty strings
                int lastCol = row.getLastCellNum();
                for (int col = 0; col < lastCol; col++) {
                    Cell cell = row.getCell(col, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
                    // Passing the evaluator makes DataFormatter evaluate formula cells
                    // and format the result instead of returning the formula text
                    values.add(formatter.formatCellValue(cell, evaluator));
                }
                printer.printRecord(values);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        convert(Path.of("data.xlsx"), Path.of("data.csv"));
        System.out.println("Converted to data.csv");
    }
}
DataFormatter is critical. Without it, reading a numeric cell's raw value gives you a double (e.g., 44927.0 for a date cell). DataFormatter applies the cell's own number format, so a cell formatted as yyyy-mm-dd comes back as "2023-01-01", the same way Excel displays it.
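To see where a value like 44927.0 comes from, it helps to decode a date serial by hand. The following is a minimal sketch in plain Java (no POI required), assuming the workbook uses Excel's default 1900 date system and serials after February 1900. The class name ExcelSerialDates is illustrative and not part of POI; in real code, DataFormatter or POI's DateUtil does this for you.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

/** Illustrative helper (not part of POI): decodes Excel 1900-system date serials. */
final class ExcelSerialDates {
    // Effective day zero for serials >= 61, i.e. after Excel's phantom 1900-02-29.
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);

    /** The whole part of the serial is a day count from the epoch. */
    static LocalDate toLocalDate(double serial) {
        return EPOCH.plusDays((long) Math.floor(serial));
    }

    /** The fractional part of the serial is the time of day. */
    static LocalDateTime toLocalDateTime(double serial) {
        long days = (long) Math.floor(serial);
        long secondOfDay = Math.round((serial - days) * 86_400);
        return EPOCH.atStartOfDay().plusDays(days).plusSeconds(secondOfDay);
    }

    public static void main(String[] args) {
        System.out.println(Double.toString(44927.0)); // prints 44927.0 (the raw cell value)
        System.out.println(toLocalDate(44927.0));     // prints 2023-01-01
        System.out.println(toLocalDateTime(44927.5)); // prints 2023-01-01T12:00
    }
}
```

The serial alone does not say how the date should be displayed; that information lives in the cell style, which is why the export code goes through DataFormatter rather than converting the double itself.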
Large files: Streaming XSSF reader (SAX-based, low memory)
For XLSX files with 100k+ rows, the streaming API reads row by row from the ZIP file without loading the full workbook in memory:
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.util.XMLHelper;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LargeXlsxToCsv implements XSSFSheetXMLHandler.SheetContentsHandler, AutoCloseable {
    private final PrintWriter writer;
    private final List<String> currentRow = new ArrayList<>();

    public LargeXlsxToCsv(Path csvPath) throws IOException {
        this.writer = new PrintWriter(Files.newBufferedWriter(csvPath, StandardCharsets.UTF_8));
    }

    @Override
    public void startRow(int rowNum) { currentRow.clear(); }

    @Override
    public void cell(String cellRef, String formattedValue, XSSFComment comment) {
        // Empty cells are not reported, so pad up to this cell's column to keep columns aligned
        if (cellRef != null) {
            int col = new CellReference(cellRef).getCol();
            while (currentRow.size() < col) currentRow.add("");
        }
        currentRow.add(formattedValue != null ? formattedValue : "");
    }

    @Override
    public void endRow(int rowNum) {
        // Simple CSV: quote fields containing commas, quotes, or line breaks
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < currentRow.size(); i++) {
            if (i > 0) sb.append(',');
            String v = currentRow.get(i);
            if (v.contains(",") || v.contains("\"") || v.contains("\n") || v.contains("\r")) {
                sb.append('"').append(v.replace("\"", "\"\"")).append('"');
            } else {
                sb.append(v);
            }
        }
        writer.println(sb);
    }

    @Override
    public void close() { writer.close(); }

    public static void convert(Path xlsxPath, Path csvPath) throws Exception {
        // try-with-resources closes the writer and the package even if parsing fails;
        // PackageAccess.READ ensures the source file is never written back on close
        try (LargeXlsxToCsv handler = new LargeXlsxToCsv(csvPath);
             OPCPackage pkg = OPCPackage.open(xlsxPath.toFile(), PackageAccess.READ)) {
            XSSFReader reader = new XSSFReader(pkg);
            StylesTable styles = reader.getStylesTable();
            ReadOnlySharedStringsTable sst = new ReadOnlySharedStringsTable(pkg);
            // First sheet only
            try (InputStream sheetStream = reader.getSheetsData().next()) {
                XMLReader xmlReader = XMLHelper.newXMLReader();
                xmlReader.setContentHandler(new XSSFSheetXMLHandler(
                        styles, null, sst, handler, new DataFormatter(), false));
                xmlReader.parse(new InputSource(sheetStream));
            }
        }
    }
}
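The quoting rule inside endRow is the part of the streaming handler most likely to go wrong, and it can be isolated and tested without POI. Here is a minimal sketch of that rule in plain Java, following the usual RFC 4180 conventions (quote a field if it contains a comma, a double quote, or a line break, and double any embedded quotes). The class and method names are hypothetical; if commons-csv is already on the classpath, CSVPrinter handles this for you.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative CSV helper (hypothetical name): RFC 4180 style field quoting. */
final class CsvLine {

    /** Quote a field only when it contains a delimiter, a quote, or a line break. */
    static String escape(String field) {
        if (field == null) return "";
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        // Embedded quotes are doubled, then the whole field is wrapped in quotes
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    /** Join already-formatted cell values into one CSV record (no trailing newline). */
    static String join(List<String> fields) {
        return fields.stream().map(CsvLine::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints: a,"b,c","say ""hi"""
        System.out.println(join(List.of("a", "b,c", "say \"hi\"")));
    }
}
```

Keeping the escaping in a separate pure function makes it easy to unit-test edge cases such as embedded newlines before running the converter on a multi-hundred-megabyte workbook.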
Method 2: ChangeThisFile API (Java 11 HttpClient, no SDK)
POST the XLSX as multipart form data. The source format is auto-detected. The free tier covers 1,000 conversions/month.
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class XlsxToCsvApi {
    private static final String API_KEY = "ctf_sk_your_key_here";
    private static final String API_URL = "https://changethisfile.com/v1/convert";

    // One shared client per JVM; it manages connection pooling
    private static final HttpClient HTTP = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    public static byte[] convert(Path xlsxPath) throws IOException, InterruptedException {
        String boundary = "----CTFBoundary" + UUID.randomUUID().toString().replace("-", "");
        byte[] fileBytes = Files.readAllBytes(xlsxPath);

        // Build the multipart body: a "target" text field, then the file part
        List<byte[]> parts = new ArrayList<>();
        parts.add(("--" + boundary + "\r\n" +
                "Content-Disposition: form-data; name=\"target\"\r\n\r\ncsv\r\n").getBytes(StandardCharsets.UTF_8));
        parts.add(("--" + boundary + "\r\n" +
                "Content-Disposition: form-data; name=\"file\"; filename=\"" + xlsxPath.getFileName() + "\"\r\n" +
                "Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet\r\n\r\n"
        ).getBytes(StandardCharsets.UTF_8));
        parts.add(fileBytes);
        parts.add(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));

        // Concatenate the parts into a single byte array
        int totalLen = parts.stream().mapToInt(b -> b.length).sum();
        byte[] body = new byte[totalLen];
        int offset = 0;
        for (byte[] part : parts) {
            System.arraycopy(part, 0, body, offset, part.length);
            offset += part.length;
        }

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(API_URL))
                .header("Authorization", "Bearer " + API_KEY)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .timeout(Duration.ofSeconds(60))
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();

        HttpResponse<byte[]> response = HTTP.send(request,
                HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            // Decode the error body explicitly rather than with the platform default charset
            throw new IOException("API error " + response.statusCode() +
                    ": " + new String(response.body(), StandardCharsets.UTF_8));
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        byte[] csv = convert(Path.of("data.xlsx"));
        Files.write(Path.of("data.csv"), csv);
        System.out.println("Saved data.csv (" + csv.length + " bytes)");
    }
}
When to use each
| Approach | Best for | Tradeoff |
|---|---|---|
| Apache POI (XSSFWorkbook) | Full control: formula evaluation, multi-sheet, cell formatting | Loads the full workbook into heap; risk of OutOfMemoryError on large sheets (roughly 200k+ rows, depending on heap size and cell contents) |
| SAX streaming XSSF | Large XLSX (100k+ rows), low memory footprint | More complex code; no formula evaluation in streaming mode |
| ChangeThisFile API | Zero deps, serverless, quick integration | Network latency; 25MB XLSX limit on free tier |
Production tips
- Always use DataFormatter, never cell.toString() or the raw numeric value. getNumericCellValue() on a date cell returns a raw double (Excel's date serial), and cell.toString() does not apply the cell's own number format. DataFormatter returns the display value (e.g., "2024-01-15") matching what the user sees in Excel.
- Open workbooks in read-only mode for exports. Pass true as the third argument to WorkbookFactory.create(file, null, true). Read-only mode opens the file without write access, so closing the workbook cannot modify the source. Opening from a File rather than an InputStream also keeps memory use lower.
- Use CompletableFuture for batch XLSX-to-CSV jobs. Each POI Workbook is independent, so you can parallelize across files with a bounded thread pool.
- For the API, reuse HttpClient. One static HttpClient instance per JVM manages connection pooling automatically.
- Set a 60-second timeout on API requests. Large XLSX files with complex formulas can take 30+ seconds to evaluate server-side.
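The batch tip above can be sketched with a fixed-size pool and CompletableFuture. This is a minimal outline, not a tuned implementation: BatchConverter and convertAll are hypothetical names, and the converter is passed in as a Function so the example runs without POI. In practice the function would wrap a call such as XlsxToCsv.convert and rethrow its checked IOException as an unchecked exception.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative batch runner (hypothetical name): bounded parallelism across files. */
final class BatchConverter {

    /** Apply converter to each input on at most `threads` workers; results keep input order. */
    static <T> List<T> convertAll(List<Path> inputs, Function<Path, T> converter, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every file first so the pool stays busy, then wait for all results
            List<CompletableFuture<T>> futures = inputs.stream()
                    .map(p -> CompletableFuture.supplyAsync(() -> converter.apply(p), pool))
                    .collect(Collectors.toList());
            // join() rethrows a failed conversion as CompletionException
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in converter: only maps names; a real one would do the XLSX-to-CSV work
        Function<Path, Path> fake = p -> Path.of(p.toString().replace(".xlsx", ".csv"));
        // prints: [a.csv, b.csv]
        System.out.println(convertAll(List.of(Path.of("a.xlsx"), Path.of("b.xlsx")), fake, 2));
    }
}
```

The pool size is the knob that matters: each in-flight XSSFWorkbook holds a full workbook in heap, so the thread count should be chosen from available memory rather than from the number of CPU cores.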
Apache POI with DataFormatter is the right tool for XLSX-to-CSV in Java: it handles formulas, dates, and number formats correctly. Switch to the streaming XSSF reader for large files. For Lambda/Cloud Run without POI on the classpath, the ChangeThisFile API with Java 11's HttpClient needs zero additional JARs. The free tier covers 1,000 conversions/month.