On the JVM, the standard way to convert PDF pages to JPG is Apache PDFBox. It renders pages to BufferedImage with full DPI control and needs no native binaries beyond the JRE. When you would rather offload rendering entirely, Ktor's HttpClient can call a conversion API instead. This guide covers three methods: PDFBox, PDFBox with coroutine parallelism, and the ChangeThisFile API called through Ktor.
Method 1: Apache PDFBox (JVM-native, no native binaries)
PDFBox is the standard PDF library for the JVM. Pure Java, no native binaries, ships as a single Maven/Gradle dependency.
// build.gradle.kts
dependencies {
    implementation("org.apache.pdfbox:pdfbox:3.0.2")
}

import org.apache.pdfbox.Loader
import org.apache.pdfbox.rendering.ImageType
import org.apache.pdfbox.rendering.PDFRenderer
import java.io.File
import javax.imageio.ImageIO

fun pdfToJpg(inputPath: String, outputDir: String, dpi: Float = 200f): List<String> {
    val outDir = File(outputDir).also { it.mkdirs() }
    val paths = mutableListOf<String>()
    // PDFBox 3.x loads documents through Loader.loadPDF; PDDocument.load was removed in 3.0.
    Loader.loadPDF(File(inputPath)).use { doc ->
        val renderer = PDFRenderer(doc)
        for (page in 0 until doc.numberOfPages) {
            // Page indexes are zero-based; file names are one-based and zero-padded.
            val image = renderer.renderImageWithDPI(page, dpi, ImageType.RGB)
            val outFile = File(outDir, "page-%03d.jpg".format(page + 1))
            ImageIO.write(image, "JPEG", outFile)
            paths += outFile.absolutePath
        }
    }
    return paths
}

fun main() {
    val pages = pdfToJpg("document.pdf", "./pages", dpi = 200f)
    println("Wrote ${pages.size} pages")
}
DPI guide:
- 72 — screen thumbnails only. Blurry for text-heavy PDFs.
- 150 — web-quality. Good default for most use cases.
- 200–300 — print-quality. Use for scanned documents or archival.
- 600+ — archival-grade. Very large files; use sparingly.
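To make the DPI trade-off concrete, here is a small sketch that estimates the pixel dimensions and raw heap use of one rendered page. It assumes PDF page sizes are given in points (72 per inch), an A4 page of about 595.28 x 841.89 points, and 4 bytes per pixel; RenderEstimate and estimateRender are illustrative names, not PDFBox APIs.

```kotlin
import kotlin.math.floor

/** Hypothetical helper type: estimated size of one rendered page. */
data class RenderEstimate(val widthPx: Int, val heightPx: Int, val bytes: Long)

// Assumption: PDF user space uses 72 points per inch.
const val POINTS_PER_INCH = 72.0

/** Estimates pixel dimensions and raw pixel bytes for a page rendered at the given DPI. */
fun estimateRender(
    widthPt: Double,
    heightPt: Double,
    dpi: Double,
    bytesPerPixel: Int = 4 // assumption: int-packed RGB pixels
): RenderEstimate {
    val scale = dpi / POINTS_PER_INCH
    val w = floor(widthPt * scale).toInt().coerceAtLeast(1)
    val h = floor(heightPt * scale).toInt().coerceAtLeast(1)
    return RenderEstimate(w, h, w.toLong() * h * bytesPerPixel)
}

fun main() {
    // Approximate A4 size in points.
    for (dpi in listOf(72.0, 150.0, 300.0, 600.0)) {
        val e = estimateRender(595.28, 841.89, dpi)
        println("$dpi DPI: ${e.widthPx} x ${e.heightPx} px, about ${e.bytes / 1_000_000} MB raw")
    }
}
```

Because both dimensions scale with DPI, doubling the DPI roughly quadruples the pixel count and the memory needed per page.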
Kotlin's use { } closes the document when the block exits, even if rendering throws. Always wrap the PDDocument in use to avoid file-handle leaks on multi-page runs.
Method 2: Parallel rendering with Kotlin coroutines
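For large PDFs, rendering pages concurrently can cut wall-clock time significantly. One caveat: PDFBox does not guarantee thread safety. Its FAQ says only one thread should access a given PDDocument at a time, so treat the shared-renderer version here as a best-effort shortcut, and give each worker its own PDDocument if you see corrupted output or intermittent exceptions.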
// build.gradle.kts
dependencies {
    implementation("org.apache.pdfbox:pdfbox:3.0.2")
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.8.1")
}

import kotlinx.coroutines.*
import org.apache.pdfbox.Loader
import org.apache.pdfbox.rendering.ImageType
import org.apache.pdfbox.rendering.PDFRenderer
import java.io.File
import javax.imageio.ImageIO

suspend fun pdfToJpgParallel(
    inputPath: String,
    outputDir: String,
    dpi: Float = 200f,
    parallelism: Int = 4
): List<String> = withContext(Dispatchers.IO) {
    val outDir = File(outputDir).also { it.mkdirs() }
    // PDFBox 3.x loads documents through Loader.loadPDF; PDDocument.load was removed in 3.0.
    Loader.loadPDF(File(inputPath)).use { doc ->
        // Caveat: the PDFBox FAQ advises one thread per PDDocument, so sharing
        // this renderer across coroutines is not guaranteed to be safe.
        val renderer = PDFRenderer(doc)
        (0 until doc.numberOfPages)
            .chunked(parallelism)
            .flatMap { chunk ->
                // Render one chunk concurrently and wait for it before starting the next.
                chunk.map { page ->
                    async {
                        val image = renderer.renderImageWithDPI(page, dpi, ImageType.RGB)
                        val outFile = File(outDir, "page-%03d.jpg".format(page + 1))
                        ImageIO.write(image, "JPEG", outFile)
                        outFile.absolutePath
                    }
                }.awaitAll()
            }
            .sorted()
    }
}

fun main() = runBlocking {
    val pages = pdfToJpgParallel("document.pdf", "./pages", dpi = 200f, parallelism = 4)
    println("Wrote ${pages.size} pages")
}
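Chunking into groups of parallelism caps how many pages render at once: each chunk must finish before the next starts, so a 200-page PDF never holds more than parallelism page images in memory. Dispatchers.IO is built for blocking I/O, which suits the JPEG writes. The rendering itself is mostly CPU-bound, so Dispatchers.Default (sized to the number of cores) is also a reasonable choice.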
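If sharing one document across threads is a concern (the PDFBox FAQ advises one thread per PDDocument), one alternative is to give each worker its own document and a contiguous range of pages. The sketch below shows only the range-splitting step; partitionPages is a hypothetical helper, and opening a separate document for each range is left to the caller.

```kotlin
/**
 * Hypothetical helper: splits zero-based page indexes 0 until pageCount into
 * at most `workers` contiguous ranges whose sizes differ by at most one page.
 */
fun partitionPages(pageCount: Int, workers: Int): List<IntRange> {
    require(pageCount >= 0) { "pageCount must not be negative" }
    require(workers > 0) { "workers must be positive" }
    val n = minOf(workers, pageCount)
    if (n == 0) return emptyList()
    val base = pageCount / n   // minimum pages per worker
    val extra = pageCount % n  // the first `extra` workers take one more page
    var start = 0
    return List(n) { i ->
        val size = base + if (i < extra) 1 else 0
        val range = start until start + size
        start += size
        range
    }
}

fun main() {
    // Each range could be handed to one coroutine that opens its own document.
    partitionPages(pageCount = 10, workers = 4).forEachIndexed { i, r ->
        println("worker $i renders pages ${r.first + 1} to ${r.last + 1}")
    }
}
```

Contiguous ranges keep each worker's output in page order, so the per-worker results can simply be concatenated. The cost is that every worker parses the PDF again, which is worth it only for documents with many pages.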
Method 3: ChangeThisFile API via Ktor HttpClient (no local rendering)
No PDFBox dependency. POST the file as multipart to /v1/convert — the API runs Poppler server-side and returns JPG or a ZIP for multi-page PDFs. Free tier: 1,000 conversions/month, no card required.
# curl reference
curl -X POST https://changethisfile.com/v1/convert \
-H "Authorization: Bearer ctf_sk_your_key_here" \
-F "file=@document.pdf" \
-F "target=jpg" \
--output result.jpg
// build.gradle.kts
dependencies {
    implementation("io.ktor:ktor-client-core:2.3.11")
    implementation("io.ktor:ktor-client-cio:2.3.11")
    implementation("io.ktor:ktor-client-content-negotiation:2.3.11")
}

import io.ktor.client.*
import io.ktor.client.engine.cio.*
import io.ktor.client.request.*
import io.ktor.client.request.forms.*
import io.ktor.client.statement.*
import io.ktor.http.*
import java.io.File
import java.util.zip.ZipInputStream

const val API_KEY = "ctf_sk_your_key_here"

suspend fun pdfToJpgApi(inputPath: String, outputDir: String): List<String> {
    val outDir = File(outputDir).also { it.mkdirs() }
    val input = File(inputPath)
    val client = HttpClient(CIO)
    try {
        val response: HttpResponse = client.submitFormWithBinaryData(
            url = "https://changethisfile.com/v1/convert",
            formData = formData {
                append("file", input.readBytes(), Headers.build {
                    append(HttpHeaders.ContentDisposition, "filename=\"${input.name}\"")
                    // Header values are strings, so pass the MIME type as text.
                    append(HttpHeaders.ContentType, "application/pdf")
                })
                append("target", "jpg")
            }
        ) {
            header("Authorization", "Bearer $API_KEY")
        }
        // Fail early on non-2xx responses instead of writing an error body to disk.
        check(response.status.isSuccess()) { "Conversion failed: ${response.status}" }
        val bytes = response.readBytes()
        return if (response.headers[HttpHeaders.ContentType]?.contains("zip") == true) {
            // A multi-page PDF comes back as a ZIP of JPGs.
            val zipFile = File(outDir, "pages.zip").also { it.writeBytes(bytes) }
            ZipInputStream(zipFile.inputStream()).use { zip ->
                generateSequence { zip.nextEntry }
                    .filter { it.name.endsWith(".jpg") }
                    .map { entry ->
                        // Keep only the file name so an entry cannot write outside outDir.
                        val out = File(outDir, File(entry.name).name)
                        out.writeBytes(zip.readBytes())
                        out.absolutePath
                    }.toList()
            }.also { zipFile.delete() }
        } else {
            val out = File(outDir, "page-001.jpg").also { it.writeBytes(bytes) }
            listOf(out.absolutePath)
        }
    } finally {
        // Close the client even if the request or the unzip step throws.
        client.close()
    }
}

fun main() = kotlinx.coroutines.runBlocking {
    val pages = pdfToJpgApi("document.pdf", "./pages")
    println("Wrote ${pages.size} pages")
}
Source format is auto-detected — no need to pass source=pdf. The API renders at 150 DPI by default; pass append("dpi", "300") in formData for higher quality.
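The ZIP branch of pdfToJpgApi writes a temporary pages.zip before unpacking it. As a variant, the response bytes can be unpacked straight from memory. The sketch below uses only the JDK; extractJpgs and demoZip are hypothetical names, and the demo archive holds placeholder bytes rather than real JPEG data.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.File
import java.nio.file.Files
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream
import java.util.zip.ZipOutputStream

/** Hypothetical helper: unpacks .jpg entries from ZIP bytes into outDir and returns sorted absolute paths. */
fun extractJpgs(zipBytes: ByteArray, outDir: File): List<String> {
    outDir.mkdirs()
    val paths = mutableListOf<String>()
    ZipInputStream(ByteArrayInputStream(zipBytes)).use { zip ->
        while (true) {
            val entry = zip.nextEntry ?: break
            if (entry.isDirectory || !entry.name.endsWith(".jpg", ignoreCase = true)) continue
            // Keep only the last path segment so an entry like "../x.jpg" cannot escape outDir.
            val out = File(outDir, File(entry.name).name)
            out.writeBytes(zip.readBytes()) // reads to the end of the current entry
            paths += out.absolutePath
        }
    }
    return paths.sorted()
}

/** Builds a tiny in-memory ZIP for the demo; the contents are placeholder bytes. */
fun demoZip(): ByteArray {
    val buf = ByteArrayOutputStream()
    ZipOutputStream(buf).use { z ->
        for (name in listOf("page-001.jpg", "../evil.jpg", "notes.txt")) {
            z.putNextEntry(ZipEntry(name))
            z.write(byteArrayOf(1, 2, 3))
            z.closeEntry()
        }
    }
    return buf.toByteArray()
}

fun main() {
    val dir = Files.createTempDirectory("pages").toFile()
    val written = extractJpgs(demoZip(), dir)
    println("Extracted ${written.size} files into ${dir.absolutePath}")
}
```

This holds the whole archive in memory, which is acceptable for typical API responses but worth reconsidering for very large PDFs.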
When to use each
| Approach | Best for | Tradeoff |
|---|---|---|
| PDFBox (sync) | Self-hosted batch jobs, DPI control | Large JAR (~10MB); needs JVM |
| PDFBox + coroutines | Multi-page PDFs, throughput-sensitive pipelines | More complex error handling across coroutines |
| ChangeThisFile API (Ktor) | No PDFBox in classpath, serverless/edge JVM, pay-per-use | Network latency, 25MB free-tier upload limit |
Production tips
- Always close PDDocument. Use doc.use { } or a try-with-resources equivalent. Leaked handles cause silent failures on subsequent opens of the same file.
- 150 DPI is the right web default. Going from 150 to 300 DPI doubles each dimension, which roughly quadruples pixel count and memory, with no visible gain on screen. Reserve 300 for OCR input or print.
- Set JPEG quality explicitly. ImageIO's default JPEG quality is 0.75, which is often lower than you want. Configure an ImageWriter and pass the param to writer.write:
val writer = ImageIO.getImageWritersByFormatName("jpeg").next(); val param = writer.defaultWriteParam.apply { compressionMode = ImageWriteParam.MODE_EXPLICIT; compressionQuality = 0.88f }
- Watch memory on large PDFs. Each page renders into a BufferedImage on the heap. A 300-DPI A4 page is about 2480 x 3508 pixels, roughly 35 MB of raw pixels at 4 bytes each. For 50+ page PDFs at high DPI, raise -Xmx or render one page at a time and flush.
- Password-protected PDFs. With PDFBox 3.x, call Loader.loadPDF(file, password) (PDDocument.load(file, password) in 2.x). PDFBox throws an InvalidPasswordException on a wrong password; catch it explicitly rather than letting it surface as a generic error.
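The JPEG quality tip can be fleshed out into a complete write call. This is a minimal sketch using only javax.imageio; writeJpeg is a hypothetical helper name, and 0.88 is simply the quality value suggested in the tip, not a library default.

```kotlin
import java.awt.image.BufferedImage
import java.io.File
import javax.imageio.IIOImage
import javax.imageio.ImageIO
import javax.imageio.ImageWriteParam
import javax.imageio.stream.FileImageOutputStream

/** Hypothetical helper: writes an image as JPEG with an explicit quality between 0 and 1. */
fun writeJpeg(image: BufferedImage, outFile: File, quality: Float = 0.88f) {
    require(quality in 0f..1f) { "quality must be between 0 and 1" }
    val writer = ImageIO.getImageWritersByFormatName("jpeg").next()
    val param = writer.defaultWriteParam.apply {
        compressionMode = ImageWriteParam.MODE_EXPLICIT
        compressionQuality = quality
    }
    // FileImageOutputStream does not truncate an existing file, so remove it first.
    outFile.delete()
    try {
        FileImageOutputStream(outFile).use { out ->
            writer.setOutput(out)
            // null stream metadata and image metadata: let the writer use its defaults.
            writer.write(null, IIOImage(image, null, null), param)
        }
    } finally {
        writer.dispose() // release the writer's resources
    }
}

fun main() {
    // A blank RGB image stands in for a rendered PDF page.
    val image = BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB)
    val out = File.createTempFile("page-", ".jpg")
    writeJpeg(image, out, quality = 0.88f)
    println("Wrote ${out.length()} bytes to ${out.absolutePath}")
}
```

In the PDFBox examples, a call such as writeJpeg(image, outFile) could stand in for ImageIO.write(image, "JPEG", outFile).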
For JVM services, PDFBox with coroutine parallelism is the production-grade path. For scripts or environments where you do not want PDFBox on the classpath, use the API; its free tier covers 1,000 conversions/month.