EPUB is the dominant open ebook standard, maintained by the W3C since 2017. Every major reading platform supports it: Apple Books, Google Play Books, Kobo, Nook, Calibre, and since 2022, even Kindle via Send-to-Kindle. It's an open spec, not a proprietary format owned by a single company.

But most people treat EPUB as a black box. They convert to it, convert from it, and never look inside. That's a mistake if you're publishing, debugging formatting issues, or building ebook tooling. EPUB is remarkably transparent once you know the structure. Rename any .epub file to .zip, extract it, and you're looking at web content: XHTML files, CSS stylesheets, images, and a few XML manifests that tie everything together.

This guide walks through every layer of the EPUB spec, from the ZIP container to the reading order spine, with enough detail to build one from scratch or diagnose why your conversion came out wrong.

EPUB Is a ZIP File: The Container Layer

Every EPUB is a ZIP archive with the .epub extension. Not metaphorically. Literally. Run unzip book.epub -d book/ and you get a directory of files. The ZIP must contain a specific file as its first entry: mimetype, containing the string application/epub+zip with no newline, stored uncompressed (ZIP stored method, not deflate) and with no extra field in its ZIP header. This requirement exists so file-type detection tools can identify EPUBs without unpacking anything: the 30-byte local file header and the 8-byte filename put the string at a fixed position, bytes 38 to 57 of the file.
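The mimetype rules can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name check_mimetype_entry is made up for illustration, and the offset check assumes the mimetype entry carries no extra field, so the string begins right after the 30-byte local header and the 8-byte filename.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_mimetype_entry(epub_bytes: bytes) -> list[str]:
    """Return a list of problems with the mimetype entry (empty if none).

    Hypothetical helper for illustration; not part of any EPUB tool.
    """
    problems = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            return ["mimetype is not the first entry in the archive"]
        first = entries[0]
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed; it must be stored")
        if zf.read(first) != EPUB_MIMETYPE:
            problems.append("mimetype content is not exactly application/epub+zip")
    # 30-byte local header + 8-byte filename = offset 38 (assumes no extra field).
    if epub_bytes[38:58] != EPUB_MIMETYPE:
        problems.append("mimetype string is not at byte offset 38")
    return problems
```

To check a file on disk, pass it the raw bytes, for example check_mimetype_entry(open("book.epub", "rb").read()).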

After extraction, you'll find a standard directory structure:

mimetype
META-INF/
  container.xml
  [encryption.xml]
  [signatures.xml]
OEBPS/ (or EPUB/, content/, etc.)
  content.opf
  toc.ncx or nav.xhtml
  chapter01.xhtml
  chapter02.xhtml
  styles/
    book.css
  images/
    cover.jpg
    figure01.png

The META-INF/container.xml file is mandatory. It points to the root content file (the OPF package document). Everything else lives wherever the OPF says it does. The directory name (OEBPS, EPUB, content) is convention, not requirement.

container.xml: The Entry Point

This is the first file any EPUB reader parses. It's simple XML that declares where the package document lives:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
             media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>

The full-path attribute is relative to the ZIP root. An EPUB can technically contain multiple rootfiles (the spec allows it), but reading systems generally open only the first one. In practice: one rootfile, one OPF, one book.
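Resolving the package document is the first step any tooling has to take. A minimal sketch with the standard library follows; find_package_path is a hypothetical name, and the function simply returns the first rootfile it finds.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the container.xml example above.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_package_path(container_xml: bytes) -> str:
    """Return the full-path of the first <rootfile> in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.attrib["full-path"]
```

With an open zipfile.ZipFile called zf, the call would look like find_package_path(zf.read("META-INF/container.xml")).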

The OPF Package Document: The Book's Manifest

The OPF (Open Packaging Format) file is the central control document for the entire EPUB. It contains three critical sections: metadata (what the book is), manifest (what files it contains), and spine (what order to read them in). The filename is typically content.opf or package.opf, but the actual name doesn't matter — container.xml points to it.

Dublin Core Metadata

The <metadata> section uses Dublin Core elements for book information:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>The Great Gatsby</dc:title>
  <dc:creator>F. Scott Fitzgerald</dc:creator>
  <dc:language>en</dc:language>
  <dc:identifier id="bookid">urn:isbn:9780743273565</dc:identifier>
  <dc:publisher>Scribner</dc:publisher>
  <dc:date>1925-04-10</dc:date>
  <dc:rights>Public Domain</dc:rights>
  <meta property="dcterms:modified">2026-01-15T12:00:00Z</meta>
</metadata>

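Three elements are mandatory: dc:title, dc:language, and dc:identifier (a unique ID such as an ISBN, UUID, or URI, referenced by the unique-identifier attribute on the root <package> element). EPUB 3 additionally requires the dcterms:modified timestamp shown above. Everything else is optional but strongly recommended. Library management software like Calibre uses this metadata to organize, sort, and display books. A file with missing metadata typically shows up as "Unknown" by "Unknown" with no cover. To distinguish contributors, EPUB 2 puts an opf:role attribute on dc:creator with a MARC relator code (aut for author, edt for editor, ill for illustrator); EPUB 3 expresses the same thing with a <meta> element that refines the creator, using property="role" and scheme="marc:relators".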
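A quick way to catch the "Unknown by Unknown" problem is to check the three required Dublin Core elements before shipping a file. The sketch below assumes the OPF root is a <package> element in the http://www.idpf.org/2007/opf namespace (not shown in the excerpt above); missing_required_metadata is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",      # assumed default namespace of <package>
    "dc": "http://purl.org/dc/elements/1.1/",   # from the metadata example
}


def missing_required_metadata(opf_xml: bytes) -> list[str]:
    """Return the required dc:* elements that are absent or empty."""
    root = ET.fromstring(opf_xml)
    metadata = root.find("opf:metadata", NS)
    if metadata is None:
        return ["metadata"]
    missing = []
    for name in ("title", "language", "identifier"):
        element = metadata.find(f"dc:{name}", NS)
        if element is None or not (element.text or "").strip():
            missing.append(f"dc:{name}")
    return missing
```

An empty list means the three elements are present; it says nothing about whether their values are sensible.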

The Manifest: Every File Listed

The <manifest> section lists every file in the EPUB with an ID, path, and MIME type:

<manifest>
  <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
  <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  <item id="ch01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
  <item id="ch02" href="chapter02.xhtml" media-type="application/xhtml+xml"/>
  <item id="ch03" href="chapter03.xhtml" media-type="application/xhtml+xml"/>
  <item id="appendix" href="appendix.xhtml" media-type="application/xhtml+xml"/>
  <item id="css" href="styles/book.css" media-type="text/css"/>
  <item id="cover-img" href="images/cover.jpg" media-type="image/jpeg" properties="cover-image"/>
  <item id="fig01" href="images/figure01.png" media-type="image/png"/>
</manifest>

Every file in the ZIP (except mimetype, META-INF/ contents, and the OPF itself) must be listed here. If a file is in the ZIP but not in the manifest, readers may ignore it. If a manifest entry points to a file that doesn't exist, epubcheck will flag it as an error. The properties attribute marks special items: nav for the navigation document, cover-image for the cover, mathml for content using MathML, svg for SVG content, scripted for JavaScript.
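Both failure modes (files the manifest doesn't list, and manifest entries with no file behind them) can be found by comparing the manifest against the ZIP's file list. This is a rough sketch under a few assumptions: hrefs are local relative paths, the OPF uses the standard http://www.idpf.org/2007/opf namespace, and compare_manifest is a hypothetical name. Remote resources listed in the manifest would be reported as missing.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def compare_manifest(opf_path: str, opf_xml: bytes, zip_names: list[str]):
    """Return (unlisted, missing): files not in the manifest, and
    manifest entries with no matching file in the archive."""
    base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF's folder
    root = ET.fromstring(opf_xml)
    listed = set()
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        href = unquote(item.attrib["href"])
        listed.add(posixpath.normpath(posixpath.join(base, href)))
    present = {
        name for name in zip_names
        if not name.endswith("/")              # skip directory entries
        and name not in ("mimetype", opf_path)  # exempt by definition
        and not name.startswith("META-INF/")
    }
    return sorted(present - listed), sorted(listed - present)
```

With an open zipfile.ZipFile called zf, zip_names would be zf.namelist().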

The Spine: Reading Order

The <spine> section defines the linear reading order by referencing manifest IDs:

<spine toc="ncx">
  <itemref idref="ch01"/>
  <itemref idref="ch02"/>
  <itemref idref="ch03"/>
  <itemref idref="appendix" linear="no"/>
</spine>

When a reader displays a book, it follows the spine order. Swiping forward goes to the next itemref. Each idref must match the id of a manifest item. The linear="no" attribute marks supplementary content (appendices, indices) that isn't part of the main reading flow but is accessible via links or the table of contents. The toc attribute on the <spine> element holds the manifest ID of the NCX file (EPUB 2) and is kept for backward compatibility.

A common debugging issue: content exists in the EPUB but doesn't appear when reading. The usual cause is that the file is listed in the manifest but missing from the spine. The file is there; the reader just doesn't know when to show it.
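That check is easy to automate: list every XHTML manifest item whose ID never appears in the spine. The sketch below assumes the standard OPF namespace and skips the nav document, which is often left out of the spine on purpose; content_missing_from_spine is a hypothetical name.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def content_missing_from_spine(opf_xml: bytes) -> list[str]:
    """Return hrefs of XHTML manifest items that no spine itemref points to."""
    root = ET.fromstring(opf_xml)
    spine_ids = {
        ref.attrib["idref"]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    }
    orphans = []
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        is_xhtml = item.get("media-type") == "application/xhtml+xml"
        is_nav = "nav" in item.get("properties", "").split()
        if is_xhtml and not is_nav and item.attrib["id"] not in spine_ids:
            orphans.append(item.attrib["href"])
    return orphans
```

A non-empty result is not always a bug (some files are only reachable through links), but it is the first place to look when a chapter goes missing.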

Navigation: NCX and the Nav Document

EPUB has two table of contents systems. EPUB 2 uses NCX (Navigation Control for XML). EPUB 3 uses a nav document (an XHTML file with a specific structure). Most EPUBs include both for backward compatibility.

NCX (EPUB 2 Legacy)

The NCX file (toc.ncx) is an XML document mapping navigation labels to content locations:

<navMap>
  <navPoint id="ch1" playOrder="1">
    <navLabel><text>Chapter 1: The Green Light</text></navLabel>
    <content src="chapter01.xhtml"/>
    <navPoint id="ch1-s1" playOrder="2">
      <navLabel><text>The Buchanans</text></navLabel>
      <content src="chapter01.xhtml#buchanans"/>
    </navPoint>
  </navPoint>
</navMap>

NCX supports nested navigation (chapters containing sections), playOrder for sequential numbering, and fragment identifiers (#buchanans) for mid-chapter navigation. It's verbose, XML-heavy, and functional. Most EPUB 3 files still include an NCX for older readers that don't support the nav document.

Nav Document (EPUB 3)

EPUB 3 replaced NCX with a regular XHTML file containing a <nav> element with the epub:type="toc" attribute:

<nav epub:type="toc">
  <ol>
    <li><a href="chapter01.xhtml">Chapter 1: The Green Light</a>
      <ol>
        <li><a href="chapter01.xhtml#buchanans">The Buchanans</a></li>
      </ol>
    </li>
  </ol>
</nav>

This is standard HTML. The nav document can include additional navigation structures: epub:type="landmarks" (bodymatter, frontmatter, backmatter) and epub:type="page-list" (mapping to print edition page numbers). Because it's XHTML, it can be styled with CSS and displayed as a content page, not just a hidden metadata structure. The nav document must be listed in the manifest with properties="nav".
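Because the nav document is plain XHTML, reading the table of contents back out takes only a short recursive walk. The sketch below assumes the file is well-formed XML without HTML named entities such as &nbsp; (which the standard-library parser rejects) and that the epub prefix maps to http://www.idpf.org/2007/ops; read_toc is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute


def _walk(ol, depth):
    """Yield (depth, label, href) for each entry in an <ol>, recursively."""
    for li in ol.findall(f"{XHTML}li"):
        link = li.find(f"{XHTML}a")
        if link is not None:
            yield depth, "".join(link.itertext()).strip(), link.get("href", "")
        child = li.find(f"{XHTML}ol")
        if child is not None:
            yield from _walk(child, depth + 1)


def read_toc(nav_xhtml: bytes) -> list[tuple[int, str, str]]:
    """Return the entries of the <nav epub:type="toc"> element, if any."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iter(f"{XHTML}nav"):
        if "toc" in nav.get(EPUB_TYPE, "").split():
            top = nav.find(f"{XHTML}ol")
            return list(_walk(top, 0)) if top is not None else []
    return []
```

Each tuple carries the nesting depth, the link text, and the href, which is enough to print an indented outline or compare against the NCX.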

EPUB 2 vs EPUB 3: What Actually Changed

EPUB 2 (2007) was XHTML 1.1 and CSS 2. It handled text, images, and basic formatting. EPUB 3 (2011, latest revision 3.3 in 2023) is a fundamental upgrade.

Content Model Upgrades

EPUB 3 content documents use HTML5 (technically XHTML5 — XML-serialized HTML5). This means:

  • Semantic elements: <section>, <article>, <aside>, <figure>, and <figcaption> for meaningful document structure
  • MathML: native mathematical notation rendering. Textbooks with equations no longer need images of formulas; a <math xmlns="http://www.w3.org/1998/Math/MathML"> element goes inline in the content
  • SVG: scalable vector graphics inline or as separate files. Critical for diagrams, charts, and technical illustrations that must render sharply at any zoom level
  • Audio/Video: <audio> and <video> elements for embedded media. Support is reader-dependent; Kobo and Apple Books handle it, most e-ink devices don't
  • JavaScript: EPUB 3 allows scripted content. In practice, few readers support it reliably beyond Apple Books. The spec includes a scripted property on manifest items to indicate JavaScript-dependent content, so readers can warn users or fall back

Fixed Layout (FXL)

EPUB 3 introduced fixed-layout support for content where page design matters as much as text. A fixed-layout EPUB declares its rendition properties in the OPF metadata:

<meta property="rendition:layout">pre-paginated</meta>
<meta property="rendition:spread">landscape</meta>
<meta property="rendition:orientation">auto</meta>

Each content document specifies its page size via a <meta name="viewport" content="width=1200, height=1600"/> tag. Content is positioned absolutely, like a PDF or web page. Fixed-layout EPUB is used for children's picture books (each page is a designed spread), cookbooks (text wrapping around food photography), comics and graphic novels, and textbooks with complex layouts. Apple Books, Kobo, and Google Play Books render FXL well. Kindle converts FXL EPUB to its own fixed layout format.

Media Overlays: Synchronized Audio

Media overlays synchronize audio narration with text content, enabling read-along ebooks. The overlay is a SMIL (Synchronized Multimedia Integration Language) document:

<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <seq>
      <par>
        <text src="chapter01.xhtml#para01"/>
        <audio src="audio/ch01.mp3" clipBegin="0s" clipEnd="4.5s"/>
      </par>
      <par>
        <text src="chapter01.xhtml#para02"/>
        <audio src="audio/ch01.mp3" clipBegin="4.5s" clipEnd="9.2s"/>
      </par>
    </seq>
  </body>
</smil>

Each <par> element pairs a text fragment (via a fragment identifier such as #para01) with a slice of an audio file (via clip times). The reader highlights each text segment as the corresponding audio plays. Apple Books handles media overlays natively. Media overlays are the primary format for professional children's read-along books and accessibility-focused publications. Creating them requires aligning text with audio at the paragraph or sentence level, which is tedious manually but increasingly automated with forced-alignment tools.
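When debugging a read-along book, it helps to dump the text-to-audio pairs as plain numbers. The sketch below parses the SMIL structure shown above; parse_clock handles the common clock forms (4.5s, 250ms, 2min, 1h, and h:mm:ss), which is an assumption about what your overlays contain rather than a full implementation of SMIL clock values. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SMIL = "{http://www.w3.org/ns/SMIL}"


def parse_clock(value: str) -> float:
    """Convert a clock value such as '4.5s' or '0:01:02.5' to seconds."""
    value = value.strip()
    if ":" in value:
        seconds = 0.0
        for part in value.split(":"):  # h:mm:ss or mm:ss
            seconds = seconds * 60 + float(part)
        return seconds
    # Check 'ms' before 's' so '250ms' is not read as seconds.
    for suffix, factor in (("ms", 0.001), ("min", 60.0), ("h", 3600.0), ("s", 1.0)):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)  # a bare number is treated as seconds


def list_clips(smil_xml: bytes) -> list[tuple[str, str, float, Optional[float]]]:
    """Return (text_src, audio_src, begin, end) for each <par>."""
    clips = []
    for par in ET.fromstring(smil_xml).iter(f"{SMIL}par"):
        text, audio = par.find(f"{SMIL}text"), par.find(f"{SMIL}audio")
        if text is None or audio is None:
            continue
        end = audio.get("clipEnd")  # may be absent: play to end of file
        clips.append((
            text.attrib["src"],
            audio.attrib["src"],
            parse_clock(audio.get("clipBegin", "0s")),
            parse_clock(end) if end is not None else None,
        ))
    return clips
```

Sorting the result by begin time and comparing each end with the next begin is a simple way to spot gaps or overlaps in the narration.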

CSS in EPUB: What Works, What Doesn't

EPUB uses CSS for all visual styling, but reader apps aggressively override author styles to respect user preferences. Understanding this interplay is essential for predictable rendering.

The CSS Override Problem

When a user sets font size to "Large" in Apple Books, the reader injects its own CSS that overrides your font-size declarations. When a user enables dark mode, the reader inverts or replaces your background and text colors. This is by design — reflowable ebooks are meant to adapt to user preferences.

What you can reliably control: font-family (if you embed the font), text-align, margins and padding on block elements, list styling, table layout, border styles, and image sizing. What you cannot reliably control: font-size (users override it), line-height (many readers impose their own), background-color and color (dark mode), page margins (reader-controlled). The practical advice: design for the default rendering, test in Apple Books/Calibre/Kobo with various user settings, and don't fight the reader. If you need pixel-perfect control, use fixed layout.

Embedded Fonts

EPUB supports embedded fonts via @font-face in CSS, just like the web. Fonts must be listed in the manifest and can be OTF, TTF, or WOFF. WOFF2 support is growing but not universal across readers. Font files can be obfuscated (scrambled with a key derived from the book's unique identifier) to comply with font licenses that prohibit distributing the raw font file. The EPUB spec defines one obfuscation algorithm, inherited from the IDPF; Adobe's older method is a separate, proprietary scheme that some tools still emit. Obfuscated fonts are declared in META-INF/encryption.xml.
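For a sense of how light the obfuscation is, here is a sketch of the IDPF algorithm as I understand it from the spec: take the SHA-1 digest of the unique identifier with whitespace removed, then XOR the first 1040 bytes of the font with that 20-byte key, repeating the key. The function name idpf_obfuscate is hypothetical, and the details should be checked against the specification before relying on this for production files.

```python
import hashlib


def idpf_obfuscate(font_bytes: bytes, unique_identifier: str) -> bytes:
    """Obfuscate (or de-obfuscate) a font; the XOR operation is its own inverse."""
    # Key: SHA-1 of the identifier with space, tab, CR, and LF removed.
    cleaned = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    key = hashlib.sha1(cleaned.encode("utf-8")).digest()  # 20 bytes
    # Only the first 1040 bytes are scrambled; the rest is left untouched.
    head = bytes(b ^ key[i % len(key)] for i, b in enumerate(font_bytes[:1040]))
    return head + font_bytes[1040:]
```

Running the function twice with the same identifier returns the original bytes, which is why this counts as obfuscation rather than encryption.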

Embedded fonts are critical for books with non-Latin scripts, mathematical notation, or distinctive branding. A novel in standard English prose probably doesn't need embedded fonts — the reader's default serif or sans-serif will work fine. A book with Arabic, Devanagari, or specialized symbols absolutely needs them.

Validation with epubcheck

epubcheck is the official EPUB validation tool, developed by the DAISY Consortium on behalf of the W3C. It checks conformance to the EPUB specification: well-formed XML, valid OPF structure, correct MIME types, manifest completeness, spine references, metadata requirements, content document validity, and CSS compatibility.

Running epubcheck

epubcheck is a Java application. Run it from the command line:

java -jar epubcheck.jar book.epub

It outputs errors (spec violations that will break readers), warnings (issues that may cause problems), and info messages (suggestions). Common errors include: missing required metadata (dc:language, dc:identifier), manifest items referencing non-existent files, content documents with invalid XHTML (unclosed tags, unescaped ampersands), incorrect MIME types in the manifest, and the mimetype file being compressed or having a newline.

A clean epubcheck run doesn't guarantee the book looks good — it guarantees the book is structurally valid. Visual testing on actual devices and reader apps is a separate, equally important step.

Common Validation Failures

"mimetype file must be uncompressed" — The ZIP creation tool compressed the mimetype entry. Use zip -X0 book.epub mimetype first, then zip -Xr9 book.epub OEBPS/ META-INF/.

"element X not allowed here" — EPUB 2 uses XHTML 1.1; HTML5 elements like <section> or <nav> are invalid. Either upgrade to EPUB 3 or use XHTML 1.1 elements.

"referenced resource not found" — A manifest item or content link points to a file not in the ZIP. Common after renaming or moving files without updating references.

"duplicate ID" — Two elements in the same document have the same id attribute. This breaks fragment navigation and accessibility.

"non-standard image type" — EPUB core media types are JPEG, PNG, GIF, and SVG. WebP, AVIF, and BMP are not core types. You can include them with a fallback, but most readers won't display them.

Building an EPUB from Scratch

You don't need specialized software to create an EPUB. A text editor and a ZIP tool are sufficient. Here's the minimal process:

Minimum Viable EPUB

Create this directory structure:

my-book/
  mimetype                    (plain text: application/epub+zip)
  META-INF/
    container.xml             (points to content.opf)
  OEBPS/
    content.opf               (metadata + manifest + spine)
    nav.xhtml                 (table of contents)
    chapter01.xhtml           (content)
    styles.css                (optional styling)

Write your content as XHTML files. Create the OPF with metadata, manifest listing every file, and spine ordering the chapters. Create the nav document. Then ZIP it with the correct method:

cd my-book
zip -X0 ../book.epub mimetype
zip -Xr9 ../book.epub META-INF/ OEBPS/

The -X0 flags store mimetype without compression (-0) and without extra file attributes (-X). The -Xr9 flags recursively add everything else at maximum compression. Run java -jar epubcheck.jar ../book.epub to validate (the archive was written one directory up). Assuming the content files themselves are valid, this process produces an EPUB 3 file that any conforming reader can open. In practice, tools like Pandoc, Calibre, or Sigil automate this, but knowing the manual process helps when debugging tool output.
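The same two-step packaging can be done without the zip command. The sketch below uses Python's zipfile module and takes the book's files as a dictionary of archive path to bytes; build_epub is a hypothetical name, and the function only handles packaging, so the OPF, nav document, and chapters still have to be valid on their own.

```python
import io
import zipfile


def build_epub(files: dict[str, bytes]) -> bytes:
    """Package files into an EPUB container and return the archive bytes.

    `files` maps archive paths (e.g. 'OEBPS/content.opf') to their contents.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # Step 1: mimetype goes first, stored uncompressed (like zip -X0).
        zf.writestr("mimetype", b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # Step 2: everything else, deflated at the highest level (like zip -Xr9).
        for name in sorted(files):
            if name == "mimetype":
                continue  # already written above
            zf.writestr(name, files[name],
                        compress_type=zipfile.ZIP_DEFLATED, compresslevel=9)
    return buffer.getvalue()
```

Writing the result is one more line, for example open("book.epub", "wb").write(build_epub(files)), after which epubcheck can validate it as usual.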

Converting EPUB on ChangeThisFile

ChangeThisFile supports EPUB as both a source and target format, using Calibre's ebook-convert engine on the server side. Calibre is the same tool professional publishers use for format conversion, running headless on our backend with a 180-second timeout for ebook conversions.

Supported conversions from EPUB: EPUB to MOBI, EPUB to AZW3, EPUB to PDF, EPUB to FB2, EPUB to TXT, EPUB to Markdown. The EPUB-to-MOBI and EPUB-to-AZW3 conversions are clean because the internal content models are similar. EPUB-to-PDF renders pages at a fixed size, losing reflowability. EPUB-to-TXT strips all formatting.

Conversions to EPUB: MOBI to EPUB, AZW3 to EPUB, FB2 to EPUB, PDF to EPUB, DOCX to EPUB, HTML to EPUB, TXT to EPUB, CBR to EPUB, CBZ to EPUB. Note that PDF-to-EPUB and CBR/CBZ-to-EPUB are inherently limited — PDF lacks document structure, and comic archives are just images.

EPUB's transparency is its greatest strength. Unlike proprietary formats where you're at the mercy of a vendor's tools, EPUB lets you inspect, modify, and debug every layer of a book. The spec is well-documented, the validation tool is free, and the internal format is the same HTML/CSS/XML that web developers already know.

If you're working with ebooks professionally — publishing, converting, or building tools — invest time in understanding the OPF, spine, and nav structure. When a conversion produces bad output, you can open both the source and result EPUBs, compare the XML, and find exactly what went wrong. That diagnostic ability is worth more than any conversion tool's GUI.