
EPUB: Inside the Open Ebook Standard

Published Mar 19, 2026 | 9 min read | By ChangeThisFile Team

Quick Answer

EPUB is a ZIP archive containing XHTML content, CSS stylesheets, and XML metadata. EPUB 3 adds HTML5, JavaScript, MathML, audio/video embeds, and fixed-layout support. Understanding the internals lets you debug formatting issues, build EPUBs from scratch, and know exactly what breaks during conversion.

EPUB is the dominant open ebook standard, maintained by the W3C since 2017. Every major reading platform supports it: Apple Books, Google Play Books, Kobo, Nook, Calibre, and since 2022, even Kindle via Send-to-Kindle. It's an open spec, not a proprietary format owned by a single company.

But most people treat EPUB as a black box. They convert to it, convert from it, and never look inside. That's a mistake if you're publishing, debugging formatting issues, or building ebook tooling. EPUB is remarkably transparent once you know the structure. Rename any .epub file to .zip, extract it, and you're looking at web content: XHTML files, CSS stylesheets, images, and a few XML manifests that tie everything together.

This guide walks through every layer of the EPUB spec, from the ZIP container to the reading order spine, with enough detail to build one from scratch or diagnose why your conversion came out wrong.

EPUB Is a ZIP File: The Container Layer

Every EPUB is a ZIP archive with the .epub extension. Not metaphorically. Literally. Run unzip book.epub -d book/ and you get a directory of files. The ZIP must contain a specific file as its first entry: mimetype, containing the string application/epub+zip with no newline, stored uncompressed (ZIP stored method, not deflate) and with no extra field in its ZIP header. This requirement exists so file-type detection tools can identify EPUBs without unzipping anything: the 30-byte ZIP local header plus the 8-byte filename puts the media type string at a fixed byte offset of 38, so reading the first 58 bytes of the file is enough.
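The container rules above can be checked in a few lines of Python. This is a minimal sketch using only the standard zipfile module; the function name check_mimetype_entry is made up for illustration and is not part of any EPUB tool.

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_mimetype_entry(path):
    """Return a list of problems with the mimetype entry (empty list if none)."""
    problems = []
    with zipfile.ZipFile(path) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            # Nothing else is worth checking if the entry is misplaced.
            return ["mimetype is not the first entry"]
        first = entries[0]
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if zf.read(first) != EPUB_MIMETYPE:
            problems.append("mimetype content is wrong (trailing newline?)")

    # Raw check, the way a file-type sniffer sees it: a 30-byte local header
    # plus the 8-byte name "mimetype" puts the content at offset 38.
    with open(path, "rb") as fh:
        header = fh.read(58)
    if header[38:58] != EPUB_MIMETYPE:
        problems.append("media type string is not at byte offset 38")
    return problems
```

Calling check_mimetype_entry("book.epub") on a well-formed file should return an empty list. A compressed mimetype entry typically triggers both the "compressed" message and the byte-offset message, because the raw bytes no longer match.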

After extraction, you'll find a standard directory structure:

mimetype
META-INF/
  container.xml
  [encryption.xml]
  [signatures.xml]
OEBPS/ (or EPUB/, content/, etc.)
  content.opf
  toc.ncx or nav.xhtml
  chapter01.xhtml
  chapter02.xhtml
  styles/
    book.css
  images/
    cover.jpg
    figure01.png

The META-INF/container.xml file is mandatory. It points to the root content file (the OPF package document). Everything else lives wherever the OPF says it does. The directory name (OEBPS, EPUB, content) is convention, not requirement.

container.xml: The Entry Point

This is the first file any EPUB reader parses. It's simple XML that declares where the package document lives:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
             media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>

The full-path attribute is relative to the ZIP root. An EPUB can technically contain multiple rootfiles (the spec allows it), but virtually no reader supports this in practice; reading systems generally just use the first rootfile listed. One rootfile, one OPF, one book.
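As a rough illustration of what a reader does with this file, the sketch below pulls the OPF path out of container.xml using Python's standard library. The helper name find_opf_path is hypothetical, and the sketch simply takes the first rootfile it finds.

```python
import xml.etree.ElementTree as ET

# Namespace used by the <container> element in META-INF/container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml):
    """Return the full-path of the first <rootfile> in container.xml (bytes)."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.attrib["full-path"]
```

With a real file you would pass in zipfile.ZipFile("book.epub").read("META-INF/container.xml") and get back something like OEBPS/content.opf.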

The OPF Package Document: The Book's Manifest

The OPF (Open Packaging Format) file is the central control document for the entire EPUB. It contains three critical sections: metadata (what the book is), manifest (what files it contains), and spine (what order to read them in). The filename is typically content.opf or package.opf, but the actual name doesn't matter — container.xml points to it.

Dublin Core Metadata

The <metadata> section uses Dublin Core elements for book information:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>The Great Gatsby</dc:title>
  <dc:creator>F. Scott Fitzgerald</dc:creator>
  <dc:language>en</dc:language>
  <dc:identifier id="bookid">urn:isbn:9780743273565</dc:identifier>
  <dc:publisher>Scribner</dc:publisher>
  <dc:date>1925-04-10</dc:date>
  <dc:rights>Public Domain</dc:rights>
  <meta property="dcterms:modified">2026-01-15T12:00:00Z</meta>
</metadata>

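Three Dublin Core elements are mandatory: dc:title, dc:language, and dc:identifier (a unique ID such as an ISBN, UUID, or URI, referenced by the unique-identifier attribute on the root <package> element). EPUB 3 additionally requires the dcterms:modified timestamp shown above. Everything else is optional but strongly recommended. Library management software like Calibre uses this metadata to organize, sort, and display books. A file with missing metadata shows up as "Unknown" by "Unknown" with no cover. To distinguish contributors, dc:creator can carry a role expressed as a MARC relator code (aut for author, edt for editor, ill for illustrator): EPUB 2 uses the opf:role attribute, while EPUB 3 uses a <meta> element with property="role" that refines the creator.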
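A quick way to spot the "Unknown by Unknown" problem before a reader does is to check the OPF for the three mandatory Dublin Core elements. The following is a small sketch, not a replacement for epubcheck; missing_required_metadata is an illustrative name, and the function expects the complete package document as bytes.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def missing_required_metadata(opf_xml):
    """Return the mandatory Dublin Core elements that are absent or empty."""
    root = ET.fromstring(opf_xml)
    metadata = root.find("opf:metadata", NS)
    if metadata is None:
        return ["metadata"]
    missing = []
    for name in ("title", "language", "identifier"):
        element = metadata.find(f"dc:{name}", NS)
        # An element that exists but contains only whitespace counts as missing.
        if element is None or not (element.text or "").strip():
            missing.append(f"dc:{name}")
    return missing
```

An empty list means the three elements are present; it says nothing about whether their values are sensible.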

The Manifest: Every File Listed

The <manifest> section lists every file in the EPUB with an ID, path, and MIME type:

<manifest>
  <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
  <item id="ch01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
  <item id="ch02" href="chapter02.xhtml" media-type="application/xhtml+xml"/>
  <item id="css" href="styles/book.css" media-type="text/css"/>
  <item id="cover-img" href="images/cover.jpg" media-type="image/jpeg" properties="cover-image"/>
  <item id="fig01" href="images/figure01.png" media-type="image/png"/>
</manifest>

Every file in the ZIP (except mimetype, META-INF/ contents, and the OPF itself) must be listed here. If a file is in the ZIP but not in the manifest, readers may ignore it. If a manifest entry points to a file that doesn't exist, epubcheck will flag it as an error. The properties attribute marks special items: nav for the navigation document, cover-image for the cover, mathml for content using MathML, svg for SVG content, scripted for JavaScript.
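The manifest rule lends itself to a simple two-way comparison: files the manifest promises but the ZIP lacks, and files the ZIP carries but the manifest never mentions. Here is one way to sketch that in Python. The function name compare_manifest and its argument layout are assumptions made for this example; manifest hrefs are resolved relative to the OPF's own directory.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def compare_manifest(opf_xml, opf_path, zip_names):
    """Return (listed_but_missing, present_but_unlisted) as sorted lists."""
    root = ET.fromstring(opf_xml)
    base = posixpath.dirname(opf_path)

    listed = set()
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        href = unquote(item.attrib["href"])  # hrefs may be percent-encoded
        listed.add(posixpath.normpath(posixpath.join(base, href)))

    # Files that are exempt from the manifest requirement are filtered out.
    present = {
        name for name in zip_names
        if not name.endswith("/")            # skip directory entries
        and name != "mimetype"
        and name != opf_path
        and not name.startswith("META-INF/")
    }
    return sorted(listed - present), sorted(present - listed)
```

In practice zip_names would come from zipfile.ZipFile("book.epub").namelist(). Remote resources (hrefs starting with http) are not handled in this sketch.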

The Spine: Reading Order

The <spine> section defines the linear reading order by referencing manifest IDs:

<spine toc="ncx">
  <itemref idref="ch01"/>
  <itemref idref="ch02"/>
  <itemref idref="ch03"/>
  <itemref idref="appendix" linear="no"/>
</spine>

When a reader displays a book, it follows the spine order. Swiping forward goes to the next itemref. The linear="no" attribute marks supplementary content (appendices, indices) that isn't part of the main reading flow but is accessible via links or the table of contents. The toc attribute on the <spine> element holds the manifest ID of the NCX file (an EPUB 2 feature) and is kept for backward compatibility.

A common debugging issue: content exists in the EPUB but doesn't appear when reading. Nine times out of ten, it's listed in the manifest but missing from the spine. The file is there; the reader just doesn't know when to show it.
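That failure mode is easy to detect mechanically. The sketch below lists XHTML manifest items that no spine itemref points to, skipping the nav document since it often sits outside the spine on purpose. The name content_not_in_spine is illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}


def content_not_in_spine(opf_xml):
    """Return hrefs of XHTML manifest items that the spine never references."""
    root = ET.fromstring(opf_xml)
    spine_ids = {
        ref.attrib["idref"]
        for ref in root.findall("opf:spine/opf:itemref", NS)
    }
    orphans = []
    for item in root.findall("opf:manifest/opf:item", NS):
        is_xhtml = item.attrib.get("media-type") == "application/xhtml+xml"
        is_nav = "nav" in item.attrib.get("properties", "").split()
        if is_xhtml and not is_nav and item.attrib["id"] not in spine_ids:
            orphans.append(item.attrib["href"])
    return orphans
```

A non-empty result is not always a bug (a page reached only through links can legitimately live outside the spine), but it is the first place to look when a chapter goes missing.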

EPUB has two table of contents systems. EPUB 2 uses NCX (Navigation Control for XML). EPUB 3 uses a nav document (an XHTML file with a specific structure). Most EPUBs include both for backward compatibility.

NCX (EPUB 2 Legacy)

The NCX file (toc.ncx) is an XML document mapping navigation labels to content locations:

<navMap>
  <navPoint id="ch1" playOrder="1">
    <navLabel><text>Chapter 1: The Green Light</text></navLabel>
    <content src="chapter01.xhtml"/>
    <navPoint id="ch1-s1" playOrder="2">
      <navLabel><text>The Buchanans</text></navLabel>
      <content src="chapter01.xhtml#buchanans"/>
    </navPoint>
  </navPoint>
</navMap>

NCX supports nested navigation (chapters containing sections), playOrder for sequential numbering, and fragment identifiers (#buchanans) for mid-chapter navigation. It's verbose, XML-heavy, and functional. Most EPUB 3 files still include an NCX for older readers that don't support the nav document.

NAV Document (EPUB 3)

EPUB 3 replaced NCX with a regular XHTML file containing a <nav> element with the epub:type="toc" attribute:

<nav epub:type="toc">
  <ol>
    <li><a href="chapter01.xhtml">Chapter 1: The Green Light</a>
      <ol>
        <li><a href="chapter01.xhtml#buchanans">The Buchanans</a></li>
      </ol>
    </li>
  </ol>
</nav>

This is standard HTML. The nav document can include additional navigation structures: epub:type="landmarks" (bodymatter, frontmatter, backmatter) and epub:type="page-list" (mapping to print edition page numbers). Because it's XHTML, it can be styled with CSS and displayed as a content page, not just a hidden metadata structure. The nav document must be listed in the manifest with properties="nav".
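Because the nav document is plain XHTML, extracting the table of contents takes very little code. The following sketch walks the nested ordered lists and yields (depth, label, href) tuples. The function names are invented for this example, and it assumes the file parses as XML without HTML named entities such as &nbsp;.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"  # the epub:type attribute


def read_toc(nav_xhtml):
    """Return [(depth, label, href), ...] from the nav with epub:type="toc"."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iter(f"{{{XHTML}}}nav"):
        if "toc" in nav.attrib.get(EPUB_TYPE, "").split():
            top = nav.find(f"{{{XHTML}}}ol")
            return list(_walk(top, 0)) if top is not None else []
    return []


def _walk(ol, depth):
    """Yield entries from one <ol>, recursing into nested lists."""
    for li in ol.findall(f"{{{XHTML}}}li"):
        link = li.find(f"{{{XHTML}}}a")
        if link is not None:
            label = "".join(link.itertext()).strip()
            yield depth, label, link.attrib.get("href")
        nested = li.find(f"{{{XHTML}}}ol")
        if nested is not None:
            yield from _walk(nested, depth + 1)
```

The same walk works for the landmarks and page-list structures if you change the epub:type value being matched.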

EPUB 2 vs EPUB 3: What Actually Changed

EPUB 2 (2007) was XHTML 1.1 and CSS 2. It handled text, images, and basic formatting. EPUB 3 (2011, latest revision 3.3 in 2023) is a fundamental upgrade.

Content Model Upgrades

EPUB 3 content documents use HTML5 (technically XHTML5 — XML-serialized HTML5). This means:

Semantic elements — <section>, <article>, <aside>, <figure>, <figcaption> for meaningful document structure
MathML — Native mathematical notation rendering. Textbooks with equations no longer need images of formulas. <math xmlns="http://www.w3.org/1998/Math/MathML"> inline in content
SVG — Scalable vector graphics inline or as separate files. Critical for diagrams, charts, and technical illustrations that must render sharply at any zoom level
Audio/Video — <audio> and <video> elements for embedded media. Support is reader-dependent; Kobo and Apple Books handle it, most e-ink devices don't
JavaScript — EPUB 3 allows scripted content. In practice, almost no reader supports it beyond Apple Books. The spec includes a scripted property on manifest items to indicate JavaScript-dependent content, so readers can warn users or fall back

Fixed Layout (FXL)

EPUB 3 introduced fixed-layout support for content where page design matters as much as text. A fixed-layout EPUB declares its rendition properties (layout mode, spread behavior, orientation) in the OPF metadata:

<meta property="rendition:layout">pre-paginated</meta>
<meta property="rendition:spread">landscape</meta>
<meta property="rendition:orientation">auto</meta>

Each content document specifies its page size via a <meta name="viewport" content="width=1200, height=1600"/> tag. Content is positioned absolutely, like a PDF or web page. Fixed-layout EPUB is used for children's picture books (each page is a designed spread), cookbooks (text wrapping around food photography), comics and graphic novels, and textbooks with complex layouts. Apple Books, Kobo, and Google Play Books render FXL well. Kindle converts FXL EPUB to its own fixed layout format.

Media Overlays: Synchronized Audio

Media overlays synchronize audio narration with text content, enabling read-along ebooks. The overlay is a SMIL (Synchronized Multimedia Integration Language) document:

<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <seq>
      <par>
        <text src="chapter01.xhtml#para01"/>
        <audio src="audio/ch01.mp3" clipBegin="0s" clipEnd="4.5s"/>
      </par>
      <par>
        <text src="chapter01.xhtml#para02"/>
        <audio src="audio/ch01.mp3" clipBegin="4.5s" clipEnd="9.2s"/>
      </par>
    </seq>
  </body>
</smil>

Each <par> element pairs a text fragment (addressed by a fragment identifier such as #para01) with a slice of an audio file (bounded by clipBegin and clipEnd). The reader highlights each text segment as the corresponding audio plays. Apple Books handles media overlays natively. It's the primary format for professional children's read-along books and accessibility-focused publications. Creating media overlays requires aligning text with audio at the paragraph or sentence level, which is tedious manually but increasingly automated with forced-alignment tools.
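To make the pairing concrete, here is a sketch that reads an overlay document and returns one tuple per <par>, with clip times converted to seconds. The helper names are hypothetical, and parse_clock covers only the common clock forms (4.5s, 250ms, 2min, 1h, 0:00:04.5, or a bare number), not the full SMIL clock grammar.

```python
import xml.etree.ElementTree as ET

SMIL_NS = {"s": "http://www.w3.org/ns/SMIL"}


def parse_clock(value):
    """Convert a simple SMIL clock value to seconds."""
    value = value.strip()
    if ":" in value:  # h:mm:ss or mm:ss
        seconds = 0.0
        for part in value.split(":"):
            seconds = seconds * 60 + float(part)
        return seconds
    # "ms" must be tested before "s" because both end in "s".
    for suffix, factor in (("ms", 0.001), ("min", 60.0), ("h", 3600.0), ("s", 1.0)):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)  # a bare number is taken as seconds


def read_overlay(smil_xml):
    """Return [(text_src, audio_src, begin_seconds, end_seconds_or_None), ...]."""
    root = ET.fromstring(smil_xml)
    clips = []
    for par in root.iter("{http://www.w3.org/ns/SMIL}par"):
        text = par.find("s:text", SMIL_NS)
        audio = par.find("s:audio", SMIL_NS)
        if text is None or audio is None:
            continue
        end = audio.attrib.get("clipEnd")
        clips.append((
            text.attrib["src"],
            audio.attrib["src"],
            parse_clock(audio.attrib.get("clipBegin", "0")),
            parse_clock(end) if end is not None else None,
        ))
    return clips
```

Checking that each clip's begin time matches the previous clip's end time is a cheap sanity test for overlays produced by alignment tools.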

CSS in EPUB: What Works, What Doesn't

EPUB uses CSS for all visual styling, but reader apps aggressively override author styles to respect user preferences. Understanding this interplay is essential for predictable rendering.

The CSS Override Problem

When a user sets font size to "Large" in Apple Books, the reader injects its own CSS that overrides your font-size declarations. When a user enables dark mode, the reader inverts or replaces your background and text colors. This is by design — reflowable ebooks are meant to adapt to user preferences.

What you can reliably control: font-family (if you embed the font), text-align, margins and padding on block elements, list styling, table layout, border styles, and image sizing. What you cannot reliably control: font-size (users override it), line-height (many readers impose their own), background-color and color (dark mode), page margins (reader-controlled). The practical advice: design for the default rendering, test in Apple Books/Calibre/Kobo with various user settings, and don't fight the reader. If you need pixel-perfect control, use fixed layout.

Embedded Fonts

EPUB supports embedded fonts via @font-face in CSS, just like the web. Fonts must be listed in the manifest and can be OTF, TTF, or WOFF. WOFF2 support is growing but not universal across readers. Font files can be obfuscated (scrambled with a key derived from the book's unique identifier) to satisfy font licenses that prohibit redistributing usable font files. The EPUB spec defines one obfuscation algorithm, inherited from the IDPF; Adobe has its own separate method that some tools also produce.
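For the curious, the IDPF obfuscation method is short enough to sketch. As described in the EPUB spec, the key is the SHA-1 digest of the book's unique identifier with whitespace removed, and only the first 1040 bytes of the font are XORed with it. The function names below are made up for this illustration; check the spec's font obfuscation section before relying on the details.

```python
import hashlib


def obfuscation_key(unique_identifier):
    """SHA-1 digest of the identifier with XML whitespace stripped (20 bytes)."""
    stripped = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    return hashlib.sha1(stripped.encode("utf-8")).digest()


def toggle_obfuscation(font_bytes, unique_identifier):
    """Obfuscate or de-obfuscate a font; XOR makes the operation its own inverse."""
    key = obfuscation_key(unique_identifier)
    # Only the first 1040 bytes are touched; the key repeats every 20 bytes.
    head = bytes(b ^ key[i % len(key)] for i, b in enumerate(font_bytes[:1040]))
    return head + font_bytes[1040:]
```

Running toggle_obfuscation twice with the same identifier returns the original bytes, which is why the same routine serves both directions.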

Embedded fonts are critical for books with non-Latin scripts, mathematical notation, or distinctive branding. A novel in standard English prose probably doesn't need embedded fonts — the reader's default serif or sans-serif will work fine. A book with Arabic, Devanagari, or specialized symbols absolutely needs them.

Validation with epubcheck

epubcheck is the official EPUB validation tool, developed by the DAISY Consortium on behalf of the W3C. It checks conformance to the EPUB specification: well-formed XML, valid OPF structure, correct MIME types, manifest completeness, spine references, metadata requirements, content document validity, and CSS compatibility.

Running epubcheck

epubcheck is a Java application. Run it from the command line:

java -jar epubcheck.jar book.epub

It outputs errors (spec violations that will break readers), warnings (issues that may cause problems), and info messages (suggestions). Common errors include: missing required metadata (dc:language, dc:identifier), manifest items referencing non-existent files, content documents with invalid XHTML (unclosed tags, unescaped ampersands), incorrect MIME types in the manifest, and the mimetype file being compressed or having a newline.

A clean epubcheck run doesn't guarantee the book looks good — it guarantees the book is structurally valid. Visual testing on actual devices and reader apps is a separate, equally important step.

Common Validation Failures

"mimetype file must be uncompressed" — The ZIP creation tool compressed the mimetype entry. Use zip -X0 book.epub mimetype first, then zip -Xr9 book.epub OEBPS/ META-INF/.

"element X not allowed here" — EPUB 2 uses XHTML 1.1; HTML5 elements like <section> or <nav> are invalid. Either upgrade to EPUB 3 or use XHTML 1.1 elements.

"referenced resource not found" — A manifest item or content link points to a file not in the ZIP. Common after renaming or moving files without updating references.

"duplicate ID" — Two elements in the same document have the same id attribute. This breaks fragment navigation and accessibility.

"non-standard image type" — EPUB core media types are JPEG, PNG, GIF, and SVG, plus WebP as of EPUB 3.3. AVIF and BMP are not core types. You can include them with a fallback, but most readers won't display them, and WebP support in older readers is still patchy.

Building an EPUB from Scratch

You don't need specialized software to create an EPUB. A text editor and a ZIP tool are sufficient. Here's the minimal process:

Minimum Viable EPUB

Create this directory structure:

my-book/
  mimetype                    (plain text: application/epub+zip)
  META-INF/
    container.xml             (points to content.opf)
  OEBPS/
    content.opf               (metadata + manifest + spine)
    nav.xhtml                 (table of contents)
    chapter01.xhtml           (content)
    styles.css                (optional styling)

Write your content as XHTML files. Create the OPF with metadata, manifest listing every file, and spine ordering the chapters. Create the nav document. Then ZIP it with the correct method:

cd my-book
zip -X0 ../book.epub mimetype
zip -Xr9 ../book.epub META-INF/ OEBPS/

The -X0 flag stores mimetype without compression and without extra attributes. The -Xr9 flag recursively compresses everything else at maximum compression. Then validate the result with java -jar epubcheck.jar ../book.epub. This process produces a valid EPUB 3 file that EPUB 3 readers can open. In practice, tools like Pandoc, Calibre, or Sigil automate this, but knowing the manual process helps when debugging tool output.
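If the zip command isn't available (on Windows, for instance), the same packaging steps can be approximated with Python's zipfile module. This is a sketch under the assumption that source_dir is laid out like my-book/ above; pack_epub is an illustrative name, and output_path should live outside source_dir so the archive doesn't try to include itself.

```python
import os
import zipfile


def pack_epub(source_dir, output_path):
    """Zip source_dir into an EPUB: mimetype first and stored, the rest deflated."""
    with zipfile.ZipFile(output_path, "w") as zf:
        # Step 1: mimetype goes in first, uncompressed. The file on disk must
        # not end with a newline, since it is copied byte for byte.
        zf.write(os.path.join(source_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        # Step 2: everything else, compressed at the maximum level.
        for folder, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(folder, name)
                arcname = os.path.relpath(full, source_dir).replace(os.sep, "/")
                if arcname != "mimetype":
                    zf.write(full, arcname,
                             compress_type=zipfile.ZIP_DEFLATED,
                             compresslevel=9)
```

For example, pack_epub("my-book", "book.epub") mirrors the two zip commands. The output still needs an epubcheck pass, since this only handles packaging, not content validity.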

Converting EPUB on ChangeThisFile

ChangeThisFile supports EPUB as both a source and target format, using Calibre's ebook-convert engine on the server side. Calibre is the same tool professional publishers use for format conversion, running headless on our backend with a 180-second timeout for ebook conversions.

Supported conversions from EPUB: EPUB to MOBI, EPUB to AZW3, EPUB to PDF, EPUB to FB2, EPUB to TXT, EPUB to Markdown. The EPUB-to-MOBI and EPUB-to-AZW3 conversions are clean because the internal content models are similar. EPUB-to-PDF renders pages at a fixed size, losing reflowability. EPUB-to-TXT strips all formatting.

Conversions to EPUB: MOBI to EPUB, AZW3 to EPUB, FB2 to EPUB, PDF to EPUB, DOCX to EPUB, HTML to EPUB, TXT to EPUB, CBR to EPUB, CBZ to EPUB. Note that PDF-to-EPUB and CBR/CBZ-to-EPUB are inherently limited — PDF lacks document structure, and comic archives are just images.

EPUB's transparency is its greatest strength. Unlike proprietary formats where you're at the mercy of a vendor's tools, EPUB lets you inspect, modify, and debug every layer of a book. The spec is well-documented, the validation tool is free, and the internal format is the same HTML/CSS/XML that web developers already know.

If you're working with ebooks professionally — publishing, converting, or building tools — invest time in understanding the OPF, spine, and nav structure. When a conversion produces bad output, you can open both the source and result EPUBs, compare the XML, and find exactly what went wrong. That diagnostic ability is worth more than any conversion tool's GUI.

Key Takeaways

EPUB is a ZIP archive with a mimetype file, container.xml, OPF package document, and XHTML content files — all inspectable with standard tools
The OPF has three critical sections: Dublin Core metadata (what the book is), manifest (what files exist), and spine (reading order)
EPUB 3 adds HTML5, MathML, SVG, audio/video, fixed layout, media overlays, and JavaScript support over EPUB 2
Fixed-layout EPUB uses pre-paginated rendering with absolute positioning — used for children's books, comics, and design-heavy content
Media overlays use SMIL to synchronize audio narration with text, enabling professional read-along ebooks
CSS in EPUB is real but reader apps aggressively override author styles for user preferences (font size, dark mode)
epubcheck is the official W3C validation tool — run it on every EPUB before distribution
You can build a valid EPUB with a text editor and zip command — no specialized software required

Frequently Asked Questions

Can I edit an EPUB file directly?

Yes. Rename it to .zip, extract, edit the XHTML/CSS/OPF files in any text editor, re-zip with the correct method (mimetype uncompressed first), and rename back to .epub. Sigil is a dedicated EPUB editor that handles this transparently. Calibre also has a built-in editor. For quick text fixes, the manual extract-edit-rezip approach works fine.

What's the difference between the OPF manifest and the spine?

The manifest lists every file in the EPUB with its MIME type — content documents, images, CSS, fonts, everything. The spine lists only the content documents in reading order. A file can be in the manifest (so images and CSS work) without being in the spine (so it doesn't appear as a 'page' during sequential reading). The manifest is the inventory; the spine is the playlist.

Why does my EPUB look different on different readers?

Reading apps override CSS to respect user preferences. Font size, line height, margins, and colors all get modified. Dark mode inverts your color scheme. Some readers ignore embedded fonts entirely. The EPUB spec intentionally gives readers this freedom — reflowable content is meant to adapt. If you need pixel-perfect rendering, use fixed-layout EPUB.

Is EPUB 3 backward compatible with EPUB 2 readers?

Partially. An EPUB 3 file with basic text content and an included NCX (EPUB 2 navigation) will usually render in EPUB 2 readers. But EPUB 3 features like MathML, media overlays, and fixed layout won't work. The safe approach is to include both a nav.xhtml (EPUB 3) and toc.ncx (EPUB 2) for maximum compatibility.

What image formats does EPUB support?

EPUB core media types include JPEG, PNG, GIF, and SVG, and EPUB 3.3 added WebP. Conforming readers are required to support core types, although WebP support in older apps and devices lags. Other formats (AVIF, BMP) can be included as foreign resources with fallbacks, but reader support is inconsistent. For maximum compatibility, stick to JPEG for photographs and PNG for graphics with transparency.

How large can an EPUB file be?

There's no spec-defined size limit, but practical constraints exist. Apple Books requires files under 2GB. Most e-ink readers struggle with EPUBs over 200MB due to limited RAM. Amazon's Send-to-Kindle accepts files up to 50MB via email. A typical novel EPUB is 0.5-5MB. Image-heavy books (cookbooks, art books, comics in fixed layout) can reach 50-200MB. Keep file size reasonable for your target devices.

Can EPUB contain interactive content or JavaScript?

EPUB 3 allows JavaScript, but support is extremely limited. Apple Books has the most complete scripting support among major readers, and even it imposes restrictions. Most e-ink devices and many reading apps ignore scripts entirely. The spec requires manifest items to declare the scripted property, and readers may warn users or disable scripting. Don't rely on JavaScript for core reading functionality.

What's the best tool for creating EPUB from a manuscript?

For writers: export from Word/Google Docs to EPUB via Calibre, or use Pandoc to convert Markdown to EPUB. For publishers: Sigil for hand-crafted EPUBs, Adobe InDesign for professional layout-to-EPUB export. For developers: Pandoc or custom scripts that generate the OPF and XHTML programmatically. Calibre is the Swiss Army knife — it handles conversion from virtually any source format to EPUB.
