Document accessibility isn't optional: it's a legal requirement in many jurisdictions (Section 508 in the U.S., EN 301 549 in the EU, the Accessibility for Ontarians with Disabilities Act in Ontario, Canada) and an ethical obligation everywhere. An inaccessible document excludes people who are blind, have low vision, are deaf (for multimedia documents), have motor disabilities (for interactive forms), or have cognitive disabilities.
The format you choose determines how much accessibility work you'll need to do. HTML is accessible by default when you use semantic elements. DOCX is accessible when you use heading styles and alt text. PDF is the hardest — it requires explicit tagged structure that most PDF creation workflows don't produce.
This guide covers the accessibility properties of each major document format, the standards that govern them, and the practical steps to create accessible documents from the start rather than remediating them after creation.
WCAG 2.1: The Standard That Governs Everything
The Web Content Accessibility Guidelines (WCAG) 2.1, published by the W3C, is the reference standard for document accessibility. Although "Web Content" is in the name, WCAG principles apply to all digital documents. The guidelines are organized around four principles (POUR):
- Perceivable: Information must be presentable to users in ways they can perceive (alt text for images, captions for video, sufficient color contrast)
- Operable: User interface components must be operable (keyboard navigation, sufficient time to read, no seizure-inducing content)
- Understandable: Information and operation must be understandable (readable text, predictable behavior, input assistance)
- Robust: Content must be robust enough for diverse user agents (valid markup, name/role/value for custom controls)
WCAG defines three conformance levels: A (minimum), AA (standard target for most regulations), and AAA (highest). Most laws reference Level AA of WCAG 2.0 or 2.1 as the required standard (the revised Section 508 standards cite WCAG 2.0; EN 301 549 cites WCAG 2.1). For documents, the most relevant success criteria are: text alternatives for images (1.1.1), heading structure (1.3.1), reading order (1.3.2), color contrast (1.4.3), text resize (1.4.4), and language identification (3.1.1).
HTML: Accessible by Default
HTML with semantic markup is the most natively accessible document format. Screen readers (JAWS, NVDA, VoiceOver, TalkBack) understand HTML elements directly:
- <h1>-<h6> create a navigable heading outline. Screen reader users jump between headings to scan document structure.
- <img alt="description"> provides text alternatives for images. Screen readers announce the alt text.
- <table> with <th> creates data tables where screen readers announce row and column headers for each cell.
- <a href> links are announced with their link text. "Click here" is bad link text; "Download the annual report" is good.
- <nav>, <main>, <aside> map to ARIA landmark roles (navigation, main, complementary) that screen readers use for page navigation.
- The lang attribute (for example, <html lang="en">) identifies the document language, enabling correct screen reader pronunciation.
HTML's advantage: accessibility is structural. You don't add accessibility as a separate layer — you get it by using the right elements. A well-structured HTML document passes most WCAG criteria automatically. This is why converting documents to HTML can be an accessibility improvement when the resulting HTML uses semantic elements.
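Because the accessibility lives in the elements themselves, a few lines of code can check for the most common gaps. The following is a minimal sketch using only Python's standard library; the class and function names (SemanticAudit, skipped_levels, audit) and the sample markup are illustrative assumptions, and it is no substitute for testing with a real screen reader.

```python
from html.parser import HTMLParser


class SemanticAudit(HTMLParser):
    """Collects heading levels and flags <img> tags that lack an alt attribute."""

    def __init__(self):
        super().__init__()
        self.heading_levels = []      # e.g. [1, 2, 3] for h1, h2, h3 in document order
        self.images_missing_alt = []  # src values of images with no alt attribute

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))
        elif tag == "img" and "alt" not in attr_map:
            # alt="" is valid for decorative images; only a missing attribute is flagged
            self.images_missing_alt.append(attr_map.get("src", "(no src)"))


def skipped_levels(levels):
    """Return (previous, current) pairs where the outline jumps down more than one level."""
    return [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]


def audit(html_text):
    parser = SemanticAudit()
    parser.feed(html_text)
    return {
        "headings": parser.heading_levels,
        "skipped": skipped_levels(parser.heading_levels),
        "missing_alt": parser.images_missing_alt,
    }


# Hypothetical sample: the h1 is followed directly by an h3, and one image has no alt
sample = """
<html lang="en"><body>
<h1>Annual report</h1>
<h3>Revenue</h3>
<img src="chart.png">
<img src="divider.png" alt="">
</body></html>
"""
print(audit(sample))
```

The sketch reports the jump from h1 to h3 and the image without alt text, while the decorative image with an empty alt attribute passes.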
DOCX: Accessible When Properly Authored
DOCX accessibility depends entirely on how the document is authored. A well-authored DOCX with heading styles, alt text, and proper table structure is highly accessible. A DOCX with visual-only formatting (bold text pretending to be headings, tables used for layout, no alt text) is inaccessible, however polished it looks.
Heading styles: Use Word's built-in Heading 1, Heading 2, etc., styles for all headings. Screen readers use these to build a navigable outline. If you make text look like a heading using bold + large font but don't apply a heading style, screen readers don't know it's a heading.
Alt text: Right-click any image > Edit Alt Text. Provide a concise description of what the image communicates. If the image is decorative (doesn't convey information), mark it as decorative.
Tables: Use Word's Insert Table feature, not tabs or spaces to align text. Mark header rows (Table Design > Header Row checkbox). Don't merge cells unless necessary — merged cells confuse screen readers.
Reading order: DOCX reading order follows the document flow — top to bottom, in order. Text boxes and floating elements can disrupt reading order. Avoid floating text boxes; if you must use them, test with a screen reader to verify the reading order makes sense.
Accessibility Checker: Word has a built-in checker (Review > Check Accessibility) that identifies missing alt text, heading hierarchy issues, and other common problems. Run it before sharing any document.
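The difference between a real heading and bold text that only looks like one is visible in the file itself, since a DOCX is a ZIP archive whose body lives in word/document.xml. The sketch below is a rough illustration, not a replacement for Word's checker. It assumes the built-in heading styles have IDs starting with "Heading" (true for English-language Word; localized files may differ), treats any short all-bold paragraph as a suspect, and builds a stripped-down in-memory archive purely for demonstration. The function name classify_paragraphs and the 80-character cutoff are arbitrary choices.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def classify_paragraphs(docx_file):
    """Return (kind, text) pairs where kind is 'heading', 'fake-heading', or 'body'."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    results = []
    for para in root.iter(f"{{{W}}}p"):
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        if not text.strip():
            continue
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val", "") if style is not None else ""
        runs = para.findall("w:r", NS)
        # Simplification: any <w:b> element counts as bold, even one switched off via w:val
        all_bold = bool(runs) and all(r.find("w:rPr/w:b", NS) is not None for r in runs)
        if style_id.startswith("Heading"):
            results.append(("heading", text))
        elif all_bold and len(text) < 80:
            results.append(("fake-heading", text))
        else:
            results.append(("body", text))
    return results


# Demo input: only the one part this sketch reads, not a file Word would open
SAMPLE_XML = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Overview</w:t></w:r></w:p>
<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Budget</w:t></w:r></w:p>
<w:p><w:r><w:t>Spending rose in the third quarter.</w:t></w:r></w:p>
</w:body></w:document>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", SAMPLE_XML)
buffer.seek(0)

for kind, text in classify_paragraphs(buffer):
    print(f"{kind:13} {text}")
```

Here "Overview" is a heading a screen reader can navigate to, while "Budget" is flagged because it only looks like one.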
PDF Accessibility: Tagged PDF and PDF/UA
PDF is the hardest format to make accessible because it doesn't inherently carry structure. A PDF with no tags is a collection of text and graphics at coordinates — screen readers can't determine headings, paragraphs, reading order, or table structure. They read text in the order it appears in the content stream, which may not match the visual reading order.
Tagged PDF: Tags are a semantic layer added to the PDF that labels content: <H1> for headings, <P> for paragraphs, <Table> for tables, <Figure> for images (with alt text), and <Span> for inline elements. Tagged PDFs have a structure tree (visible in Acrobat's Tags panel) that screen readers use for navigation.
PDF/UA (ISO 14289): The accessibility standard for PDF. PDF/UA requires: all content must be tagged, all tags must be in logical reading order, images must have alternative text, table structure must be properly tagged (TH, TD, TR), and the document language must be specified. PDF/UA conformance is the target for legally compliant accessible PDFs.
How to create accessible PDFs: The easiest path is creating a properly authored DOCX (with heading styles, alt text, tables) and converting to PDF. Word and LibreOffice both produce tagged PDFs when the source document uses proper styles and the tagging option in the export dialog is enabled ("Document structure tags for accessibility" in Word, "Tagged PDF" in LibreOffice). Acrobat's "Make Accessible" wizard can add or fix tags on existing PDFs, but this is remediation, which is harder and less reliable than getting it right at the source.
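A quick way to triage a batch of PDFs is to look for the markers a tagged file carries in its document catalog: a /StructTreeRoot entry, /MarkInfo with /Marked true, and a /Lang entry. The sketch below is a crude heuristic under a loud assumption: it scans the raw bytes, so it misses catalogs stored in compressed object streams, and finding the markers says nothing about whether the tags are correct. The function name looks_tagged and the two sample byte strings are hypothetical; a real check needs a PDF/UA validator.

```python
import re


def looks_tagged(pdf_bytes):
    """Report which tagging markers appear in the raw bytes of a PDF."""
    return {
        "struct_tree": b"/StructTreeRoot" in pdf_bytes,
        "marked": re.search(rb"/Marked\s+true", pdf_bytes) is not None,
        # /Lang may be written as a literal string (en-US) or a hex string <...>
        "language": re.search(rb"/Lang\s*[(<]", pdf_bytes) is not None,
    }


# Hand-written catalog fragments for illustration, not complete PDF files
tagged = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /MarkInfo << /Marked true >> "
    b"/StructTreeRoot 5 0 R /Lang (en-US) >>\nendobj\n"
)
untagged = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"

print(looks_tagged(tagged))
print(looks_tagged(untagged))
```

Files where all three flags come back false are the likeliest candidates for re-export from source or for remediation.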
PDF Remediation: The Hard (and Expensive) Path
Taking an untagged PDF and adding accessibility tags is called remediation. It's labor-intensive: you need to add tags to every element, set the reading order, add alt text to images, mark up table structure, and validate the result. For a simple 10-page document, remediation takes 30-60 minutes. For a complex 100-page document with tables and figures, it can take days.
Professional PDF remediation services charge $5-25 per page, depending on complexity. Organizations with large PDF archives face remediation costs in the hundreds of thousands. This is the strongest argument for creating accessible documents from the start: a properly authored DOCX generates an accessible PDF automatically. Remediating that same content after it becomes an untagged PDF costs orders of magnitude more.
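To see how the per-page rate reaches six figures, here is the arithmetic as a short sketch. The 20,000-page archive is a hypothetical figure chosen for illustration; the $5 and $25 rates are the ones quoted above.

```python
def remediation_cost_range(pages, low_per_page=5, high_per_page=25):
    """Return the (low, high) remediation cost in dollars for a page count."""
    return pages * low_per_page, pages * high_per_page


# Hypothetical archive size, for illustration only
low, high = remediation_cost_range(20_000)
print(f"${low:,} to ${high:,}")  # prints "$100,000 to $500,000"
```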
EPUB Accessibility
EPUB ebooks are HTML inside a ZIP container, so they inherit HTML's accessibility advantages. EPUB 3 supports ARIA roles, semantic inflection, media overlays (synchronized text and audio), and the EPUB Accessibility specification (based on WCAG).
Accessible EPUB features:
- Semantic structure: EPUB chapters use HTML headings, lists, and semantic elements that screen readers navigate naturally.
- Reflowable text: Unlike PDF's fixed layout, EPUB text reflows to fit any screen size and supports user font size preferences. Users with low vision can increase text size without horizontal scrolling.
- Media overlays: Synchronized text and audio narration, where the text highlights as it's read. Critical for users with reading disabilities (dyslexia).
- Navigation: EPUB's table of contents (nav.xhtml) provides a structured navigation panel that screen readers can use to jump between chapters and sections.
EPUB is often a better accessible format choice than PDF when the content is text-focused (books, reports, articles). PDF is better when exact visual layout must be preserved (forms, legal documents, technical drawings).
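Since an EPUB is a ZIP container, you can confirm that it declares a navigation document without opening a reading app. In EPUB 3, META-INF/container.xml points to the package document, whose manifest marks the navigation file with properties="nav". The sketch below follows that chain. The function name find_nav_document and the in-memory sample archive (which holds only the two files the function reads, so it is not a valid EPUB) are assumptions for illustration.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_nav_document(epub_file):
    """Return the archive path of the EPUB 3 navigation document, or None."""
    with zipfile.ZipFile(epub_file) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(archive.read(opf_path))

    for item in package.findall("opf:manifest/opf:item", OPF_NS):
        if "nav" in item.get("properties", "").split():
            # Manifest hrefs are relative to the folder holding the package document
            folder = posixpath.dirname(opf_path)
            return posixpath.normpath(posixpath.join(folder, item.get("href")))
    return None


CONTAINER_XML = """<container version="1.0"
  xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
</rootfiles></container>"""

PACKAGE_XML = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
<manifest>
<item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
<item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
</manifest></package>"""

epub = io.BytesIO()
with zipfile.ZipFile(epub, "w") as archive:
    archive.writestr("META-INF/container.xml", CONTAINER_XML)
    archive.writestr("OEBPS/content.opf", PACKAGE_XML)
epub.seek(0)

print(find_nav_document(epub))  # prints "OEBPS/nav.xhtml"
```

A result of None means the manifest declares no navigation document, which leaves screen reader users without chapter-level navigation.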
Screen Reader Compatibility by Format
Not all formats work equally well with screen readers. Ranked from most to least accessible:
- HTML: Best screen reader support. All major screen readers (JAWS, NVDA, VoiceOver, TalkBack) are built to read HTML. Semantic elements, ARIA roles, and live regions all work as designed.
- EPUB: Excellent. EPUB readers with accessibility support (Apple Books, Thorium, Voice Dream) provide full screen reader navigation. EPUB's HTML foundation means all HTML accessibility features work.
- DOCX: Good when properly authored. Microsoft Word's screen reader support is mature. Screen readers navigate by heading, table, and list. Non-Word applications vary — Google Docs' accessibility is good but different from Word's.
- Tagged PDF: Adequate. Screen readers can navigate tagged PDFs using the tag structure. But PDF's fixed layout means text doesn't reflow for zoom, and some screen readers handle PDF tags less robustly than HTML elements.
- Untagged PDF: Poor. Screen readers read text in content stream order, which may not match visual order. No heading navigation. No table structure. Essentially reading a stream of text with no context.
- Plain text: Minimal. Screen readers read text sequentially. No structure, no navigation, but no barriers either. Plain text is predictable if limited.
Practical Accessibility Checklist
Regardless of format, every accessible document needs:
- Heading hierarchy: Use heading levels in order (H1 > H2 > H3). Don't skip levels (H1 > H3). Don't use headings for visual emphasis on non-heading text.
- Alt text for images: Every image needs a text alternative describing its content or purpose. Decorative images need a null alt attribute (in HTML) or should be marked decorative (in Word).
- Table headers: Every data table needs header cells (TH in HTML, Header Row in Word). Complex tables with merged cells or multi-level headers need explicit header associations.
- Link text: Links should describe their destination. "Click here" and "Read more" are meaningless out of context. "Download the Q3 earnings report (PDF, 2.1MB)" is descriptive.
- Color contrast: Text must have a contrast ratio of at least 4.5:1 against its background (3:1 for large text). Use a contrast checker to verify.
- Document language: Specify the primary language. In HTML: <html lang="en">. In Word: Review > Language. In PDF: File > Properties > Advanced > Language.
- Reading order: Content must make sense when read linearly. Multi-column layouts, floating elements, and decorative headers can disrupt the reading order for screen readers.
- Lists: Use proper list elements (UL/OL in HTML, list styles in Word), not manually typed dashes or numbers. Screen readers announce list structure ("list, 5 items") when proper elements are used.
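The color contrast item in this checklist is the one that reduces to a formula. WCAG defines contrast ratio as (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colors. The sketch below implements that definition for six-digit hex colors; the function names are illustrative, and it ignores transparency, gradients, and text over images, which need a dedicated checker.

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color such as '#767676', per the WCAG 2.x definition."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Convert each gamma-encoded channel to linear light
    r, g, b = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) up to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """Check the Level AA thresholds: 4.5:1 for body text, 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


for gray in ("#767676", "#777777"):
    ratio = contrast_ratio(gray, "#ffffff")
    print(f"{gray} on white: {ratio:.2f}:1, AA body text: {passes_aa(gray, '#ffffff')}")
```

The two grays differ by a single step per channel, yet #767676 on white clears the 4.5:1 threshold and #777777 falls just short, which is why eyeballing contrast is unreliable.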
Document accessibility is a design decision, not a post-production fix. The format you choose and how you author the document determine whether it's accessible from the moment you share it or whether it requires expensive remediation after the fact. The hierarchy is clear: HTML is accessible by default, DOCX is accessible when properly authored, and PDF requires explicit effort to tag.
The cheapest accessibility strategy: author documents in Word using proper heading styles and alt text, then export to PDF or convert to HTML. Both output formats inherit the structure from the source document. If you need to publish in multiple formats, start accessible and the conversions will be accessible too. Start inaccessible and you'll pay for remediation in every format.