
Creating Accessible Documents: Formats and Standards

Published Mar 19, 2026 | 9 min read | By ChangeThisFile Team

Quick Answer

Document accessibility means people with disabilities can read, navigate, and understand your content using assistive technology. HTML is the most natively accessible format. DOCX with proper styles is accessible by default. PDF requires tagged structure (PDF/UA) to be accessible. Creating accessible documents from scratch is far easier than remediating inaccessible ones after the fact.

Document accessibility isn't optional — it's a legal requirement in many jurisdictions (Section 508 in the U.S., EN 301 549 in the EU, the Accessibility for Ontarians with Disabilities Act in Canada) and an ethical obligation everywhere. An inaccessible document is a document that excludes people who are blind, have low vision, are deaf (for multimedia documents), have motor disabilities (for interactive forms), or have cognitive disabilities.

The format you choose determines how much accessibility work you'll need to do. HTML is accessible by default when you use semantic elements. DOCX is accessible when you use heading styles and alt text. PDF is the hardest — it requires explicit tagged structure that most PDF creation workflows don't produce.

This guide covers the accessibility properties of each major document format, the standards that govern them, and the practical steps to create accessible documents from the start rather than remediating them after creation.

WCAG 2.1: The Standard That Governs Everything

The Web Content Accessibility Guidelines (WCAG) 2.1, published by the W3C, is the reference standard for document accessibility. Although "Web Content" is in the name, WCAG principles apply to all digital documents. The guidelines are organized around four principles (POUR):

Perceivable: Information must be presentable to users in ways they can perceive (alt text for images, captions for video, sufficient color contrast)
Operable: User interface components must be operable (keyboard navigation, sufficient time to read, no seizure-inducing content)
Understandable: Information and operation must be understandable (readable text, predictable behavior, input assistance)
Robust: Content must be robust enough for diverse user agents (valid markup, name/role/value for custom controls)

WCAG defines three conformance levels: A (minimum), AA (the usual target for regulations), and AAA (highest). Most laws reference WCAG 2.0 or 2.1 Level AA as the required standard. For documents, the most relevant success criteria are: text alternatives for images (1.1.1), heading structure (1.3.1), reading order (1.3.2), color contrast (1.4.3), text resize (1.4.4), and language identification (3.1.1).

HTML: Accessible by Default

HTML with semantic markup is the most natively accessible document format. Screen readers (JAWS, NVDA, VoiceOver, TalkBack) understand HTML elements directly:

<h1>-<h6> create a navigable heading outline. Screen reader users jump between headings to scan document structure.
<img alt="description"> provides text alternatives for images. Screen readers announce the alt text.
<table> with <th> creates data tables where screen readers announce row and column headers for each cell.
<a href> links are announced with their link text. "Click here" is bad link text; "Download the annual report" is good.
<nav>, <main>, <aside> create ARIA landmark regions for page navigation.
The lang attribute (for example, <html lang="en">) identifies the document language, enabling correct screen reader pronunciation.

HTML's advantage: accessibility is structural. You don't add accessibility as a separate layer — you get it by using the right elements. A well-structured HTML document passes most WCAG criteria automatically. This is why converting documents to HTML can be an accessibility improvement when the resulting HTML uses semantic elements.
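To make "accessibility is structural" concrete, here is a minimal sketch that scans an HTML string for three of the problems described above: a missing lang attribute, images without alt, and skipped heading levels. It uses only Python's standard html.parser module. The names StructureAudit and audit_html are hypothetical, and this is an illustration rather than a replacement for tools such as axe DevTools or WAVE.

```python
from html.parser import HTMLParser


class StructureAudit(HTMLParser):
    """Collects a few structural accessibility problems from an HTML string."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self.has_lang = False
        self.last_heading_level = 0

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "html" and attr_map.get("lang"):
            self.has_lang = True
        elif tag == "img" and "alt" not in attr_map:
            # alt="" is fine: it marks an image as decorative.
            self.problems.append("img without alt attribute")
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            level = int(tag[1])
            if level > self.last_heading_level + 1:
                self.problems.append(f"heading level skipped before <{tag}>")
            self.last_heading_level = level


def audit_html(markup):
    """Return a list of problem descriptions (empty if none were found)."""
    parser = StructureAudit()
    parser.feed(markup)
    parser.close()
    if not parser.has_lang:
        parser.problems.append("html element has no lang attribute")
    return parser.problems


if __name__ == "__main__":
    page = '<html><body><h1>Report</h1><h3>Details</h3><img src="chart.png"></body></html>'
    for problem in audit_html(page):
        print(problem)
```

A check like this only catches missing structure. It cannot tell whether alt text is meaningful or whether a heading actually describes its section, so human review is still needed.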

DOCX: Accessible When Properly Authored

DOCX accessibility depends entirely on how the document is authored. A well-authored DOCX with heading styles, alt text, and proper table structure is highly accessible. A DOCX with visual-only formatting (bold text pretending to be headings, tables used for layout, no alt text) is inaccessible, and converting it to another format will not fix that.

Heading styles: Use Word's built-in Heading 1, Heading 2, etc., styles for all headings. Screen readers use these to build a navigable outline. If you make text look like a heading using bold + large font but don't apply a heading style, screen readers don't know it's a heading.

Alt text: Right-click any image > Edit Alt Text. Provide a concise description of what the image communicates. If the image is decorative (doesn't convey information), mark it as decorative.

Tables: Use Word's Insert Table feature, not tabs or spaces to align text. Mark header rows (Table Design > Header Row checkbox). Don't merge cells unless necessary — merged cells confuse screen readers.

Reading order: DOCX reading order follows the document flow — top to bottom, in order. Text boxes and floating elements can disrupt reading order. Avoid floating text boxes; if you must use them, test with a screen reader to verify the reading order makes sense.

Accessibility Checker: Word has a built-in checker (Review > Check Accessibility) that identifies missing alt text, heading hierarchy issues, and other common problems. Run it before sharing any document.
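The difference between a real heading style and bold text that only looks like a heading is visible in the file itself. The sketch below extracts the heading outline from a DOCX body using only the standard library. It assumes the default English style IDs ("Heading1", "Heading2", and so on); localized or custom templates may use different IDs. The function name heading_outline and the SAMPLE document are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def heading_outline(document_xml):
    """Return (level, text) pairs for paragraphs that use a Heading style."""
    root = ET.fromstring(document_xml)
    outline = []
    for para in root.iter(f"{{{W}}}p"):
        style = para.find("w:pPr/w:pStyle", NS)
        if style is None:
            continue  # no paragraph style: body text or a "fake" heading
        style_id = style.get(f"{{{W}}}val", "")
        # Assumption: default English style IDs "Heading1" ... "Heading9".
        if style_id.startswith("Heading") and style_id[7:].isdigit():
            text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
            outline.append((int(style_id[7:]), text))
    return outline


# Minimal hand-written body: two styled headings and one bold "fake" heading.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Intro</w:t></w:r></w:p>'
    '<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Looks like a heading</w:t></w:r></w:p>'
    '<w:p><w:pPr><w:pStyle w:val="Heading2"/></w:pPr><w:r><w:t>Scope</w:t></w:r></w:p>'
    '</w:body></w:document>'
)

if __name__ == "__main__":
    print(heading_outline(SAMPLE))
```

For a real file, the body can usually be read with zipfile.ZipFile(path).read("word/document.xml"), since a DOCX is a ZIP archive and Word conventionally stores the main part under that name. The bold paragraph is absent from the outline, which is exactly what a screen reader user would experience.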

PDF Accessibility: Tagged PDF and PDF/UA

PDF is the hardest format to make accessible because it doesn't inherently carry structure. A PDF with no tags is a collection of text and graphics at coordinates — screen readers can't determine headings, paragraphs, reading order, or table structure. They read text in the order it appears in the content stream, which may not match the visual reading order.
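The gap between content stream order and visual order is easier to see with a toy model. The fragments below are hypothetical (x, y, text) tuples for a two-column page, listed in the order a layout tool might have written them. Nothing here parses a real PDF; it only illustrates why untagged text can be read across columns.

```python
# Hypothetical text fragments in content stream order: (x, y, text).
# In PDF coordinates y grows upward, so a larger y is higher on the page.
FRAGMENTS = [
    (72, 700, "Left column, line 1"),
    (320, 700, "Right column, line 1"),
    (72, 680, "Left column, line 2"),
    (320, 680, "Right column, line 2"),
]


def stream_order(fragments):
    """Text as an untagged PDF would expose it: in stored order."""
    return [text for _, _, text in fragments]


def visual_order(fragments, column_split=300):
    """Left column top to bottom, then right column (assumed two-column page)."""
    ordered = sorted(fragments, key=lambda f: (f[0] >= column_split, -f[1]))
    return [text for _, _, text in ordered]


if __name__ == "__main__":
    print(stream_order(FRAGMENTS))  # alternates between the two columns
    print(visual_order(FRAGMENTS))  # reads each column in full
```

Tags do not work by sorting coordinates like this. A tagged PDF records the intended order explicitly in its structure tree, which is why tagging is reliable where geometric guessing is not.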

Tagged PDF: Tags are a semantic layer added to the PDF that labels content: <H1> for headings, <P> for paragraphs, <Table> for tables, <Figure> for images (with alt text), and <Span> for inline elements. Tagged PDFs have a structure tree (visible in Acrobat's Accessibility panel) that screen readers use for navigation.

PDF/UA (ISO 14289): The accessibility standard for PDF. PDF/UA requires: all content must be tagged, all tags must be in logical reading order, images must have alternative text, table structure must be properly tagged (TH, TD, TR), and the document language must be specified. PDF/UA conformance is the target for legally compliant accessible PDFs.

How to create accessible PDFs: The easiest path is to author a DOCX properly (heading styles, alt text, real tables) and convert it to PDF. Word and LibreOffice both produce tagged PDFs when the source document uses proper styles and the tagging option in the export dialog is enabled. Avoid "Print to PDF", which typically discards the structure. Acrobat's "Make Accessible" wizard can add or fix tags on existing PDFs, but this is remediation: harder and less reliable than getting it right at the source.
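After exporting, a quick sanity check is whether the file contains a structure tree at all. The sketch below is a rough heuristic that searches the raw bytes for the /StructTreeRoot key, which a tagged PDF's catalog uses to point at its structure tree. It can report a false negative when the catalog is stored in a compressed object stream, and a positive result says nothing about whether the tags are correct. The function name looks_tagged is hypothetical.

```python
def looks_tagged(pdf_bytes):
    """Heuristic: True if the raw bytes mention a structure tree root.

    May miss tagged PDFs whose catalog sits in a compressed object stream,
    and does not validate the tags themselves.
    """
    return b"/StructTreeRoot" in pdf_bytes


if __name__ == "__main__":
    tagged = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /StructTreeRoot 5 0 R >>\nendobj"
    untagged = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj"
    print(looks_tagged(tagged), looks_tagged(untagged))
```

For a file on disk, pass pathlib.Path("report.pdf").read_bytes(), where report.pdf is a placeholder name. Treat the result as a first filter only; a full check still needs Acrobat's accessibility checker or a dedicated PDF/UA validator.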

PDF Remediation: The Hard (and Expensive) Path

Taking an untagged PDF and adding accessibility tags is called remediation. It's labor-intensive: you need to add tags to every element, set the reading order, add alt text to images, mark up table structure, and validate the result. For a simple 10-page document, remediation takes 30-60 minutes. For a complex 100-page document with tables and figures, it can take days.

Professional PDF remediation services charge $5-25 per page, depending on complexity. Organizations with large PDF archives face remediation costs in the hundreds of thousands. This is the strongest argument for creating accessible documents from the start: a properly authored DOCX generates an accessible PDF automatically. Remediating that same content after it becomes an untagged PDF costs orders of magnitude more.
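The arithmetic behind those figures is simple to reproduce. This sketch multiplies a page count by the $5-25 per-page range quoted above; the function name and default rates are illustrative, and real quotes vary with document complexity.

```python
def remediation_cost_range(pages, low_per_page=5, high_per_page=25):
    """Return (low, high) estimated remediation cost in dollars."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * low_per_page, pages * high_per_page


if __name__ == "__main__":
    for pages in (50, 20_000):
        low, high = remediation_cost_range(pages)
        print(f"{pages} pages: ${low:,} to ${high:,}")
```

A 50-page document lands between $250 and $1,250, while an archive totalling 20,000 pages lands between $100,000 and $500,000, which is where the "hundreds of thousands" figure comes from.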

EPUB Accessibility

EPUB ebooks are HTML inside a ZIP container, so they inherit HTML's accessibility advantages. EPUB 3 supports ARIA roles, semantic inflection, media overlays (synchronized text and audio), and the EPUB Accessibility specification (based on WCAG).

Accessible EPUB features:

Semantic structure: EPUB chapters use HTML headings, lists, and semantic elements that screen readers navigate naturally.
Reflowable text: Unlike PDF's fixed layout, EPUB text reflows to fit any screen size and supports user font size preferences. Users with low vision can increase text size without horizontal scrolling.
Media overlays: Synchronized text and audio narration, where the text highlights as it's read. Critical for users with reading disabilities (dyslexia).
Navigation: EPUB's table of contents (nav.xhtml) provides a structured navigation panel that screen readers can use to jump between chapters and sections.
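The navigation document mentioned above is declared in the EPUB's package file: in EPUB 3, the manifest item carrying the "nav" property is the table of contents. The sketch below finds it in a package document string using the standard library. The function name find_nav_href and the SAMPLE_OPF content are hypothetical, and the file does not have to be called nav.xhtml.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def find_nav_href(opf_xml):
    """Return the href of the manifest item flagged as nav, or None."""
    root = ET.fromstring(opf_xml)
    for item in root.iter(f"{{{OPF_NS}}}item"):
        # "properties" is a space-separated list, e.g. "nav" or "nav scripted".
        if "nav" in item.get("properties", "").split():
            return item.get("href")
    return None


# Minimal hand-written package document for illustration.
SAMPLE_OPF = (
    f'<package xmlns="{OPF_NS}" version="3.0"><manifest>'
    '<item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>'
    '<item id="toc" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>'
    '</manifest></package>'
)

if __name__ == "__main__":
    print(find_nav_href(SAMPLE_OPF))
```

An EPUB with no nav item gives reading systems nothing to build a table of contents from. A result of None is therefore worth investigating before publishing.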

EPUB is often a better accessible format choice than PDF when the content is text-focused (books, reports, articles). PDF is better when exact visual layout must be preserved (forms, legal documents, technical drawings).

Screen Reader Compatibility by Format

Not all formats work equally well with screen readers. Ranked from most to least accessible:

HTML: Best screen reader support. All major screen readers (JAWS, NVDA, VoiceOver, TalkBack) are built to read HTML. Semantic elements, ARIA roles, and live regions all work as designed.
EPUB: Excellent. EPUB readers with accessibility support (Apple Books, Thorium, Voice Dream) provide full screen reader navigation. EPUB's HTML foundation means all HTML accessibility features work.
DOCX: Good when properly authored. Microsoft Word's screen reader support is mature. Screen readers navigate by heading, table, and list. Non-Word applications vary — Google Docs' accessibility is good but different from Word's.
Tagged PDF: Adequate. Screen readers can navigate tagged PDFs using the tag structure. But PDF's fixed layout means text doesn't reflow for zoom, and some screen readers handle PDF tags less robustly than HTML elements.
Untagged PDF: Poor. Screen readers read text in content stream order, which may not match visual order. No heading navigation. No table structure. Essentially reading a stream of text with no context.
Plain text: Minimal. Screen readers read text sequentially. No structure, no navigation, but no barriers either. Plain text is predictable if limited.

Practical Accessibility Checklist

Regardless of format, every accessible document needs:

Heading hierarchy: Use heading levels in order (H1 > H2 > H3). Don't skip levels (H1 > H3). Don't use headings for visual emphasis on non-heading text.
Alt text for images: Every image needs a text alternative describing its content or purpose. Decorative images need an empty alt attribute (alt="") in HTML or should be marked decorative in Word.
Table headers: Every data table needs header cells (TH in HTML, Header Row in Word). Complex tables with merged cells or multi-level headers need explicit header associations.
Link text: Links should describe their destination. "Click here" and "Read more" are meaningless out of context. "Download the Q3 earnings report (PDF, 2.1MB)" is descriptive.
Color contrast: Text must have a contrast ratio of at least 4.5:1 against its background (3:1 for large text). Use a contrast checker to verify.
Document language: Specify the primary language. In HTML: <html lang="en">. In Word: Review > Language. In Acrobat: File > Properties > Advanced > Language.
Reading order: Content must make sense when read linearly. Multi-column layouts, floating elements, and decorative headers can disrupt the reading order for screen readers.
Lists: Use proper list elements (UL/OL in HTML, list styles in Word), not manually typed dashes or numbers. Screen readers announce list structure ("list, 5 items") when proper elements are used.
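The color contrast item in the checklist can be verified numerically. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio and applies the AA thresholds quoted above (4.5:1, or 3:1 for large text). The function names are hypothetical, and colors are assumed to be six-digit hex strings such as "#767676".

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a six-digit hex color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) to 21."""
    l1, l2 = relative_luminance(foreground), relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """True if the pair meets the WCAG AA contrast threshold."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    for fg in ("#000000", "#767676", "#777777"):
        ratio = contrast_ratio(fg, "#ffffff")
        print(f"{fg} on white: {ratio:.2f}:1, AA body text: {passes_aa(fg, '#ffffff')}")
```

The grey #767676 on white just clears the 4.5:1 threshold, while the nearly identical #777777 falls slightly short. This is why checking with a tool is safer than judging contrast by eye.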

Document accessibility is a design decision, not a post-production fix. The format you choose and how you author the document determine whether it's accessible from the moment you share it or whether it requires expensive remediation after the fact. The hierarchy is clear: HTML is accessible by default, DOCX is accessible when properly authored, and PDF requires explicit effort to tag.

The cheapest accessibility strategy: author documents in Word using proper heading styles and alt text, then export to PDF or convert to HTML. Both output formats inherit the structure from the source document. If you need to publish in multiple formats, start accessible and the conversions will be accessible too. Start inaccessible and you'll pay for remediation in every format.

Key Takeaways

HTML is the most natively accessible format. Semantic elements provide accessibility by default — no separate remediation needed.
DOCX is accessible when authored with heading styles, alt text, and proper tables. Word's Accessibility Checker catches common issues.
PDF requires tagged structure (PDF/UA) to be accessible. Untagged PDFs are essentially inaccessible to screen readers.
Creating accessible documents from scratch is far cheaper and more reliable than remediating inaccessible ones. PDF remediation costs $5-25 per page.
WCAG 2.1 Level AA is the standard required by most accessibility laws. Key criteria: heading structure, alt text, color contrast, reading order, and language identification.
EPUB inherits HTML's accessibility and adds reflowable text and media overlays — often better than PDF for text-focused content.
The heading hierarchy is the single most important accessibility feature. Screen reader users navigate by headings more than any other structure.

Frequently Asked Questions

What makes a document accessible?

An accessible document can be read, navigated, and understood by people using assistive technology (screen readers, magnifiers, voice input). This requires: heading structure for navigation, alt text for images, proper table markup, sufficient color contrast, specified document language, logical reading order, and descriptive link text. These features let screen readers present the content meaningfully.

Is PDF an accessible format?

Only when tagged. A tagged PDF (conforming to PDF/UA) includes semantic structure that screen readers can navigate. An untagged PDF — which is what most PDF creation tools produce by default — is essentially inaccessible. Screen readers read the text in content stream order with no structure. The key: create accessible PDFs by exporting from a properly authored Word document.

How do I check if my document is accessible?

In Word: Review > Check Accessibility. This identifies missing alt text, heading issues, and reading order problems. For PDF: Adobe Acrobat's Accessibility Checker (Tools > Accessibility > Full Check). For HTML: browser extensions like axe DevTools or WAVE scan for WCAG violations. For EPUB: Ace by DAISY, the DAISY Consortium's accessibility checker for EPUB.

What is PDF/UA?

PDF/UA (ISO 14289) is the international standard for accessible PDFs. It requires all content to be tagged with proper structure, images to have alt text, tables to be structurally tagged, reading order to be logical, and document language to be specified. PDF/UA is to PDF accessibility what WCAG is to web accessibility — the definitive standard.

Why do screen readers read my PDF in the wrong order?

Untagged PDFs store text in content stream order, which often doesn't match the visual reading order. Multi-column layouts, headers, footers, and sidebars are particularly problematic — the content stream may jump between columns instead of reading each column sequentially. The fix: add tags to the PDF that define the correct reading order, or start from a properly structured source document.

Is EPUB more accessible than PDF?

For text-focused content, yes. EPUB's HTML foundation means screen readers navigate it naturally. Reflowable text supports user font size preferences. Media overlays enable synchronized audio narration. PDF's fixed layout is less flexible for users who need to resize text or change display settings. PDF is better when exact visual layout must be preserved.

How much does PDF remediation cost?

Professional PDF remediation services charge $5-25 per page, depending on document complexity. Simple text documents cost less; complex documents with tables, figures, and forms cost more. A 50-page document might cost $250-1,250 to remediate. Organizations with thousands of legacy PDFs can face six-figure remediation budgets — which is why creating accessible documents from the start is always cheaper.

Do I legally need to make my documents accessible?

In many jurisdictions, yes. U.S. federal agencies must comply with Section 508, whose standards incorporate WCAG 2.0 Level AA. EU public-sector bodies must meet EN 301 549, which incorporates WCAG 2.1 Level AA. Many U.S. states have similar laws. Private companies in the U.S. face ADA compliance obligations that increasingly include digital documents. Educational institutions receiving federal funding must provide accessible materials. The trend is clearly toward broader accessibility requirements.
