Roughly 15% of the world's population lives with some form of disability. Screen reader users can't see images. Deaf and hard-of-hearing users can't hear video or audio. Users with vestibular disorders can be harmed by unexpected motion. Making media accessible isn't a checkbox exercise: it determines whether your content reaches only part of your potential audience or all of it.

This guide covers the concrete implementation for every media type: the HTML attributes, ARIA roles, WebVTT format, and CSS media queries that make images, video, and audio accessible. Every section includes code you can ship.

Image Alt Text: The Rules

The alt attribute is the most important accessibility feature on the web. It's also the most frequently botched. The rules are simple once you learn them:

Informative images (convey content): Describe what the image shows in the context of the surrounding content.

<!-- Good: describes what's in the image and why it matters -->
<img src="chart.png" alt="Bar chart showing monthly revenue growing from $10K in January to $45K in June 2026">

<!-- Bad: filename is not a description -->
<img src="chart.png" alt="chart.png">

<!-- Bad: too vague -->
<img src="chart.png" alt="chart">

<!-- Bad: too verbose -->
<img src="chart.png" alt="This is an image of a bar chart that shows the revenue figures for each month from January through June of the year 2026, with January being ten thousand dollars and June being forty-five thousand dollars, showing a clear upward trend">

Decorative images (visual flair, no content): Use an empty alt="" so screen readers skip them entirely.

<!-- Decorative: purely visual spacer/ornament -->
<img src="divider.svg" alt="">

<!-- Decorative: background texture -->
<img src="pattern.png" alt="" role="presentation">

Functional images (links, buttons): Describe the action, not the image.

<!-- Good: describes the action -->
<a href="/"><img src="logo.svg" alt="ChangeThisFile home"></a>

<!-- Bad: describes the image, not the action -->
<a href="/"><img src="logo.svg" alt="Blue and green logo"></a>

<!-- Good: icon button -->
<button><img src="search.svg" alt="Search"></button>

Complex images (charts, diagrams, infographics): Provide a short alt text AND a longer description nearby or via aria-describedby.

<figure>
  <img src="architecture.png" alt="System architecture diagram" aria-describedby="arch-desc">
  <figcaption id="arch-desc">
    The system consists of three layers: a Cloudflare Worker handling HTTP requests,
    an Express.js backend for file conversion, and Cloudflare KV for page storage.
    Client requests flow through the Worker, which proxies conversion requests to
    the Express backend and serves static pages from KV.
  </figcaption>
</figure>
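The rules above lend themselves to a quick automated check. The following is a minimal sketch, not a complete linter: `auditAlt`, its return codes, and the 150-character limit are all assumptions made for illustration, and no script can judge whether a description is actually accurate.

```javascript
// Hypothetical lint helper for the alt-text rules above.
// MAX_ALT_LENGTH is an assumed rule of thumb, not a WCAG requirement.
const MAX_ALT_LENGTH = 150;
const FILENAME_PATTERN = /\.(png|jpe?g|gif|svg|webp|avif)$/i;
const REDUNDANT_PREFIX = /^(this is )?(an? )?(image|picture|photo|graphic) of\b/i;

// alt: the attribute value, or null when the attribute is absent.
// functional: true for images inside links or buttons, where a one-word
// action such as "Search" is a perfectly good alt.
function auditAlt(alt, { functional = false } = {}) {
  if (alt === null || alt === undefined) return 'missing';    // no alt attribute at all
  const text = alt.trim();
  if (text === '') return 'decorative';                        // alt="" is valid for decoration
  if (FILENAME_PATTERN.test(text)) return 'filename';          // e.g. "chart.png"
  if (REDUNDANT_PREFIX.test(text)) return 'redundant-prefix';  // e.g. "This is an image of..."
  if (text.length > MAX_ALT_LENGTH) return 'too-long';
  if (!functional && !/\s/.test(text)) return 'too-vague';     // single word such as "chart"
  return 'ok';
}
```

In a browser you could feed it `img.getAttribute('alt')` for every image on the page, since `getAttribute` returns `null` when the attribute is missing. Treat the results as prompts for human review rather than a pass/fail verdict.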

figure and figcaption

The <figure> element groups an image (or video, or code block, or quote) with its caption. Screen readers announce the association between the content and its caption.

<figure>
  <picture>
    <source srcset="comparison.avif" type="image/avif">
    <source srcset="comparison.webp" type="image/webp">
    <img src="comparison.jpg" alt="Side-by-side comparison of JPG at 85 quality versus WebP at 80 quality, showing identical visual quality with WebP 30% smaller" width="1200" height="600">
  </picture>
  <figcaption>WebP at quality 80 matches JPG at quality 85 with 30% smaller file size. Both images are 1200px wide.</figcaption>
</figure>

When to use figcaption vs alt: Alt text is for screen readers — it replaces the image when it can't be seen. Figcaption is visible to everyone — it adds context that supplements the image. An image can have both: alt text describing what it shows, and a figcaption providing additional context, credit, or data source.

Don't repeat yourself. If the figcaption already describes the image adequately, keep the alt text short, but avoid leaving it empty: an empty alt removes the image from the accessibility tree, so the caption is left describing nothing. Screen readers read both, so identical content in alt and figcaption is redundant and annoying.

SVG Accessibility

SVGs are tricky because they can be included in three ways, each with different accessibility implications.

Inline SVG (most accessible):

<!-- Informative SVG: use role="img" + aria-labelledby -->
<svg role="img" aria-labelledby="icon-title icon-desc" viewBox="0 0 24 24">
  <title id="icon-title">Download</title>
  <desc id="icon-desc">Arrow pointing downward into a tray</desc>
  <path d="M12 15l-5-5h3V4h4v6h3l-5 5z" />
  <path d="M5 18h14v2H5z" />
</svg>

<!-- Decorative SVG: hide from assistive technology -->
<svg aria-hidden="true" focusable="false" viewBox="0 0 24 24">
  <path d="..." />
</svg>

SVG via <img> tag:

<!-- Treated like any other image -->
<img src="icon.svg" alt="Download" width="24" height="24">

<!-- Decorative -->
<img src="decoration.svg" alt="" role="presentation">

SVG in CSS background: CSS backgrounds are invisible to screen readers. If the SVG conveys meaning, don't use it as a background — use an <img> or inline SVG instead. If it's purely decorative, CSS background is fine.

Key SVG accessibility elements:

  • <title> — Short accessible name (like alt text for images)
  • <desc> — Longer accessible description
  • role="img" — Tells screen readers this SVG is a single image, not a group of shapes
  • aria-labelledby — Points to the title and/or desc elements
  • aria-hidden="true" — Hides decorative SVGs from screen readers
  • focusable="false" — Prevents keyboard focus on decorative SVGs (IE/Edge legacy issue)
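When the same icon appears many times on a page, the title and desc ids must stay unique or aria-labelledby will point at the wrong element. Here is a minimal sketch of a helper that applies the attributes listed above; `accessibleSvg`, the id naming scheme, and the fixed 24×24 viewBox are assumptions for illustration.

```javascript
// Hypothetical helper: wraps SVG path markup in the attributes listed above.
let svgIdCounter = 0;

// Escape text so it is safe inside <title> and <desc>.
function escapeText(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// paths: trusted SVG markup (e.g. '<path d="..." />').
// Without a title the SVG is treated as decorative and hidden.
function accessibleSvg(paths, { title, desc } = {}) {
  if (!title) {
    return `<svg aria-hidden="true" focusable="false" viewBox="0 0 24 24">${paths}</svg>`;
  }
  const n = ++svgIdCounter; // unique suffix per informative SVG
  const ids = [`svg-title-${n}`];
  let children = `<title id="svg-title-${n}">${escapeText(title)}</title>`;
  if (desc) {
    ids.push(`svg-desc-${n}`);
    children += `<desc id="svg-desc-${n}">${escapeText(desc)}</desc>`;
  }
  return `<svg role="img" aria-labelledby="${ids.join(' ')}" viewBox="0 0 24 24">${children}${paths}</svg>`;
}
```

Calling `accessibleSvg(paths)` with no options yields the decorative form, while passing `{ title: 'Download' }` yields the informative form with a fresh id each time.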

Video Captions: The <track> Element and WebVTT

Captions are required for accessible video. The <track> element provides captions in WebVTT format:

<video controls width="1280" height="720" preload="metadata" poster="thumb.jpg">
  <source src="tutorial.webm" type="video/webm">
  <source src="tutorial.mp4" type="video/mp4">
  
  <!-- Captions (dialogue + relevant sounds) -->
  <track kind="captions" src="captions-en.vtt" srclang="en" label="English" default>
  <track kind="captions" src="captions-es.vtt" srclang="es" label="Español">
  
  <!-- Subtitles (dialogue only, no sound descriptions) -->
  <track kind="subtitles" src="subtitles-fr.vtt" srclang="fr" label="Français">
</video>

WebVTT format:

WEBVTT

00:00:01.000 --> 00:00:04.500
Welcome to this tutorial on file conversion.

00:00:05.000 --> 00:00:09.200
We'll cover how to convert PNG images
to WebP format for better web performance.

00:00:10.000 --> 00:00:14.000
[Upbeat music plays]

00:00:14.500 --> 00:00:18.000
Drag your file into the drop zone,
or click to browse.

00:00:18.500 --> 00:00:22.000
[Chime: conversion complete]

Captions vs subtitles: Captions include dialogue AND non-speech audio ([door slams], [music playing], [phone rings]). They're for deaf and hard-of-hearing users. Subtitles are dialogue-only, typically for translation. Use kind="captions" for English captions of English video, kind="subtitles" for translated text.
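If your caption data lives in a spreadsheet or a transcription tool's JSON export, you can generate the WebVTT file rather than typing timestamps by hand. This is a minimal sketch under stated assumptions: `toWebVTT`, `formatTimestamp`, and the `{ start, end, text }` cue shape (times in seconds) are hypothetical names, and the code does not validate cue text (which must not contain blank lines or the string "-->").

```javascript
// Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function formatTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// cues: array of { start, end, text } with times in seconds (assumed shape).
function toWebVTT(cues) {
  const blocks = cues.map(({ start, end, text }) => {
    if (end <= start) {
      throw new RangeError(`Cue must end after it starts: ${start} -> ${end}`);
    }
    return `${formatTimestamp(start)} --> ${formatTimestamp(end)}\n${text}`;
  });
  // The file starts with the WEBVTT header; cues are separated by blank lines.
  return ['WEBVTT', ...blocks].join('\n\n') + '\n';
}

const vtt = toWebVTT([
  { start: 1, end: 4.5, text: 'Welcome to this tutorial on file conversion.' },
  { start: 5, end: 9.2, text: "We'll cover how to convert PNG images\nto WebP format for better web performance." },
]);
```

The resulting string can be saved as a .vtt file and referenced from a `<track>` element.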

Generating Captions

Creating captions manually is time-consuming but produces the best quality. For faster turnaround, use speech-to-text services as a starting point and manually correct errors:

  • YouTube's auto-captions: Upload to YouTube (even as unlisted), download the auto-generated VTT, correct errors, use on your site
  • Whisper (OpenAI): Open-source speech recognition with high accuracy. Run locally or via API
  • Rev, Descript, Otter.ai: Commercial transcription services with human review options

Auto-generated captions are a starting point, not a final product. They routinely mangle technical terms, proper nouns, and speaker identification. Always review and correct.
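Part of that review can be mechanical. If the same technical terms get mangled in every video, a glossary pass can fix the predictable errors before you start the manual read-through. This is only a sketch: `applyGlossary` and the sample glossary entries are hypothetical, and it does not replace human review.

```javascript
// Hypothetical glossary: keys are common mis-transcriptions, values are corrections.
const GLOSSARY = {
  'web p': 'WebP',
  'j peg': 'JPEG',
};

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace each glossary key (whole words, case-insensitive) with its correction.
function applyGlossary(text, glossary) {
  return Object.entries(glossary).reduce((result, [wrong, right]) => {
    const pattern = new RegExp(`\\b${escapeRegExp(wrong)}\\b`, 'gi');
    // A replacer function keeps "$" in corrections from being treated specially.
    return result.replace(pattern, () => right);
  }, text);
}

const corrected = applyGlossary('We convert J peg images to Web P format.', GLOSSARY);
```

Run it over the caption text only, not the timestamp lines, and still read the result against the audio.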

Audio Descriptions and Transcripts

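Audio descriptions narrate visual content for blind and low-vision users. They describe on-screen actions, text, and visual changes during natural pauses in dialogue. For prerecorded video, WCAG requires audio description at Level AA (SC 1.2.5); Level A (SC 1.2.3) accepts either audio description or a full text alternative, and extended audio description (SC 1.2.7) is Level AAA. The requirement applies when video conveys important visual information not described in the audio track. Note that browsers generally do not voice a kind="descriptions" track on their own, so the markup below usually needs a player script or a separately described version of the video to be useful.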

<video controls>
  <source src="tutorial.mp4" type="video/mp4">
  <track kind="captions" src="captions.vtt" srclang="en" label="Captions" default>
  <track kind="descriptions" src="descriptions.vtt" srclang="en" label="Audio Descriptions">
</video>

Transcripts are the essential alternative for all audio content — podcasts, voice messages, audio guides. A transcript is a text document that includes all spoken content and relevant sound descriptions.

<section aria-label="Podcast episode">
  <h2>Episode 42: Image Optimization</h2>
  <audio controls preload="metadata">
    <source src="episode-42.mp3" type="audio/mpeg">
  </audio>
  <details>
    <summary>Read transcript</summary>
    <div class="transcript">
      <p><strong>Host:</strong> Today we're talking about image optimization...</p>
      <p><strong>Guest:</strong> The most impactful change is switching from JPG to WebP...</p>
    </div>
  </details>
</section>

Transcripts also benefit SEO — search engines can index the full text content of your audio and video.
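If you already have a captions file, you can derive a first-draft transcript from it instead of writing one from scratch. The sketch below assumes a simple WebVTT file like the one shown earlier; `vttToTranscript` is a hypothetical helper, and it does not strip cue tags such as voice spans or add speaker labels.

```javascript
// Hypothetical helper: turn WebVTT text into one transcript line per cue.
function vttToTranscript(vtt) {
  return vtt
    .replace(/\r\n?/g, '\n')        // normalize line endings
    .split(/\n{2,}/)                // blocks are separated by blank lines
    .map(block => block.split('\n'))
    .map(lines => {
      const timingIndex = lines.findIndex(line => line.includes('-->'));
      if (timingIndex === -1) return null; // header, NOTE, or STYLE block
      // Everything after the timing line is cue text; join wrapped lines.
      return lines.slice(timingIndex + 1).join(' ').trim();
    })
    .filter(text => text)           // drop skipped blocks and empty cues
    .join('\n');
}

const sampleVtt = [
  'WEBVTT',
  '',
  '00:00:01.000 --> 00:00:04.500',
  'Welcome to this tutorial on file conversion.',
  '',
  '00:00:05.000 --> 00:00:09.200',
  "We'll cover how to convert PNG images",
  'to WebP format for better web performance.',
  '',
].join('\n');

const transcript = vttToTranscript(sampleVtt);
```

The output still needs editing into readable paragraphs with speaker names, but it saves retyping the dialogue.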

Motion Sensitivity: prefers-reduced-motion

Autoplay video, animated images, and CSS animations can cause discomfort, nausea, or seizures for users with vestibular disorders or photosensitive epilepsy. The prefers-reduced-motion media query reports whether the user has asked their operating system to minimize motion, so your styles and scripts can respect that preference.

/* Disable all non-essential animation */
@media (prefers-reduced-motion: reduce) {
  *, *::before, *::after {
    animation-duration: 0.01ms !important;
    animation-iteration-count: 1 !important;
    transition-duration: 0.01ms !important;
  }
}

/* Replace autoplay video with static image for reduced-motion users */
@media (prefers-reduced-motion: reduce) {
  .hero-video { display: none; }
  .hero-static { display: block; }
}
@media (prefers-reduced-motion: no-preference) {
  .hero-video { display: block; }
  .hero-static { display: none; }
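}

// JavaScript: respect reduced motion preference
const reduceMotion = window.matchMedia('(prefers-reduced-motion: reduce)');

// Collect the videos once: the autoplay attribute is removed below,
// so re-querying 'video[autoplay]' later would find nothing.
const autoplayVideos = Array.from(document.querySelectorAll('video[autoplay]'));

// Accepts the MediaQueryList or its 'change' event; both expose .matches
function handleMotionPreference(mq) {
  autoplayVideos.forEach(video => {
    if (mq.matches) {
      video.pause();
      video.removeAttribute('autoplay');
    } else {
      // play() returns a promise that rejects if the browser blocks playback
      video.play().catch(() => {});
    }
  });
}

handleMotionPreference(reduceMotion);
reduceMotion.addEventListener('change', handleMotionPreference);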

WCAG 2.1 requirements:

  • SC 2.3.1 (Three Flashes or Below Threshold, Level A): nothing flashes more than three times in any one-second period.
  • SC 2.2.2 (Pause, Stop, Hide, Level A): provide a mechanism to pause, stop, or hide moving or auto-updating content that starts automatically.
  • SC 2.3.3 (Animation from Interactions, Level AAA): motion animation triggered by interaction can be disabled unless it is essential.
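The three-flash limit in SC 2.3.1 is easy to state but awkward to eyeball. If you can list the times at which an animation flashes, a sliding one-second window will tell you whether the count is ever exceeded. The sketch below is a rough pre-check only: `exceedsFlashLimit` is a hypothetical helper, and it ignores the size and luminance thresholds that the full criterion also takes into account.

```javascript
// Hypothetical pre-check: does any one-second window contain more than
// maxPerSecond flashes? flashTimes are flash timestamps in seconds.
function exceedsFlashLimit(flashTimes, maxPerSecond = 3) {
  const times = [...flashTimes].sort((a, b) => a - b);
  let windowStart = 0;
  for (let i = 0; i < times.length; i++) {
    // Slide the window so it only covers flashes less than 1s before times[i].
    while (times[i] - times[windowStart] >= 1) windowStart++;
    if (i - windowStart + 1 > maxPerSecond) return true;
  }
  return false;
}

const strobe = exceedsFlashLimit([0, 0.2, 0.4, 0.6]);     // four flashes inside one second
const gentle = exceedsFlashLimit([0, 0.5, 1.0, 1.5]);     // two flashes per second
```

A `true` result means the animation needs rework; a `false` result only means the flash rate alone is within the limit.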

ARIA for Custom Media Players

If you build a custom video or audio player (replacing the browser default controls), every control needs proper ARIA roles and labels.

<div role="region" aria-label="Video player">
  <video id="player" aria-describedby="video-description">
    <source src="demo.mp4" type="video/mp4">
  </video>
  <p id="video-description" class="sr-only">Demonstration of file conversion process</p>
  
  <div role="toolbar" aria-label="Video controls">
    <button aria-label="Play" data-action="play">
      <svg aria-hidden="true"><!-- play icon --></svg>
    </button>
    
    <button aria-label="Mute" data-action="mute">
      <svg aria-hidden="true"><!-- volume icon --></svg>
    </button>
    
    <input type="range" min="0" max="100" value="0"
      role="slider" aria-label="Seek" aria-valuemin="0" aria-valuemax="100"
      aria-valuenow="0" aria-valuetext="0 seconds of 2 minutes 30 seconds">
    
    <input type="range" min="0" max="100" value="80"
      role="slider" aria-label="Volume" aria-valuemin="0" aria-valuemax="100"
      aria-valuenow="80" aria-valuetext="80%">
    
    <button aria-label="Toggle captions" aria-pressed="false" data-action="captions">
      CC
    </button>
    
    <button aria-label="Enter full screen" data-action="fullscreen">
      <svg aria-hidden="true"><!-- fullscreen icon --></svg>
    </button>
  </div>
</div>

Key requirements:

  • Every icon-only button needs an accessible name, for example via aria-label (buttons with visible text are named by that text)
  • The play button label should toggle between "Play" and "Pause"
  • Sliders need aria-valuetext with human-readable values ("1 minute 30 seconds" not "90")
  • Caption toggle needs aria-pressed to indicate state
  • The player region needs role="region" with aria-label
  • All controls must be keyboard accessible (Tab to navigate, Enter/Space to activate buttons, arrow keys to adjust sliders)
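The aria-valuetext requirement is the one most often skipped, because the player only knows raw seconds. A small formatter closes that gap. This is a minimal sketch; `spokenDuration` and `seekValueText` are hypothetical names, and the wording is English-only.

```javascript
// Hypothetical formatter: 90 -> "1 minute 30 seconds".
function spokenDuration(totalSeconds) {
  const secs = Math.max(0, Math.floor(totalSeconds));
  const h = Math.floor(secs / 3600);
  const m = Math.floor((secs % 3600) / 60);
  const s = secs % 60;
  const unit = (n, word) => `${n} ${word}${n === 1 ? '' : 's'}`;
  const parts = [];
  if (h) parts.push(unit(h, 'hour'));
  if (m) parts.push(unit(m, 'minute'));
  // Always say something, even at zero ("0 seconds").
  if (s || parts.length === 0) parts.push(unit(s, 'second'));
  return parts.join(' ');
}

// Value for the seek slider's aria-valuetext attribute.
function seekValueText(currentSeconds, durationSeconds) {
  return `${spokenDuration(currentSeconds)} of ${spokenDuration(durationSeconds)}`;
}
```

In the player you would call `seekValueText(video.currentTime, video.duration)` from a `timeupdate` listener and write the result to the seek slider with `setAttribute('aria-valuetext', ...)`.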

Focus Management for Media

Keyboard users navigate with Tab. When media players appear in modals, expand to fullscreen, or change state, focus management matters.

// When opening a video modal, move focus to the video or close button
function openVideoModal(videoSrc) {
  const modal = document.getElementById('video-modal');
  const video = modal.querySelector('video');
  const closeBtn = modal.querySelector('.close-btn');
  
  video.src = videoSrc;
  modal.style.display = 'flex';
  modal.setAttribute('aria-hidden', 'false');
  
  // Store what had focus so we can return
  modal._previousFocus = document.activeElement;
  
  // Move focus to close button
  closeBtn.focus();
  
  // Trap focus within modal
  modal.addEventListener('keydown', trapFocus);
}

function closeVideoModal() {
  const modal = document.getElementById('video-modal');
  const video = modal.querySelector('video');
  
  video.pause();
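  // Unload the media; an empty src string would fire an error event instead
  video.removeAttribute('src');
  video.load();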
  modal.style.display = 'none';
  modal.setAttribute('aria-hidden', 'true');
  
  // Return focus to trigger element
  modal._previousFocus?.focus();
  modal.removeEventListener('keydown', trapFocus);
}

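// Keep Tab and Shift+Tab cycling through the modal's focusable elements
function trapFocus(e) {
  if (e.key !== 'Tab') return;
  const focusable = e.currentTarget.querySelectorAll(
    'button, [href], input, select, textarea, video[controls], [tabindex]:not([tabindex="-1"])'
  );
  if (focusable.length === 0) return;
  const first = focusable[0];
  const last = focusable[focusable.length - 1];
  if (e.shiftKey && document.activeElement === first) {
    e.preventDefault();
    last.focus();
  } else if (!e.shiftKey && document.activeElement === last) {
    e.preventDefault();
    first.focus();
  }
}

// Close on Escape, but only while the modal is open
document.addEventListener('keydown', (e) => {
  const modal = document.getElementById('video-modal');
  if (e.key === 'Escape' && modal?.getAttribute('aria-hidden') === 'false') {
    closeVideoModal();
  }
});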

Accessible media is achievable with a handful of HTML attributes and CSS media queries: alt text for images, captions for video, transcripts for audio, and respect for reduced-motion preferences in animations. None of these is technically difficult; the challenge is making them part of your standard workflow rather than an afterthought.

The effort pays off beyond accessibility: alt text improves SEO, captions improve engagement (by one widely cited estimate, 85% of Facebook video is watched without sound), and transcripts make your audio content indexable by search engines. Accessible media is better media for everyone.