What’s the difference and when do you need each?
If you have ever added text to a video and wondered whether you are doing it correctly, you are not alone. The terms captions, subtitles, and transcripts are often used interchangeably, but they refer to different things with different purposes and different accessibility implications. Understanding the distinction will help you choose the right approach for your content and ensure that your videos and audio materials are genuinely accessible.
Captions are a synchronised text version of all audio content in a video, including spoken dialogue, sound effects, and music cues that are relevant to understanding the content. The key word is all: captions are designed for viewers who cannot hear the audio, so they must include everything that a hearing viewer would hear. This includes laughter, applause, a door slamming, or music that signals a mood change – not just the words being spoken.
Captions are timed to appear on screen in sync with the audio. They are usually positioned at the bottom of the video and can be either open (always visible and burned into the video) or closed (switchable on or off by the viewer, often via a CC button). Closed captions give viewers more control and are generally preferable.
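To make the idea of timed captions concrete, here is a minimal sketch in Python that writes caption cues as WebVTT, a widely used caption file format. The `Cue` class, the helper names and the sample cues are assumptions made for illustration, not part of any particular platform or tool. Note that the non-speech sound gets its own cue, just as a spoken line does.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # spoken words, or a bracketed non-speech sound


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Render the cues as the text of a WebVTT caption file."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    # Cues are separated from each other (and from the header) by a blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical cues: two spoken lines and one relevant non-speech sound.
cues = [
    Cue(0.0, 2.5, "Presenter: Welcome to the session."),
    Cue(2.5, 4.0, "[door slams]"),
    Cue(4.0, 6.2, "Presenter: Let's get started."),
]
print(to_webvtt(cues))
```

A file like this is typically uploaded alongside the video as a closed caption track, which is what lets the viewer switch the captions on or off.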
Captions are an accessibility requirement for video content. WCAG 2.1 requires captions for pre-recorded video that contains audio at Level A (Success Criterion 1.2.2) and for live video that contains audio at Level AA (Success Criterion 1.2.4).
Subtitles are a translation of spoken dialogue into another language. Unlike captions, subtitles assume the viewer can hear the audio but does not understand the language being spoken. They typically do not include non-speech audio information such as sound effects or tone descriptions.
This distinction matters for accessibility: subtitles in the same language as the video are not a substitute for captions. A subtitle track that only displays spoken words, without identifying speakers or including relevant non-speech sounds, does not meet the accessibility needs of viewers who are deaf or hard of hearing.
That said, subtitles serve an important accessibility function for audiences who are not native speakers of the video’s language, and they also help viewers who are watching in a noisy environment or somewhere the sound has to stay off.
A transcript is a text document that contains the full text of a video or audio recording. Unlike captions and subtitles, a transcript is not synchronised with the media – it is a standalone document that can be read independently of the video or audio file.
Transcripts serve several accessibility purposes. They allow people who are deaf-blind (and use braille displays) to access audio and video content. They are useful for people with cognitive disabilities who may prefer to read at their own pace rather than follow a video. They also benefit anyone who wants to search for specific information within a recording, or who cannot access video for technical reasons.
For audio-only content such as podcasts, a transcript is the primary accessibility requirement. WCAG 2.1 requires a transcript for pre-recorded audio-only content at Level A (Success Criterion 1.2.1).
For a pre-recorded video with audio: you need captions. A transcript is also recommended as additional support.
For a live video event: you need live captions. Automatic live captions are available on most major platforms and are acceptable where manual captioning is not feasible, though captions produced by a trained human captioner are generally more accurate.
For audio-only content (a podcast, a recorded presentation without video): you need a transcript. Captions are not applicable to audio-only content.
For a video that is silent or contains only background music with no speech: captions are not required, but a brief description of the visual content may be needed.
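The four cases above can be sketched as a small decision helper in Python. The function name, its parameters and the wording of the returned labels are all hypothetical, and `has_speech` is a simplification: a video with no speech but with sounds that matter to understanding it would still need captions.

```python
def recommended_alternatives(
    has_video: bool, has_speech: bool, is_live: bool = False
) -> list[str]:
    """Return the text alternatives suggested for one piece of media."""
    if has_video and has_speech:
        if is_live:
            return ["live captions"]
        return ["captions (required)", "transcript (recommended)"]
    if has_speech:
        # Audio-only content such as a podcast. The cases above only cover
        # live video, so is_live is ignored here.
        return ["transcript (required)"]
    if has_video:
        # Silent video, or video with background music only.
        return ["description of the visual content (may be needed)"]
    return []


print(recommended_alternatives(has_video=True, has_speech=True))
print(recommended_alternatives(has_video=True, has_speech=True, is_live=True))
print(recommended_alternatives(has_video=False, has_speech=True))
print(recommended_alternatives(has_video=True, has_speech=False))
```

A helper like this is only a checklist aid. It does not replace checking the content itself against the accessibility requirements that apply to you.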
Most video platforms – YouTube, Vimeo, Microsoft Stream – can automatically generate captions using speech recognition. These automated captions are a useful starting point but must always be reviewed and corrected before publishing. Accuracy rates for automated captions can be low, particularly with accented speech, technical terminology, or non-standard speech patterns. Uncorrected automated captions can be misleading or even offensive.
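One common way to put a number on caption accuracy is word error rate: the number of word-level insertions, deletions and substitutions needed to turn the automated text into the corrected text, divided by the number of words in the corrected text. The sketch below is a minimal Python version, assuming you have both the automated captions and a human-corrected version as plain text. The function name and sample sentences are made up for illustration, and the comparison ignores letter case but not punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the captions
                curr[j - 1] + 1,     # extra word in the captions
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of six.
corrected = "please review the captions before publishing"
automated = "please review the captains before publishing"
print(f"{word_error_rate(corrected, automated):.2f}")
```

A low error rate does not by itself mean the captions are ready to publish, since a single wrong word can change the meaning of a sentence. The figure is a way to track quality, not a substitute for review.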
For transcripts, tools like Otter.ai or Descript can generate a rough text version of your audio that you can then edit. Building transcript creation into your content production workflow from the beginning is far easier than retrofitting it later.
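If you already have a reviewed caption file, another route to a first-draft transcript is to strip out the timing information. The Python sketch below assumes a simple WebVTT file with no cue identifiers, NOTE blocks or STYLE blocks; the function name and the sample text are hypothetical.

```python
def captions_to_transcript(vtt_text: str) -> str:
    """Drop the WebVTT header and timing lines, keeping only the caption text."""
    kept = []
    for index, raw_line in enumerate(vtt_text.splitlines()):
        line = raw_line.strip()
        if index == 0 and line.startswith("WEBVTT"):
            continue  # file header
        if not line or "-->" in line:
            continue  # blank separator or cue timing line
        kept.append(line)
    return "\n".join(kept)


# Hypothetical caption file with one spoken line and one non-speech sound.
sample = """WEBVTT

00:00:00.000 --> 00:00:02.500
Presenter: Welcome to the session.

00:00:02.500 --> 00:00:04.000
[door slams]
"""
print(captions_to_transcript(sample))
```

The result still needs editing into readable paragraphs, but it keeps the speaker labels and non-speech sounds that were already checked in the captions.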
Captions, subtitles, and transcripts are not optional extras. They are how you ensure that audio and video content is accessible to everyone – and in many contexts, they are a legal requirement.