Automatic captions, closed captions, real-time captioning, and transcription: What do these terms mean?

Automatic captions – Also referred to as speech-recognition captions, automated captions, or auto-captions. These captions are generated by a computer using Automatic Speech Recognition (ASR) technology. They tend to lack punctuation and speaker identification, and they require a human to correct mistakes.

Many platforms include this feature, such as:

Video and presentation platforms (e.g., YouTube automated captions or Microsoft PowerPoint Translator)

Apps (e.g., Translate or Otter.ai)

Learning Management Systems (e.g., Blackboard, Canvas)

Live video streaming and conferencing services (e.g., Google Meet)

Captions – Also referred to as open/closed captions or subtitles. These are time-synced captions for pre-recorded video content that are embedded into the media. Accurate, edited captions provide equivalent access to the audio. Captions can also convey non-speech auditory information, such as sound effects or music, that ASR technology may not be able to identify.

Real-time captioning – Also referred to as live captioning or speech-to-text services. This service is provided by a qualified speech-to-text professional. Examples include live captioning of news broadcasts and captioning provided by a third-party vendor and streamed into Blackboard for a synchronous online class.

Transcribe/Transcription – The process of converting audio into a plain text document, called a transcript. Transcripts are commonly used for stand-alone audio, such as podcasts or presentations without video. They are also used as the first step toward creating captions for media. Transcripts can be auto-generated using ASR or produced by speech-to-text professionals.