Screen Recording Glossary — Mac & Video Terms Defined

Automatic zoom: A screen-recording technique that pushes the framing closer to a specific area of the screen — typically wherever the user just clicked — and then pulls back once the action completes. Automatic zoom eliminates the need to manually keyframe each zoom in a video editor. The result reads more like an edited tutorial than a raw screen capture; the viewer's eye is guided to the active region instead of scanning the whole screen. See also why screen recordings need automatic zoom.
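Example (automatic zoom): a minimal sketch of the push-in and pull-back described above, assuming the recorder logs each click's time and position. The names `zoom_amount` and `crop_rect`, and the ramp, hold and zoom values, are illustrative assumptions, not any particular recorder's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def smoothstep(t: float) -> float:
    """Ease-in-out curve: 0 at t <= 0, 1 at t >= 1, smooth in between."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def zoom_amount(t: float, click_time: float,
                ramp: float = 0.4, hold: float = 1.5) -> float:
    """Return 0.0 (full screen) to 1.0 (fully zoomed) at time t, in seconds.

    Assumed timing: ease in for `ramp` seconds after the click, hold for
    `hold` seconds, then ease back out for `ramp` seconds.
    """
    dt = t - click_time
    if dt < 0.0 or dt > 2.0 * ramp + hold:
        return 0.0
    if dt < ramp:
        return smoothstep(dt / ramp)
    if dt <= ramp + hold:
        return 1.0
    return 1.0 - smoothstep((dt - ramp - hold) / ramp)


def crop_rect(screen_w: float, screen_h: float,
              click_x: float, click_y: float,
              amount: float, max_zoom: float = 2.0) -> Rect:
    """Region of the screen to show, centered on the click where possible."""
    scale = 1.0 + (max_zoom - 1.0) * amount
    w = screen_w / scale
    h = screen_h / scale
    # Clamp so the crop never extends past the screen edges.
    x = min(max(click_x - w / 2.0, 0.0), screen_w - w)
    y = min(max(click_y - h / 2.0, 0.0), screen_h - h)
    return Rect(x, y, w, h)
```

A renderer would call `zoom_amount` once per frame and scale the returned crop up to the output size, which is what replaces hand-placed keyframes.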
On-device transcription: Transcription that runs entirely on the user's machine, without sending audio to a remote server. On modern macOS this is typically implemented using Apple's Speech framework, the same engine behind macOS dictation and Live Captions; on-device recognition is available only for the languages the system supports locally. On-device transcription preserves privacy, works offline, and avoids the per-minute cost of cloud transcription APIs. Typical outputs include a written transcript, a .srt subtitle sidecar, and burnt-in captions on the video. See also on-device captions on macOS.
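Example (transcript to .srt): a minimal sketch of turning timed transcript segments into a .srt sidecar. The `Segment` type is a hypothetical stand-in for whatever the transcription engine returns; only the SubRip layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SubRip uses a comma before the ms)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SubRip cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Writing the result of `to_srt` to a file next to the video, with the same base name and a .srt extension, is the usual sidecar convention.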
Multi-track capture: Recording each input layer (screen, webcam, microphone, system audio) onto a separate, individually editable track inside the same project. Multi-track capture lets a creator rebalance audio, swap or hide the webcam, or remove the system-audio track after the fact, without re-recording. The alternative, flat single-track capture, is simpler to record and store but locks every decision (audio levels, webcam placement, whether system audio is included) into the moment of recording.
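Example (rebalancing after the fact): a minimal sketch of why separate tracks matter, assuming each audio track is kept as its own list of samples in the range -1.0 to 1.0. The `AudioTrack` type and `mix` function are illustrative names, not a real recorder's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioTrack:
    name: str
    samples: List[float]   # mono samples in the range -1.0 .. 1.0
    gain: float = 1.0      # editable after recording
    muted: bool = False    # editable after recording


def mix(tracks: List[AudioTrack]) -> List[float]:
    """Sum all unmuted tracks with their current gain, then hard-clip."""
    length = max((len(t.samples) for t in tracks if not t.muted), default=0)
    out = [0.0] * length
    for track in tracks:
        if track.muted:
            continue
        for i, sample in enumerate(track.samples):
            out[i] += sample * track.gain
    # Keep the result inside the valid sample range.
    return [max(-1.0, min(1.0, s)) for s in out]
```

Because the mix is computed from the stored tracks, changing `gain` or `muted` and calling `mix` again gives a different result from the same recording; a flat single-track capture would only have stored the first mix.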
Cursor smoothing: Post-processing that takes the raw, jittery mouse path the user actually drew and replaces it with a clean, deliberate path arriving at the same coordinates. Cursor smoothing makes a tutorial look like the recorder knew exactly where they were going. The smoothing is applied at render time; the underlying click coordinates are not altered, so timing-sensitive interactions still land where they were supposed to.
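Example (cursor smoothing): a minimal sketch using a moving average, assuming the recorder stores the cursor path as a list of (x, y) samples plus the indices at which clicks happened. Real smoothers often use splines or spring models; the moving average is a stand-in chosen for brevity, and `smooth_path` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def smooth_path(points: List[Point], click_indices: Iterable[int],
                window: int = 5) -> List[Point]:
    """Return a smoothed copy of the path, leaving click samples untouched."""
    half = window // 2
    pinned = set(click_indices)
    smoothed: List[Point] = []
    for i, point in enumerate(points):
        if i in pinned:
            # Clicks keep their recorded coordinates exactly.
            smoothed.append(point)
            continue
        lo = max(0, i - half)
        hi = min(len(points), i + half + 1)
        count = hi - lo
        avg_x = sum(x for x, _ in points[lo:hi]) / count
        avg_y = sum(y for _, y in points[lo:hi]) / count
        smoothed.append((avg_x, avg_y))
    return smoothed
```

The function returns a new list and never modifies `points`, which mirrors the render-time behaviour described above: the recorded path and click coordinates stay as captured.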
Teleprompter mode: A scrolling-script overlay shown only to the person recording, not captured in the output video. On macOS the standard implementation relies on the system's capture-exclusion support (for example, leaving the prompter window out of a ScreenCaptureKit content filter) so that even if the recorder captures the full screen, the prompter window is filtered out. The result is a single clean take with the speaker reading naturally instead of trying to remember the script. See also a built-in teleprompter is the missing upgrade.
Privacy mask: A region drawn over recorded footage that hides whatever is underneath, using a blur, a solid block, or a selective redaction. The mask follows the content beneath it as the screen scrolls or the window moves, so sensitive data (API tokens, customer names, billing info) stays hidden even when the layout shifts mid-recording. Because the mask is applied at render time, it can be adjusted or removed later; the opposite approach, masking the screen during capture, destroys the source pixels and can't be undone.
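Example (a mask that follows content): a minimal sketch of one way a mask can track what it covers, assuming the recorder logs the window origin and scroll offset for each frame. That logging, and the names `MaskRegion` and `mask_on_screen`, are assumptions for illustration; a real recorder may track content differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MaskRegion:
    x: float
    y: float
    width: float
    height: float


def mask_on_screen(mask_in_content: MaskRegion,
                   window_origin: Tuple[float, float],
                   scroll_offset: Tuple[float, float]) -> MaskRegion:
    """Convert a mask stored in content coordinates to screen coordinates.

    The mask is stored relative to the scrolled content, so moving the
    window or scrolling the page changes where it is drawn, not what it hides.
    """
    origin_x, origin_y = window_origin
    scroll_x, scroll_y = scroll_offset
    return MaskRegion(
        x=origin_x + mask_in_content.x - scroll_x,
        y=origin_y + mask_in_content.y - scroll_y,
        width=mask_in_content.width,
        height=mask_in_content.height,
    )
```

The renderer would call this once per frame and draw the blur or block at the returned position, leaving the captured pixels untouched.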
Keystroke overlay: An on-screen display showing the keyboard shortcuts the user pressed during a recording. Keystroke overlays are most useful in tutorials demonstrating a specific keybinding (an editor command, a workspace switch, a niche hotkey), where saying "I press cmd-shift-K" out loud isn't as fast as showing the keys lighting up.
Webcam background removal: Automatic separation of a webcam's foreground subject from its background — without a green screen — using machine-learning segmentation built into the operating system or recording app. The technique allows a circular or shape-masked webcam overlay to sit on top of a screen recording without revealing the room behind the speaker. On macOS, the underlying segmentation is exposed via Apple's Vision framework.
Burnt-in captions vs sidecar subtitles: Burnt-in captions are rendered directly onto the video pixels and cannot be turned off by the viewer; sidecar subtitles (typically .srt or .vtt) are a separate file the player can show or hide. Burnt-in captions are essential for platforms (TikTok, Reels, Shorts) that ignore sidecar files; sidecar subtitles are essential for accessibility, search indexing on YouTube, and translation pipelines. A well-built recorder produces both from a single transcription pass.
Non-destructive editor: A video editor that never overwrites the original recording file; every edit is stored as a layer on top of the source, and the source can be reverted to at any time. The opposite, destructive editing, re-encodes the source on every save and accumulates compression loss. A well-built non-destructive editor also runs preview and final export through the same render path, so what the user sees while editing is exactly what the export produces.
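Example (edits as layers): a minimal sketch of a project that stores edits as a list applied on top of an untouched source, assuming trims are the only edit type. `Project`, `Trim` and `visible_range` are hypothetical names, not a real editor's API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Trim:
    start: float  # seconds, relative to the timeline before this trim
    end: float    # seconds, relative to the timeline before this trim


@dataclass
class Project:
    source_path: str          # the original file; never written to
    source_duration: float    # seconds
    edits: List[Trim] = field(default_factory=list)

    def apply(self, edit: Trim) -> None:
        self.edits.append(edit)

    def undo(self) -> None:
        if self.edits:
            self.edits.pop()

    def revert(self) -> None:
        """Drop every edit and return to the untouched source."""
        self.edits.clear()

    def visible_range(self) -> Tuple[float, float]:
        """Resolve all edits to a (start, end) range in source time.

        Preview and export would both call this, so they cannot disagree.
        """
        start, end = 0.0, self.source_duration
        for edit in self.edits:
            new_end = min(end, start + edit.end)
            new_start = min(start + edit.start, new_end)
            start, end = new_start, new_end
        return start, end
```

Nothing here touches the file at `source_path`; undoing or reverting only changes the edit list, which is the property the term describes.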
Local-first screen recording: A design stance in which recordings, transcripts, edits and exports stay on the user's machine by default: no automatic upload, no account required to record. It is the opposite of cloud-first recording, in which the recorder uploads during capture and the resulting file is hosted on the vendor's servers. Local-first has privacy and offline benefits; cloud-first has team-sharing and viewer-analytics benefits. See also the case for local-first screen recording.