How Many Words for a 5-Minute YouTube Video Script?

A 5-minute YouTube video script is approximately 750 words at a typical 150 words-per-minute narration pace. The realistic range is 650 to 850 words depending on whether you pause for jokes, leave space for B-roll voiceover, or speak at a faster vlog-style pace. Most explainer-style YouTube videos at this length come in at 700 to 800 spoken words.

How we calculated it

YouTube narration is faster than in-person speech because narrators deliver prepared scripts in a controlled environment, with no audience to read. The platform-typical pace for tutorial and explainer channels is 150 to 160 wpm; vlog and lifestyle creators often push to 170 to 180 wpm. At 170 wpm, a 5-minute video calls for about 850 words.
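
The core arithmetic is a single multiplication: runtime in minutes times narration pace. Here is a minimal Python sketch. It assumes a steady pace for the whole video, although real narration speeds up and slows down. `target_words` and the `PACES` labels are illustrative names, not part of any tool.

```python
def target_words(minutes: float, wpm: float) -> int:
    """Spoken-word target: runtime in minutes times narration pace in words per minute."""
    return round(minutes * wpm)


# Narration paces quoted in the text above (words per minute).
PACES = {
    "tutorial/explainer": 150,
    "tutorial/explainer, upper end": 160,
    "vlog/lifestyle": 170,
}

if __name__ == "__main__":
    for label, wpm in PACES.items():
        print(f"{label}: {target_words(5, wpm)} words for 5 minutes at {wpm} wpm")
```

For a 5-minute video this gives 750, 800, and 850 words, which matches the range above.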

B-roll changes the math. If you plan to cut to footage, demos, or screen recordings while continuing to narrate, you keep the full 750-word target. If the B-roll plays without narration (a music-only product shot, an establishing scene), subtract that runtime from your script length. A typical 5-minute YouTube video has 30 to 60 seconds of unnarrated visuals, which at 150 wpm trims the spoken-word target to roughly 600 to 675 words.

Hook structure matters more at this length than the script length itself. The first 15 seconds (about 35 to 40 words at 150 wpm) determine whether viewers stay or click away. Many successful YouTube creators working at this length spend 30 to 50 percent of their writing time on the first 50 words of the script.

A worked example: the explainer-channel video

You run an explainer-style YouTube channel and are scripting a 5-minute video on a single technical concept aimed at intermediate viewers. Your average watch time on similar videos is 3:40. You want a script tight enough to hold viewers all the way through.

A 700-word script (deliberately under the 750-word target) runs about 4 minutes 40 seconds at 150 wpm narration, so it maps cleanly onto a 5-minute video. Allocate it like this:

  • Hook, 60 words (about 24 seconds): name the specific question viewers will be able to answer by the end. This is what determines whether they stay past the first drop-off.
  • Context, 100 words: why this concept matters and what it solves.
  • Core explanation, 380 words: 3 clear sub-points, with a specific example for each.
  • Common confusion, 100 words: the most common point of confusion ("you might be wondering...").
  • Close, 60 words: a single takeaway plus a specific call to action.
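
To sanity-check the allocation, convert each section's word count to narration time. The following small Python sketch assumes a constant 150 wpm. The section labels paraphrase the worked example, and the function names are hypothetical.

```python
# Word budget from the worked example (700 words total).
SECTIONS = [
    ("Hook", 60),
    ("Context", 100),
    ("Core explanation (3 sub-points)", 380),
    ("Common confusion", 100),
    ("Close and call to action", 60),
]

WPM = 150  # assumed steady narration pace


def seconds_for(words: int, wpm: float = WPM) -> float:
    """Narration time in seconds for a given number of words."""
    return words * 60 / wpm


def print_budget(sections, wpm: float = WPM) -> None:
    """Print each section's word count and where it starts and ends on the timeline."""
    elapsed = 0.0
    for name, words in sections:
        start = elapsed
        elapsed += seconds_for(words, wpm)
        print(f"{name:<32} {words:>4} words  {start:5.0f}s to {elapsed:5.0f}s")
    total_words = sum(words for _, words in sections)
    minutes, seconds = divmod(round(elapsed), 60)
    print(f"{'Total':<32} {total_words:>4} words  runtime {minutes}:{seconds:02d}")


if __name__ == "__main__":
    print_budget(SECTIONS)
```

At 150 wpm the 60-word hook takes about 24 seconds. The full 700 words run 4:40, which leaves roughly 20 seconds of slack in a 5-minute video.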

The hook is the single most important structural decision in a 5-minute YouTube script. Industry retention data consistently shows the largest viewer drop-off in the first 15 seconds, which is roughly the first 35 to 40 narrated words. If your hook is generic ("today we're going to talk about..."), you lose viewers before the content starts, and every later section plays to a smaller audience. Spend disproportionate writing time on those opening 40 words.
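
It helps to see exactly which words fall inside that 15-second window. This minimal Python sketch assumes a constant pace and counts words by splitting on whitespace. `hook_window` is a hypothetical helper, and the sample draft is made up for illustration.

```python
def hook_window(script: str, seconds: float = 15, wpm: float = 150) -> str:
    """Return the words a viewer hears in the first `seconds` of narration.

    Assumes a constant pace and counts words by splitting on whitespace.
    """
    words = script.split()
    cutoff = int(seconds * wpm / 60)  # 15 s at 150 wpm -> 37 words
    return " ".join(words[:cutoff])


if __name__ == "__main__":
    # Made-up opening, used only to show the cutoff.
    draft = (
        "Most 5-minute scripts are too long. By the end of this video you will "
        "know exactly how many words to write, and the one mistake that quietly "
        "adds a full minute to your runtime. Let's start with the pace you "
        "actually speak at, because everything else depends on it."
    )
    print(hook_window(draft))
```

Passing `seconds=90` returns the first 225 words instead. That is useful for checking whether the central answer appears early enough.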

How B-roll changes the script length

B-roll — supplementary visual footage that plays under or alongside narration — is treated differently depending on whether it is voiceover-paired or silent. Voiceover-paired B-roll (the narrator continues talking while different footage plays) does not change script length at all; the 750-word target stays. Silent B-roll (footage plays without narration) directly subtracts from the script length — every 10 seconds of silent B-roll removes roughly 25 words from the spoken target.

Most explainer channels use a mix. A typical 5-minute explainer has 30 to 45 seconds of silent or near-silent B-roll across the full video, which trims the spoken-word target from 750 to about 640 to 680. Tutorial channels with screen recordings often run closer to the full 750 because the narrator usually keeps talking through every demo step.
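
The B-roll rule reduces to subtracting silent seconds from the runtime before multiplying by pace. Below is a minimal Python sketch, assuming 150 wpm. `spoken_target` is a hypothetical name, and voiceover-paired B-roll is deliberately left out of the subtraction.

```python
def spoken_target(minutes: float, silent_broll_seconds: float = 0, wpm: float = 150) -> int:
    """Spoken-word target after removing silent B-roll from the runtime.

    Voiceover-paired B-roll is not subtracted, because the narrator keeps
    talking over it; only footage that plays without narration counts here.
    """
    narrated_seconds = minutes * 60 - silent_broll_seconds
    if narrated_seconds < 0:
        raise ValueError("silent B-roll cannot exceed the video's runtime")
    return round(narrated_seconds * wpm / 60)


if __name__ == "__main__":
    for silent in (0, 10, 30, 45, 60):
        print(f"{silent:>2}s silent B-roll -> {spoken_target(5, silent)} spoken words")
```

Ten silent seconds remove 25 words (750 to 725). The 30- and 45-second cases give 675 and 638 words, consistent with the 640 to 680 range above.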

Why YouTube narration runs faster than in-person presentation

In-person presentations average 130 wpm because the speaker is reading the room — adjusting pace based on audience cues, building in deliberate pauses for emphasis, navigating distractions. YouTube narration averages 150 to 170 wpm because none of that applies; the narrator is reading prepared material in a controlled environment with no live feedback loop. The faster pace is also what viewers expect — a 130-wpm YouTube video typically registers as "slow" or "draggy" to viewers who calibrate against the platform norm.

The trade-off is that faster narration lowers comprehension on the first watch. YouTube partially compensates with auto-captions and easy rewinding (viewers expect to scrub backward). For complex technical content, narration in the 140 to 150 wpm range often outperforms 170 wpm despite feeling slower, because comprehension on the first watch correlates strongly with retention to the end.

Common pitfalls to avoid

  • A generic opening. Hooks like "Hey everyone, welcome back to the channel, today we're going to be talking about..." consistently kill retention in the first 15 seconds. Almost any specific opening (a question, a contrarian claim, a one-line story) outperforms the generic greeting. Save the channel branding for after the hook lands.
  • Burying the answer. Long-form YouTube can get away with delayed payoffs; 5-minute YouTube cannot. If a viewer cannot identify the central answer or insight by the 90-second mark (~225 words in), retention drops sharply. State the headline near the top, then spend the rest of the video earning the right to have stated it.
  • Treating the script as final. A script that reads well silently often runs 10 to 15 percent longer when spoken aloud, and several phrases that look fine on paper turn out to be tongue-twisters in delivery. Always read the full script aloud before recording — and time it. The first read-through almost always identifies 30 to 50 words that need cutting or rewording.
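
Before the timed read-through, a rough automated estimate can flag an overlong draft. This minimal Python sketch assumes a steady 150 wpm and whitespace-separated words, so it is no substitute for timing the real delivery. `check_draft` is a hypothetical helper.

```python
def check_draft(script: str, target_seconds: float = 300, wpm: float = 150) -> dict:
    """Estimate a draft's spoken runtime and how many words overshoot the target.

    Word count is a plain whitespace split, so "B-roll" counts as one word and
    "5 minutes" as two. Treat the result as a first pass; time a real read-aloud.
    """
    words = len(script.split())
    max_words = int(target_seconds * wpm / 60)
    return {
        "words": words,
        "runtime_seconds": words * 60 / wpm,
        "words_over": max(0, words - max_words),
    }


if __name__ == "__main__":
    draft = "word " * 800  # stand-in for a real 800-word script
    print(check_draft(draft))
```

An 800-word draft comes out at 320 seconds, 50 words over a 5-minute target. That is the kind of overshoot a first read-aloud typically catches.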

Frequently asked questions

How many words for a 5-minute video voiceover with no on-camera time?
Voiceover-only 5-minute videos can support up to 800 words (about 160 wpm) because there is no on-camera transition time, no greeting, and no visual setup. Animation-driven explainer videos often run at about this density.
What pace does YouTube's auto-caption system handle best?
Auto-captions are most accurate at the 140 to 160 wpm range, where most explainer channels naturally narrate. Above 180 wpm caption accuracy degrades noticeably; below 120 wpm the captions often add false sentence breaks at long pauses.
Should I write the script before or after planning the visuals?
For tutorials and screen-recording content, write the script first and let the visuals follow. For B-roll-heavy content (vlogs, video essays, documentary-style explainers), plan the visual sequence first and write the narration to fit. Getting the order wrong tends to produce over-narration or under-narration.
How many words per minute do YouTubers speak?
Most channels run 150 to 170 words per minute. Tutorial and explainer videos sit at the lower end (150 to 160); vlog and entertainment channels often reach 170 to 180.
Should I write a full script or use bullet points?
Full scripts produce tighter, denser videos but can sound robotic if read flat. Bullet points produce more natural delivery but typically run 20-30 percent longer than planned. Most creators land on a hybrid: scripted hook and outro, bullet-point middle.
How long is a 1000-word video script?
About 6 minutes 40 seconds at 150 wpm, or about 5 minutes 53 seconds at 170 wpm. To fit 1,000 words into exactly 5 minutes you would need to narrate at 200 wpm, which is uncomfortably fast for most viewers.
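
Both directions of the conversion (words to runtime, runtime to required pace) fit in a few lines of Python. This is a minimal sketch assuming a constant pace; `runtime` and `required_wpm` are hypothetical names.

```python
def runtime(words: int, wpm: float) -> str:
    """Narration time for a word count, formatted as minutes:seconds."""
    total_seconds = round(words * 60 / wpm)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def required_wpm(words: int, minutes: float) -> float:
    """Pace needed to fit a word count into a fixed runtime."""
    return words / minutes


if __name__ == "__main__":
    print(runtime(1000, 150))      # 6:40
    print(runtime(1000, 170))      # 5:53
    print(required_wpm(1000, 5))   # 200.0
```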

Last reviewed: May 2026. Word-count guidelines are based on the standard 130 wpm speaking pace, 150 wpm narration pace, and 250 wpm silent reading pace; adjust to your own delivery for best accuracy.