In Praise of Software Scribes

Image: A Scribe and a Woman by Rudolf von Ems (about 1400–1410)/The J. Paul Getty Museum

This post started with me, on my balcony, in -7 degrees Celsius. (19 Fahrenheit.) I was talking to my scribe, the Otter app on my phone. Thousands of years ago, scribes were a luxury afforded only to royalty, although I suspect a pharaoh would not have tolerated such cold weather. I went back inside, exported the text, and put it into a document. Then, I wrote.

There’s plenty to be said of the overarching business opportunity here for transcriptions. Otter raised $10 million in January 2020. Over a decade ago, Rev.com raised millions too. I hope to write about that in a business context one day, but I won’t be doing that in this post. Instead, I’ll look into the opportunity of turning audio and video into writing and my personal experiences with it as a professional author and editorial director. 

Enter YouTube’s Transcriptions

Perhaps one of the most useful, and least obvious, features of YouTube is that it also displays its captions in transcript format. I’d consider myself a pretty heavy user of YouTube, and I used it for years without realizing this was possible. Just click the “…” button under a video’s title, click “Open transcript,” and voila.

Given the transcript’s imprecision, I’d guess that it was machine-generated, and YouTube hasn’t done it for every video yet—only the more recent ones. It’s an incredibly useful tool, especially as an author—I can’t count the number of times I wanted to quote someone from a specific video, but skipped it because I didn’t have the time to listen to the whole episode again. I eventually started writing down timestamps, but this is much more convenient. 

Transcripts Encourage Sharing

In some sense, transcripts are an affordance to share. When authors quote someone, I’m sure a fair number of them will link back to the source. This must also be the reason some podcasters go the extra mile and offer transcriptions of their episodes (which are much appreciated).

One of the most powerful properties of text is how easy it is for engines to index and search. I imagine there’s still a lot of valuable information and knowledge locked away in other formats—video, audio, and such—and transcriptions are really the only way to unlock it. 
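To make the point about searchable text concrete, here is a minimal sketch of my own, not how any search engine or transcription service actually works. It assumes a transcript arrives as a list of (start time in seconds, caption text) pairs, which is a made-up shape for illustration, and it maps each word to the moments where it is spoken.

```python
from collections import defaultdict


def build_index(segments):
    """Map each lowercased word to the start times (seconds) where it is spoken."""
    index = defaultdict(list)
    for start, text in segments:
        for word in text.lower().split():
            # Strip surrounding punctuation so "scribes." matches "scribes".
            word = word.strip(".,!?;:\"'()")
            if word and start not in index[word]:
                index[word].append(start)
    return index


# Hypothetical transcript segments: (start time in seconds, caption text).
segments = [
    (0.0, "Welcome back to the show."),
    (4.5, "Today we talk about scribes."),
    (9.0, "Scribes were a luxury for royalty."),
]

index = build_index(segments)
print(index["scribes"])  # [4.5, 9.0]
```

Once the words are text, finding every mention of "scribes" is a dictionary lookup rather than a re-listen of the whole episode.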

Interface for Transcripts

Another opportunity is the interface for transcribed text. In YouTube’s case, the transcription sits in a little box in the top right corner. Different programmers have figured out different ways to make this better. For example, Hieroglyph displays YouTube transcriptions in a full-width interface which, thankfully, makes them easier to read. But it’s all one big block of text, which can be difficult for sustained reading; I can only skim it, and I usually use it for pulling information (e.g., Command+F). You-tldr is probably better: it has time stamps, and its text is more legible.
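As a rough illustration of what a more readable interface might do behind the scenes, here is a small sketch. It is not how Hieroglyph or You-tldr work; it assumes caption segments come as (start, end, text) tuples in seconds, and the two-second pause threshold is an arbitrary choice of mine. It breaks the wall of text into paragraphs wherever the speaker pauses, and prefixes each paragraph with a time stamp.

```python
def format_timestamp(seconds):
    """Render a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def to_paragraphs(segments, max_gap=2.0):
    """Group (start, end, text) segments into time-stamped paragraphs.

    A new paragraph starts whenever the silence between two segments
    is longer than max_gap seconds (an assumed, tunable threshold).
    """
    paragraphs = []
    current = []
    para_start = None
    prev_end = None
    for start, end, text in segments:
        if current and start - prev_end > max_gap:
            paragraphs.append(f"[{format_timestamp(para_start)}] " + " ".join(current))
            current = []
        if not current:
            para_start = start
        current.append(text.strip())
        prev_end = end
    if current:
        paragraphs.append(f"[{format_timestamp(para_start)}] " + " ".join(current))
    return paragraphs


# Hypothetical caption segments: (start, end, text), times in seconds.
captions = [
    (0.0, 3.0, "Thanks for joining me."),
    (3.2, 6.0, "Let's get started."),
    (65.0, 68.0, "My first question is about writing."),
]

paragraphs = to_paragraphs(captions)
for paragraph in paragraphs:
    print(paragraph)
```

Pauses are only a crude proxy for a change of topic, but even this much turns one big block into something you can read rather than just skim.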

Even though YouTube does it for everyone, I still think we’re in the early days of this opportunity. Right now the process is clunky—for example, with some podcasts, I need to find an MP3 file, download it, upload it to Otter (which I pay for), and wait. I’m sure eventually there will be a tool that makes this more seamless for everything outside of YouTube. 

Video-First Expression

If you’re in the business of communicating ideas (which is everyone in leadership, marketing, internal comms, journalism, etc.), the most flexible format in which to first record that content is actually video. It’s the highest-fidelity format and the hardest to procure, so to speak, in that you can’t turn audio or text into video without relatively expensive processing, in expertise or in time. (Good song lyric videos are an example of this.) You can, however, turn a video into writing and audio without much additional processing.

For interviews and podcasts, the most flexible workflow is to record the video interview (and publish it on YouTube), turn it into audio (and publish it as a podcast on all channels), and turn that into a transcribed text interview (and publish it on a blog and Medium), perhaps summarizing it and adding more detail (and publishing that at a publication). You can also put more social networks in there, say, turning on Clubhouse while you record, as well as converting the complete text interview into a summary Tweetstorm.

Software scribes free authors up to think out loud, to get through writer’s block, and to actually work on the piece and refine their writing and thinking. While I didn’t dive into it in this piece, transcripts also unlock information for a lot of people with hearing impairments, and thus also unlock their thoughts and contributions. The more transcription happens, the more information will be unlocked and spread across the internet.

