Why we all need subtitles now


I watch a lot of movies and TV: on the train, at home, at the movies, while working out, while doing dishes, in the bath. But no matter where I’m watching, I find myself constantly doing one thing: turning on the subtitles. It turns out this isn’t unusual. We polled our YouTube audience, and about 57% of people said they feel they can’t understand the dialogue in the things they watch unless they’re using subtitles. But it feels like this hasn’t always been the case.

To figure out what was going on, I made a call to Austin Olivia Kendrick, a professional dialogue editor for film and TV. After talking to Austin for almost two hours, it became clear that this is a very layered and complex topic. Everything kept pointing back to one main thing: technology.

Microphones used to be big, bulky, and temperamental, and hiding them required creative solutions. They were wired and recorded onto physical media like wax and, eventually, tape. No matter how many actors were in a scene, all the sound was recorded to one track, so performers had to stay focused and face a particular angle for their words to be picked up.

But technology has improved to the point where microphones don’t impede performance as much anymore. They have become better, smaller, and wireless, and productions use more of them to ensure that performances get captured. A typical set works with two boom microphones, and every actor has at least one lavalier microphone hidden somewhere on them.

These shrinking mics have given actors the flexibility to be more naturalistic in their performances. They no longer need to project so that their words reach the mic. They can speak softly, knowing that the tiny mic hidden on their body will pick up what they’re saying.

Digital editing tools have also allowed dialogue editors to clean up the audio, remove background noise, isolate words, and even replicate dialogue if needed. This has made it easier to make mumblers like Tom Hardy intelligible. If a piece of dialogue is truly impossible to understand, actors come in and rerecord those specific lines in a process called ADR (automated dialogue replacement). ADR is still done today, but it is costly, because you are paying for the actors’ time, the engineer’s time, and the editor’s time, so it is done as little as possible. Instead, a big part of the dialogue editor’s job is making the recorded words sound better. For example, if a loud metal clang lands on a line and can’t be removed, the editor has to find an alternate take that fits and cut it in.

Once the dialogue edit and any ADR are done, the audio is sent off to a mixer, who works to make sure the frequencies of the sound effects and music don’t overlap with the frequencies of the human voice. That kind of work is only possible now that the industry has moved away from tape and into digital recording. It is still a big challenge: carving out those frequencies so that the dialogue punches through and isn’t muddied by other sounds is difficult.
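To make "carving out" frequencies a little more concrete, here is a minimal sketch of one way it could be done: a peaking EQ cut applied to a music or effects track in a band where speech needs room. The filter is a standard biquad in the widely used Audio EQ Cookbook form. The 2 kHz center, the 6 dB cut, and the Q of 1.0 are illustrative assumptions, not values from any real mix, and a real mixer would work by ear with far more sophisticated tools.

```python
import math

def peaking_eq_coefficients(sample_rate, center_hz, gain_db, q):
    """Biquad peaking-EQ coefficients (Audio EQ Cookbook form), normalized so a0 == 1."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha / amp
    b = [(1 + alpha * amp) / a0, (-2 * math.cos(w0)) / a0, (1 - alpha * amp) / a0]
    a = [(-2 * math.cos(w0)) / a0, (1 - alpha / amp) / a0]
    return b, a

def apply_biquad(samples, b, a):
    """Run samples through the filter using the direct-form I difference equation."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def sine(freq_hz, sample_rate, count):
    """A test tone standing in for one frequency component of a music track."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

SAMPLE_RATE = 48_000
# Hypothetical settings: cut the music by 6 dB around 2 kHz to leave room for speech.
b, a = peaking_eq_coefficients(SAMPLE_RATE, center_hz=2_000, gain_db=-6.0, q=1.0)

for freq in (100, 2_000):
    tone = sine(freq, SAMPLE_RATE, 9_600)
    filtered = apply_biquad(tone, b, a)
    # Measure only the second half so the filter's start-up transient has died away.
    ratio = rms(filtered[4_800:]) / rms(tone[4_800:])
    print(f"{freq} Hz: level change {20 * math.log10(ratio):+.1f} dB")
```

The low tone passes through almost untouched while the tone in the cut band is turned down, which is the basic idea: the music keeps its weight, but gets out of the way where the voice lives.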

Even with all that work, lines of dialogue can still be hard to understand. A lot of filmmakers want their movie to feel “cinematic,” with wall-to-wall bombastic, loud sound. But there is a ceiling: if you make something too loud, it distorts. So to create a wide dynamic range, the louder sounds can’t be pushed any louder; the quieter sounds, dialogue included, have to be pushed lower instead. This is why the films of Christopher Nolan have been criticized for hard-to-hear dialogue, even though he likes it that way.
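The arithmetic behind that trade-off can be sketched in a few lines. This assumes a digital ceiling of 0 dBFS and the standard decibel-to-amplitude conversion; the function names and the specific ranges are hypothetical and chosen only for illustration.

```python
CEILING_DBFS = 0.0  # assumed digital ceiling: anything louder distorts (clips)

def db_to_amplitude(db):
    """Convert a level in decibels relative to full scale to a linear amplitude ratio."""
    return 10 ** (db / 20)

def dialogue_level(peak_dbfs, dynamic_range_db):
    """Dialogue level when the loudest sound sits at peak_dbfs and the mix
    keeps dynamic_range_db between that peak and the dialogue."""
    if peak_dbfs > CEILING_DBFS:
        raise ValueError("peaks cannot go above the ceiling without distorting")
    return peak_dbfs - dynamic_range_db

# The explosions are already at the ceiling, so a wider range can only push dialogue down.
for range_db in (10, 20, 30):
    level = dialogue_level(CEILING_DBFS, range_db)
    print(f"{range_db} dB of range -> dialogue at {level:.0f} dBFS "
          f"(amplitude ratio {db_to_amplitude(level):.3f})")
```

Every extra 20 dB of range leaves the dialogue at a tenth of its previous amplitude, which a theater sound system can handle and a thin TV speaker in a noisy living room often cannot.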

When movies are mixed, they are usually mixed for the widest surround sound format available, Dolby Atmos, which offers true 3D sound with up to 128 channels. However, if you’re not at a movie theater that can showcase the best sound Hollywood has to offer, you can’t experience all of those channels. This is where downmixing comes in: the process of taking a large mix and reducing it to a format with fewer channels, such as Atmos to 7.1, 7.1 to 5.1, or 5.1 to stereo or mono, so that all the same sounds have to live on far fewer tracks.

This is particularly challenging on modern TVs, which are much thinner than their older counterparts and must fit small speakers into a sleek form factor. A downmix that goes from 128 channels to just two can sound muddier, and when that is combined with poor speakers, naturalistic, mumbly performances, and a flattened mix, it can be difficult to understand what’s being said. To address this, TVs now ship with settings like active voice amplification and intelligence mode that try to make dialogue more audible.

However, the industry has yet to return to pristine dialogue mixes. For now, the solutions are to buy better speakers, go to a theater with impeccable sound, or keep subtitles on. Laws have been passed to ensure that movie theaters offer at least a few captioned screenings a week, and streaming services have made captions standard. Subtitles are also easy to find on YouTube and TikTok, and speech recognition technology has made them even more widely available.
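For a rough feel of what the downmixing described above does to dialogue, here is a minimal sketch of folding one 5.1 sample frame down to stereo. It assumes the center and surround channels are mixed in at -3 dB, a commonly used set of coefficients, and it simply drops the LFE channel; real encoders and mixers may use different values. The sample levels and the `dialogue_share` helper are hypothetical, and the "share" it reports is a crude single-sample measure, not a proper loudness calculation.

```python
import math

# Assumption: center and surrounds are folded in at -3 dB (1/sqrt(2)).
FOLD_GAIN = 1 / math.sqrt(2)

def downmix_51_to_stereo(frame):
    """Fold one 5.1 frame (keys L, R, C, LFE, Ls, Rs) into a (left, right) pair.
    The LFE channel is ignored in this sketch."""
    left = frame["L"] + FOLD_GAIN * frame["C"] + FOLD_GAIN * frame["Ls"]
    right = frame["R"] + FOLD_GAIN * frame["C"] + FOLD_GAIN * frame["Rs"]
    return left, right

def dialogue_share(frame):
    """Fraction of the left stereo sample that came from the center channel,
    where dialogue usually lives. Assumes a frame with a nonzero left sum."""
    left, _ = downmix_51_to_stereo(frame)
    return FOLD_GAIN * frame["C"] / left

# Hypothetical levels: dialogue in the center, music in front, effects in the surrounds.
frame = {"L": 0.4, "R": 0.4, "C": 0.5, "LFE": 0.6, "Ls": 0.3, "Rs": 0.3}
left, right = downmix_51_to_stereo(frame)
print(f"stereo sample: L={left:.3f}, R={right:.3f}")
print(f"dialogue share of the left channel: {dialogue_share(frame):.0%}")
```

In the 5.1 mix the dialogue has a speaker largely to itself; after the fold-down it shares each of the two remaining channels with the music and the effects, which is one reason a downmix can sound muddier.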