It Is Easy To Make Videos Sound Bad
From the department of understatement: there is more video being produced today than ever before, and more ways to watch video than ever before.
Too often, audio captured with a camera’s microphone is subpar and not fully repairable.
Houses of worship are also using and producing much more video today than in the past, both internally as part of services and education, and as part of outreach to distant participants.
As a perusal of sites such as YouTube will demonstrate, production values and audio quality vary widely across the available content.
It’s one thing for a teenager to create a video in his or her bedroom, and quite another to capture a live event in a large, crowded space with multiple sources. Too often, we see these attempts fall short of what the creators intended. Low-quality audio is a big part of what fails.
Among the most common causes for poor quality audio are:
- Microphones being too far from subjects
- Excessive background noise and echo capture
- Poor audio editing
While most video cameras have built-in microphones, these are often too far from subjects to provide quality sound. As a microphone is moved farther and farther from a subject, the subject’s level at the microphone drops rapidly and is quickly buried by surrounding sounds.
The human ear is remarkably good at disentangling a particular voice from noise when listening to speech, but microphones flatten the soundscape and strip away many of the cues our ears rely on, often making verbal communication a chore to understand.
Background noise is strongly correlated with microphone distance from subjects, as the percentage of direct sound in the signal decreases with distance. Ambient and reflected sound rapidly dominate the signal, degrading intelligibility in a way that is often not addressable with tools after the fact.
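The falloff described above follows the inverse-square law for a point source: sound pressure level drops roughly 6 dB each time the distance from the talker doubles, while room noise stays constant. A minimal sketch (the 70 dB reference level at 0.5 m is an illustrative assumption, not a measurement):

```python
import math

def spl_at_distance(spl_ref_db: float, d_ref_m: float, d_m: float) -> float:
    """Inverse-square law: SPL falls ~6 dB per doubling of distance."""
    return spl_ref_db - 20 * math.log10(d_m / d_ref_m)

# Hypothetical talker measuring 70 dB SPL at 0.5 m:
print(round(spl_at_distance(70, 0.5, 1.0)))  # 64 -- one doubling, -6 dB
print(round(spl_at_distance(70, 0.5, 4.0)))  # 52 -- three doublings, -18 dB
```

Against a steady 50 dB room-noise floor, that move from half a meter to four meters takes the voice from commanding to barely above the noise.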
Audio editing can help intelligibility, but it requires skill, practice and tools. Simply capturing and playing back audio without modifying the signal gives surprisingly poor results in many cases, as content is often played back on devices of small size, with limited volume capability and an even more limited frequency response, devoid of any low end.
Unmodified audio will often sound hopelessly quiet and be difficult to understand in all but the quietest environments.
So, we know some of the problems. What are some practical solutions?
Capturing Quality Audio
The first rule of good audio capture is to get your microphone(s) as close to your subject as possible. If your subject is a group, you may need multiple microphones to adequately capture each participant, which in turn speaks to using a mixer or even doing multitrack recording.
Microphones used for speakers should be capable of good noise and feedback rejection, and speakers should learn to be mindful of addressing microphones effectively, e.g., not backing far away from the mic while speaking. Lavalier microphones attached to clothing or headworn mics are often a good choice for speakers who frequently move.
A mixer is certainly required to handle multiple microphones for live sound, but when recording, you may have the superior option of capturing each microphone as a separate track. This allows the editor to adjust for each participant as needed, as opposed to the “fixed” 2-channel mix used to feed loudspeakers at the event.
Capturing individual microphones was an expensive option in the days of analog, but with modern audio-over-IP systems like Dante, it is easy and nondisruptive. A networked stage box or wireless base station is generally used to capture and place microphone signals on the network, so that they can be routed to as many other devices as needed. The same signals that are sent to a console for a live mix are simultaneously sent to a computer running networked audio software, such as Dante Virtual Soundcard. Such software allows all channels to be captured as individual tracks in any common digital audio workstation, or DAW, product.
Once captured, editors quickly discover that video and audio tracks recorded on separate devices are not synchronized, and often are not quite the same duration, due to small clock differences that accumulate over the length of each recording. This is an age-old problem in video, as evidenced by the use of the slate “clapper” in movies for more than a century.
The clapper, or its equivalent, is still used today, providing an easily identified audio cue that can be used to align audio and video at the beginning of a take. For long recordings, this single cue point may not be sufficient, as the tracks will slowly drift apart, requiring the editor to make further adjustments along the timeline to keep words and mouths in sync. A common practice is to use a separate audio track captured by the camera’s microphone as a reference; it may not sound very good, but it is in sync with the video.
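The scale of the drift problem is easy to estimate. Free-running clocks in consumer gear commonly deviate by tens of parts per million (the 50 ppm mismatch below is an illustrative assumption), and that error accumulates for the whole length of the take:

```python
def drift_ms(duration_s: float, ppm: float) -> float:
    """Accumulated drift between two free-running clocks, in milliseconds.

    (ppm / 1e6) * duration * 1000 ms/s simplifies to duration * ppm / 1000.
    """
    return duration_s * ppm / 1000.0

# A hypothetical 50 ppm mismatch over a one-hour recording:
print(drift_ms(3600, 50))  # 180.0 ms -- several video frames out of sync
```

At 30 frames per second, 180 ms is more than five frames of lip-sync error, which is why a single clap at the head of a long take is not enough on its own.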
Among the most vital audio choices an editor can make are equalization and level compression. Equalization (EQ) can be used to accentuate the regions of speech critical to intelligibility, and to reduce ranges that contain only noise or unwanted effects.
Compression adjusts levels automatically to increase quiet sounds and decrease very loud ones.
When a subject is speaking very close to a microphone, proximity effect often results in excessive low-frequency content, making voices sound boomy and indistinct. Judicious reduction of low frequencies can often increase intelligibility and render the content more useful on smaller speakers (such as laptops) that might otherwise be overloaded with bass. Likewise, adding small amounts of boost in the treble region (i.e., 2 kHz and above) can help the percussive parts of speech be more easily heard, especially over background noise.
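The low-cut filtering described above can be sketched with a first-order high-pass filter. This is a toy illustration, not production DSP; the 100 Hz corner frequency and 48 kHz sample rate are illustrative choices:

```python
import math

def highpass(samples, cutoff_hz=100.0, sample_rate=48000):
    """First-order high-pass ("low cut") filter to tame boomy low end."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)  # feedback coefficient, just below 1.0
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

# A constant offset (0 Hz content) is rejected after the filter settles:
filtered = highpass([1.0] * 48000)
print(abs(filtered[-1]) < 1e-3)  # True: the DC component has decayed away
```

A real EQ would use a shelving or parametric filter with an adjustable slope, but the principle is the same: attenuate the range below the voice’s useful content.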
Level compression is used in nearly 100 percent of music recordings, but is not always well understood. Roughly speaking, a compressor attempts to keep sound at a more constant level, processing incoming signals such that quieter sounds are made louder and louder sounds are made quieter. The degree to and manner in which a compressor changes these levels is of course infinitely adjustable, and countless hardware and software products exist to perform this function, each tailored to different needs and tastes.
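The core of what a compressor does can be shown as a static gain curve: below a threshold, levels pass unchanged; above it, they rise only 1/ratio as fast. (Make-up gain then lifts the whole result, which is how quiet passages end up louder.) This is a deliberately simplified sketch; real compressors add attack and release smoothing, and the threshold and ratio below are arbitrary assumptions:

```python
def compress_db(level_db: float, threshold_db: float = -20.0, ratio: float = 4.0) -> float:
    """Static compression curve: above threshold, output rises 1/ratio as fast."""
    if level_db <= threshold_db:
        return level_db  # below threshold: untouched
    return threshold_db + (level_db - threshold_db) / ratio

print(compress_db(-30))  # -30.0: quiet passage passes through unchanged
print(compress_db(0))    # -15.0: a 20 dB overshoot is squeezed to 5 dB
```

With a 4:1 ratio, the loudest and quietest moments end up far closer together, which is exactly what keeps speech audible on small speakers.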
The importance and relevance of level compression in modern audio cannot be overstated; it is used in virtually all music recordings to make songs “pop” at reasonable volume levels, and it is used by broadcasters to ensure that they are sending out a nearly full-scale signal all the time. Without compression, the radio in your car, for example, would be far more easily buried under constant road noise. Compression allows average levels to be kept high while preventing peaks from overloading amplifiers and speakers.
The very same attributes that make compression a logical choice for radio apply to audio used as part of video. Without compression, much speech will likely not be easily heard through small speakers or in noisy environments. Applying audio compression during the video editing process is key to producing recordings that come across powerfully on all manner of devices.
The Final Cut
As AV-over-IP continues to advance into everyday reality, workflows for video production will change.
Audinate has recently announced Dante AV, a technology that allows manufacturers to build products that transport audio and video over a common network using a common clock, eliminating earlier problems of clock drift.
A Dante AV-based recording system will enable a constant time base to be used throughout, even across multiple devices. And because Dante audio is so widely used, Dante AV brings immediate interoperability with thousands of existing audio devices and software products.
So, keep those microphones close, keep the background noise down, pay attention to lip sync and listen carefully to how proper EQ and compression can make videos “pop,” even on all our tiny devices.