I remember how eager I was to get into music production. The arrangement possibilities were endless, and I could learn how to make mixes sound like what I heard. Unfortunately, in the chaos of beginning to produce, I didn’t learn the basics of how a computer actually handles audio, so the whole concept of making music on a laptop felt a bit abstract.
Even bouncing my first track was confusing. What does each of the options do? How was I supposed to know what would sound best?
In this article, we’ll cover some basic aspects of digital audio, and how they affect the production process. Today, we’ll focus on sample rate and bit depth, as well as a few topics related to them. It’s a bit of theory and a bit of math, but hopefully it will peel away some of the mystery behind how digital audio works.
What is “digital audio” in the first place?
Digital audio is the way we store, recreate, and manipulate audio information on a computer. Certain characteristics of an analog sound wave, like its frequency and amplitude, are converted into data that computer software can read. This allows us to manage, edit, and arrange audio in a software-based context.
The sound wave is converted into data through a series of snapshot measurements, or samples. Each sample records the wave’s amplitude at a particular instant. This information is then converted into digestible binary data.
The system makes thousands of measurements per second. If we can take tons of measurements extremely quickly with enough possible amplitude values, we can effectively use these snapshots to reconstruct the resolution and complexity of an analog wave.
The system takes these measurements at a rate called the sample rate, usually expressed in kilohertz (thousands of samples per second). In most DAWs, you’ll find an adjustable sample rate in your audio preferences; it sets the sample rate for the audio in your project.
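To make the “snapshot” idea concrete, here is a minimal Python sketch. The function name `sample_sine` is hypothetical (it is not part of any DAW or library); it simply measures a sine wave’s amplitude at evenly spaced instants, `sample_rate_hz` times per second.

```python
import math

def sample_sine(freq_hz: float, sample_rate_hz: float, duration_s: float,
                amplitude: float = 1.0) -> list[float]:
    """Take amplitude snapshots of a sine wave at instants n / sample_rate_hz."""
    num_samples = int(round(duration_s * sample_rate_hz))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

# One second of a 440 Hz tone at a 44.1 kHz sample rate.
samples = sample_sine(440.0, 44_100, 1.0)
print(len(samples))               # 44100 measurements in one second
print(round(44_100 / 440.0, 1))   # 100.2 samples per cycle of this tone
```

Each list entry is one measurement; a real converter would also round each value to a fixed number of bits, which comes up later under bit depth.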
The options you see in the average DAW—44.1 kHz, 48 kHz—may seem a bit random, but they aren’t! The sample rate determines the range of frequencies captured in digital audio. Let’s use a sine wave to demonstrate:
To measure the frequency of this sine wave, we need to be able to detect and define one cycle. One complete cycle of any wave contains a positive and a negative stage. To know the length of this cycle (its period, which gives us the wave’s frequency), we need to detect both of these stages. Therefore, we need to measure the wave more than two times per full cycle to accurately capture its frequency.
This means we can capture and reconstruct the original sine wave’s frequency with a sample rate greater than twice its frequency; that threshold of twice the frequency is called the Nyquist rate. Conversely, a system can capture and recreate frequencies up to half its sample rate, a limit called the Nyquist frequency.
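The two Nyquist quantities are simple arithmetic, sketched below in Python with hypothetical helper names. The last lines show a subtlety: sampling at exactly twice a sine’s frequency can land every sample on a zero crossing, so the wave vanishes from the data, which is why the sampling theorem asks for strictly more than two samples per cycle.

```python
import math

def nyquist_rate(max_freq_hz: float) -> float:
    """Twice the highest frequency we want to capture."""
    return 2.0 * max_freq_hz

def nyquist_frequency(sample_rate_hz: float) -> float:
    """Half the sample rate: the highest frequency the system can represent."""
    return sample_rate_hz / 2.0

print(nyquist_rate(20_000))        # 40000.0
print(nyquist_frequency(44_100))   # 22050.0

# Edge case: exactly two samples per cycle, both landing on zero crossings.
f = 1_000.0
fs = 2 * f
edge = [math.sin(2 * math.pi * f * n / fs) for n in range(8)]
print(max(abs(x) for x in edge) < 1e-9)  # True: only floating-point residue remains
```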
Signals above the Nyquist frequency are not recorded properly by analog-to-digital converters (ADCs). Instead, they are mirrored back across the Nyquist frequency, introducing artificial frequencies in a process called aliasing.
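The “mirroring” can be computed directly. This Python sketch (the helper name `alias_frequency` is hypothetical) folds a pure tone’s frequency around the Nyquist frequency, then checks that a 30 kHz cosine sampled at 44.1 kHz produces the same sample values as a 14.1 kHz cosine. Cosines are used because a sine would come out phase-inverted, which would make the comparison less direct.

```python
import math

def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency a pure tone appears at after sampling (folded around Nyquist)."""
    f = freq_hz % sample_rate_hz          # sampling can't tell f from f + k * fs
    return sample_rate_hz - f if f > sample_rate_hz / 2 else f

fs = 44_100
print(alias_frequency(30_000, fs))   # 14100

# After sampling, the 30 kHz tone is indistinguishable from a 14.1 kHz tone.
high = [math.cos(2 * math.pi * 30_000 * n / fs) for n in range(64)]
alias = [math.cos(2 * math.pi * 14_100 * n / fs) for n in range(64)]
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, alias)))  # True
```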
To prevent aliasing, analog-to-digital converters are usually preceded by low-pass filters that remove frequencies above the Nyquist frequency before the audio reaches the converter. This keeps unwanted ultra-high frequencies in the original audio from causing aliasing. Early filters could color the audio, but better technology has steadily reduced this problem.
Standard sample rate: 44.1 kHz
The most common sample rate you’ll see is 44.1 kHz, or 44,100 samples per second. This is the standard for most consumer audio, used for formats like CDs.
This is not an arbitrary number. Humans can hear frequencies between 20 Hz and 20 kHz. Most people lose their ability to hear upper frequencies over the course of their lives and can only hear frequencies up to 15 kHz–18 kHz. However, this “20-to-20” rule is still accepted as the standard range for everything we could hear.
The computer should be able to recreate waves with frequencies up to 20 kHz in order to preserve everything we can hear. Therefore, a sample rate of 40 kHz should technically do the trick, right?
This is true in theory, but at 40 kHz the Nyquist frequency sits exactly at 20 kHz, so you would need an extremely steep (and, at one time, expensive) low-pass filter to prevent audible aliasing. A sample rate of 44.1 kHz allows audio at frequencies up to 22.05 kHz to be recorded. By placing the Nyquist frequency outside of our hearing range, we can use gentler filters to eliminate aliasing without much audible effect.
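One rough way to see why 44.1 kHz beats 40 kHz is to measure the gap between the top of human hearing and the Nyquist frequency: the room an anti-aliasing filter has to go from passing audio to blocking it. The sketch below assumes the conventional 20 kHz limit, and `filter_transition_band` is a hypothetical name; real filter design also depends on how much attenuation is required, so treat this as an illustration only.

```python
AUDIBLE_LIMIT_HZ = 20_000  # the conventional "20-to-20" upper bound

def filter_transition_band(sample_rate_hz: float) -> float:
    """Space between 20 kHz and Nyquist for a low-pass filter to roll off.
    Zero means only an ideal brick-wall filter would avoid audible aliasing."""
    return sample_rate_hz / 2 - AUDIBLE_LIMIT_HZ

for fs in (40_000, 44_100, 48_000):
    print(f"{fs} Hz sample rate -> {filter_transition_band(fs):.0f} Hz transition band")
```

At 40 kHz there is no room at all, while 44.1 kHz leaves about 2 kHz for the filter to work in.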
Other sample rates: 48 kHz, 88.2 kHz, 96 kHz, etc.
While 44.1 kHz is an acceptable sample rate for consumer audio, there are instances in which higher sample rates are used. Some were introduced during the early days of digital audio when powerful anti-aliasing filters were expensive. Moving the Nyquist frequency even higher allows us to place the filter further and further out of human hearing, and therefore impact the audio even less.
48 kHz is another common sample rate. The higher rate means more measurements per second, and 48 kHz is widely used in professional audio contexts beyond music; for instance, it’s the standard sample rate for audio for video. This sample rate moves the Nyquist frequency to 24 kHz, giving the anti-aliasing filter further buffer room.
Some engineers choose to work at even higher sample rates, which tend to be multiples of either 44.1 kHz or 48 kHz. Sample rates of 88.2 kHz, 96 kHz, 176.4 kHz, and 192 kHz result in higher Nyquist frequencies, meaning supersonic frequencies can be recorded and recreated. Low-pass filters have less impact on the audible range, and the extra samples per second yield a higher-resolution recreation of the original audio.
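The “multiples of 44.1 kHz or 48 kHz” pattern is easy to check. This small Python sketch (with a hypothetical `rate_family` helper) labels each common rate by its family and prints its Nyquist frequency.

```python
def rate_family(sample_rate_hz: int) -> str:
    """Label a sample rate as a whole-number multiple of 44.1 kHz or 48 kHz."""
    if sample_rate_hz % 44_100 == 0:
        return f"44.1 kHz x {sample_rate_hz // 44_100}"
    if sample_rate_hz % 48_000 == 0:
        return f"48 kHz x {sample_rate_hz // 48_000}"
    return "other"

for fs in (44_100, 48_000, 88_200, 96_000, 176_400, 192_000):
    print(fs, rate_family(fs), "| Nyquist:", fs / 2, "Hz")
```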
Can you really hear this, though?
Some experienced engineers may be able to hear differences between sample rates. However, as filtering and analog/digital conversion technologies improve, it becomes more difficult to hear these differences.
In theory, it’s not a bad idea to work at a higher sample rate, like 176.4 kHz or 192 kHz. The files will be larger, but it can be nice to maximize sound quality until the final bounce. In the end, however, the audio will likely be converted to either 44.1 kHz or 48 kHz. Because it is mathematically much simpler to convert 88.2 kHz to 44.1 kHz, or 96 kHz to 48 kHz, it’s best to stay within one family of sample rates for the whole project. That said, many engineers simply work at 44.1 kHz or 48 kHz throughout.
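The “mathematically simpler” point shows up in the conversion ratio. A common way to describe sample-rate conversion is upsampling by the numerator and downsampling by the denominator of the reduced ratio, with filtering in between. The sketch below, using Python’s standard `fractions.Fraction` (the `resample_ratio` wrapper is hypothetical and not how any particular DAW’s converter is implemented), shows that staying in one family gives a clean 1/2, while crossing families gives awkward ratios like 147/320.

```python
from fractions import Fraction

def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Output samples per input sample, reduced to lowest terms."""
    return Fraction(dst_hz, src_hz)

print(resample_ratio(88_200, 44_100))  # 1/2
print(resample_ratio(96_000, 48_000))  # 1/2
print(resample_ratio(96_000, 44_100))  # 147/320
print(resample_ratio(48_000, 44_100))  # 147/160
```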
If the system were set to a sample rate of 48 kHz and played a 44.1 kHz audio file without converting it, it would read the samples faster than it should. As a result, the audio would sound sped up and slightly higher-pitched. The inverse happens if the system runs at a rate in the 44.1 kHz family while the audio files use a rate in the 48 kHz family: the audio sounds slowed down and slightly lower-pitched.
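How big is “slightly”? Playing samples back at the wrong rate scales speed by the ratio of the two rates, and a pitch ratio converts to semitones as 12 × log2(ratio). The helper name below is hypothetical; the math is standard.

```python
import math

def mismatch_effect(file_rate_hz: float, playback_rate_hz: float) -> tuple[float, float]:
    """Speed factor and pitch shift (in semitones) when samples recorded at
    file_rate_hz are played back at playback_rate_hz with no conversion."""
    speed = playback_rate_hz / file_rate_hz
    semitones = 12 * math.log2(speed)
    return speed, semitones

speed, shift = mismatch_effect(44_100, 48_000)
print(f"{speed:.4f}x speed, {shift:+.2f} semitones")  # 1.0884x speed, +1.47 semitones
```

Swapping the arguments gives the inverse case: roughly 1.47 semitones lower and about 8% slower.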
Super-high sample rates also have an interesting creative use. If you’ve ever lowered the pitch of a standard 44.1 kHz audio file, you’ve probably noticed the highs become somewhat empty. Frequencies above 22.05 kHz were filtered out before conversion, so there is no frequency content to pitch down, resulting in a gaping hole in the highs.
However, if this audio were recorded at 192 kHz, for example, frequencies of up to 96 kHz in the original audio would be recorded. This is obviously way outside of what humans can hear, but pitching the audio down causes these inaudible frequencies to become audible. As a result, you can greatly drop a recording’s pitch while preserving high-frequency content. For more information on sample rate, be sure to check this video out.
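A quick calculation shows the difference. Pitching down by some number of semitones divides every frequency by 2^(semitones / 12). The sketch below (hypothetical helper name) assumes the source and microphone actually produced content all the way up to the Nyquist frequency, which real recordings may not.

```python
def top_frequency_after_shift(sample_rate_hz: float, semitones_down: float) -> float:
    """Highest frequency left after pitching down, assuming the recording
    contained content all the way up to its Nyquist frequency."""
    nyquist = sample_rate_hz / 2
    return nyquist / (2 ** (semitones_down / 12))

print(top_frequency_after_shift(44_100, 12))    # 11025.0 (one octave down)
print(top_frequency_after_shift(192_000, 24))   # 24000.0 (two octaves down)
```

One octave down, a 44.1 kHz recording tops out around 11 kHz, while a 192 kHz recording dropped two full octaves still reaches the edge of human hearing.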
Bit depth
Analog audio is a continuous wave, with an effectively infinite number of possible amplitude values. However, to measure this wave in digital audio, we need to define the wave’s amplitude as a finite value each time we sample it.
The bit depth determines the number of possible amplitude values we can record for each sample. The most common bit depths are 16-bit, 24-bit, and 32-bit. Each figure is the number of binary digits (bits) used to store a sample, and n bits can represent 2^n distinct values. Systems with higher bit depths can therefore express more possible values:
16-bit: 65,536 amplitude values
24-bit: 16,777,216 amplitude values
32-bit: 4,294,967,296 amplitude values
With a higher bit depth, and therefore a higher resolution, more amplitude values are available to record. As a result, the continuous analog wave’s exact amplitude is closer to an available value when sampled, so the digital approximation of the amplitude comes closer to the original fluid analog wave.
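The value counts follow directly from powers of two. This Python sketch (hypothetical `amplitude_values` helper) assumes integer samples; 32-bit audio in DAWs is often stored as floating point, which spaces its values differently, so the 32-bit line describes a 32-bit integer format.

```python
def amplitude_values(bits: int) -> int:
    """Number of distinct amplitude values an n-bit integer sample can hold."""
    return 2 ** bits

for bits in (16, 24, 32):
    print(f"{bits}-bit: {amplitude_values(bits):,} values")
# 16-bit: 65,536 values
# 24-bit: 16,777,216 values
# 32-bit: 4,294,967,296 values
```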
Increasing the bit depth, along with increasing the sample rate, creates more total points to reconstruct the analog wave.
However, regardless of the resolution, the fluid analog wave does not always line up perfectly with an available value. The sampled amplitude must therefore be rounded to the nearest available value, which comes down to whether the last bit is a 0 or a 1; this rounding is called quantization. The difference between the true and stored amplitude, known as quantization error, behaves like an essentially random part of the signal.
In digital audio, we hear this randomization as a low white noise, which we call the noise floor. Like the mechanical noise introduced in an analog context or background noise in a live acoustic setting, digital quantization error introduces noise into our audio.
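We can measure this noise floor numerically. The Python sketch below (helper names are hypothetical) rounds a near-full-scale 997 Hz sine to a given bit depth, treats the rounding error as noise, and reports the signal-to-noise ratio. The textbook estimate for a full-scale sine is about 6.02 × bits + 1.76 dB, so expect results near 98 dB for 16-bit and about 146 dB for 24-bit.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Round x (in the range -1.0..1.0) to the nearest n-bit level."""
    levels = 2 ** (bits - 1)                      # steps on each side of zero
    q = round(x * levels)
    q = max(-levels, min(levels - 1, q))          # stay inside the integer range
    return q / levels

def quantization_snr_db(bits: int, freq_hz: float = 997.0,
                        sample_rate_hz: int = 48_000) -> float:
    """Signal-to-quantization-noise ratio of a near-full-scale sine, in dB."""
    signal_power = noise_power = 0.0
    for n in range(sample_rate_hz):               # one second of audio
        x = 0.99 * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        err = quantize(x, bits) - x               # the quantization error
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

for bits in (16, 24):
    print(bits, "bits:", round(quantization_snr_db(bits), 1), "dB")
```

Each extra bit halves the rounding step, which lowers the noise floor by roughly 6 dB.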
Harmonic relationships between the sample rate and audio, along with the bit depth, can cause certain patterns in quantization. This is known as correlated noise, which we hear as resonances in the noise floor at certain frequencies. Here, our noise floor is actually higher, taking up potential amplitude values for a recorded signal.
However, we can add artificial randomization to make sure these patterns don’t occur. In a process called dithering, a small amount of noise is added before quantization, which randomizes how this last bit gets rounded. No patterns form, and the result is more random “uncorrelated noise” that leaves more potential amplitude values.
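Here is a minimal demonstration of why randomizing that last bit helps, using triangular (TPDF) dither, a common choice but not the only one. Values are measured in quantization steps (LSBs), and the helper names are hypothetical. A steady signal of 0.3 LSB rounds to zero every time without dither; with dither, individual samples become noisy, but their average preserves the detail.

```python
import math
import random

def quantize_lsb(x: float) -> int:
    """Round a value (measured in LSB steps) to the nearest whole step."""
    return math.floor(x + 0.5)

def tpdf_dither() -> float:
    """Triangular-PDF dither: the sum of two uniform values, spanning +/-1 LSB."""
    return random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)

random.seed(1)
level = 0.3          # a steady signal smaller than one quantization step
trials = 100_000

plain = [quantize_lsb(level) for _ in range(trials)]
dithered = [quantize_lsb(level + tpdf_dither()) for _ in range(trials)]

print(sum(plain) / trials)      # 0.0: without dither the detail is rounded away
print(sum(dithered) / trials)   # close to 0.3: the detail survives on average
```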
The amplitude of the noise floor becomes the bottom of our possible dynamic range. At the other end, a digital system distorts (clips) when a signal’s amplitude exceeds the maximum value the binary system can represent. This level is referred to as 0 dBFS (decibels relative to full scale).
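In code, dBFS is just a level measured against the largest representable value. This sketch assumes the floating-point convention where full scale is a sample value of ±1.0, and `hard_clip` is a simplified, hypothetical model of what happens to values beyond that ceiling.

```python
import math

FULL_SCALE = 1.0  # assumed convention: 0 dBFS is a sample value of +/-1.0

def peak_dbfs(samples: list[float]) -> float:
    """Peak level relative to full scale, in dB (0.0 means touching the ceiling)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak / FULL_SCALE)

def hard_clip(samples: list[float]) -> list[float]:
    """Flatten any value beyond full scale, which is where the distortion comes from."""
    return [max(-FULL_SCALE, min(FULL_SCALE, s)) for s in samples]

print(round(peak_dbfs([0.5, -0.25]), 2))   # -6.02
print(hard_clip([0.5, 1.4, -2.0]))         # [0.5, 1.0, -1.0]
```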
In the end, our bit depth determines the number of possible amplitude values between the noise floor and 0 dBFS.
You may be thinking, “Can human ears really tell the difference between 65,536 and 4,294,967,296 amplitude levels?”
This is a valid question. The noise floor, even in a 16-bit system, is incredibly low. Unless you need more than 96 dB of effective dynamic range, 16-bit is viable for the final bounce of a project.
However, while working on a project, it’s not a bad idea to use a higher bit depth. Because the noise floor drops, you can record and mix at more conservative levels while staying well above the noise, leaving more room before distortion occurs, also known as headroom. Having this extra buffer before distortion is a good failsafe while working and provides more flexibility.
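The 96 dB figure for 16-bit comes from 20 × log10(2^16), roughly 6.02 dB per bit. The sketch below computes that range for 16-bit and 24-bit and then shows how much of it remains above the noise floor when peaks sit at -18 dBFS. That level is a hypothetical, conservative working level chosen only for illustration.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

PEAK_DBFS = -18.0  # hypothetical conservative working level
for bits in (16, 24):
    dr = dynamic_range_db(bits)
    print(f"{bits}-bit: {dr:.1f} dB total, "
          f"{dr + PEAK_DBFS:.1f} dB above the floor when peaking at {PEAK_DBFS} dBFS")
```

Even with 18 dB of headroom, a 24-bit signal still sits well over 120 dB above its noise floor.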
For more information on bit depth, be sure to check this video out on Pro Audio Essentials.
With a firmer understanding of sample rate and bit depth, it’s clear to see how lucky we are to live in this age of audio engineering. Digital audio allows us myriad possibilities for manipulating audio, many of which were not available in analog systems.
Additionally, improving technologies over the years have helped to eliminate many problems introduced in a digital system. Technologies continue to evolve, making it possible for digital audio to be totally indistinguishable from its analog counterpart.