Sonic Boom

Does Upsampling Improve Sound Quality?

As the website’s name suggests, I have always had a certain interest in personal audio going beyond 44.1/16. In a much wider sense “beyond red book” could also be understood as an approach to anything in digital personal audio that is not necessarily bound to the standards and physical limitations of CD audio. This wider interpretation then especially includes streaming in all of its variants and derivatives.

This article looks into the basics of digital music (re)production in a home/personal environment. Reading these lines should help you understand the underlying concepts of digital audio (re)production. That being said, it will not necessarily lead to a point where it’s crystal clear whether upsampling makes sense or not. Given the nature of the subject there will be room for interpretation, even though some aspects are hard facts/data.

There are strong disciples on both ends of the spectrum. Some will insist that 44.1/16 is sufficient and everything above is voodoo, unnecessary ballast or even counterproductive. Others will come to the conclusion that a certain upside potential can be derived from upsampling in the context of personal home audio.

To start the search for an answer to whether upsampling is a good idea or not, the limits of human hearing can be used as a starting point.

Human Hearing Capabilities

The human auditory field is determined in the range of 20 Hz to 20 kHz based on the physiology of our ears and the auditory cortex in our brains. Below 20 Hz we speak about infrasounds while everything above 20 kHz is labeled as ultrasounds. With these specific hearing capabilities mankind is positioned somewhere in between elephants, moles as well as cats and dogs.

A lot of scientific research and testing has been undertaken in the last 100 years to get to these authoritative data. The range of 20 Hz to 20 kHz is an ample definition. Even for most healthy young people the range of what is actually perceivable for them is (considerably) smaller. These limitations grow with further aging. Irrespective of that the perception of different frequencies is correlated with volume. Some frequencies can be heard at a lower amplitude while others must be extremely loud to be even recognized.

The threshold of hearing on the lower end as well as the threshold of pain on the upper end define the human auditory field within the given frequency range.

Long story short: Human hearing is happening with frequencies in between 20 Hz and 20 kHz and sound pressure levels of 0 dB up to 140 dB. This determination is important. When speaking about optimization based upon upsampling it needs to bring forth any measurable (and distinguishable) effect within this scope. There are no super humans. No one can go beyond these limits. There are certainly highly talented/trained/gifted individuals out there but all of them are bound to the limits of human hearing.

Sampling & Nyquist-Shannon-Theorem

When speaking about Upsampling we already imply that we are operating in a digital domain. To get there we need to transform our initially analog, continuous signal (the sound wave) into a discrete, digital representation of it. This process of analog-digital conversion is known by the term sampling. It usually is a two step process consisting of discretization and quantization. The intended result of that signal processing is an exact representation of the original analog sound wave in a digital data set.

In the end it should be possible to retrieve the original analog input from a sequence of the digital samples. If done right this back and forth analog-digital-analog conversion is completely lossless. To ensure lossless transition a few prerequisites need to be met.

  1. The original continuous-time signal needs to have a finite bandwidth. The signal is bandlimited. This is true for our specific use case. We’re speaking about the reproduction of sound waves within the spectrum of human hearing (20 Hz to 20 kHz).
  2. The Sampling-Theorem by Nyquist and Shannon is fulfilled. The Nyquist-Shannon-Theorem is the fundamental bridge between continuous-time signals and discrete-time signals. It basically says that the sampling frequency must be greater than twice the maximum frequency you want to reproduce. If you want to capture frequencies up to 20 kHz the sampling rate has to be higher than 40 kHz.

If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.

A sufficient sample rate is therefore anything larger than 2B samples per second. Equivalently, for a given sample rate f_s, perfect reconstruction is guaranteed possible for a bandlimit B < f_s/2.

It is crucial to internalize this basic concept. Understood correctly it frees us from the preconception that more is more in sampling. The core aspect here is that each and every part of the acoustic spectrum of human hearing can be captured perfectly if it is sampled at more than twice its frequency. Slightly more than two data points (samples) per wave cycle are fully sufficient.

Don’t get fooled by the idea that more samples will lead to better quality and/or fidelity. Therefore it is also not true that it’s “easier” to reproduce lower frequencies with more accuracy based on the fact that these frequencies are captured with more data points. On the other hand sampling is not getting “thin”, poor or low-quality on the upper end when the frequency of the original wave is close to the Nyquist frequency.

The 20 kHz example is just as “accurately” sampled with 44.1 kHz as the 440 Hz signal is. In both cases there is just one (!) unique wave that can be reproduced with the given data. Get rid of the idea that an angular graph is synonymous with a poor representation of the original sound wave. It is just a possible way of visualization that has nothing to do with the factual sound quality.
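The claim that a 20 kHz tone is fully determined by its 44.1 kHz samples can be checked numerically. The following is a minimal sketch, not a production resampler: it applies the Whittaker-Shannon (sinc) interpolation formula to a finite record of samples. The record length, the evaluation window and the function name `reconstruct` are assumptions made for this illustration. Because the theorem assumes infinitely many samples, a small truncation error remains.

```python
import numpy as np

FS = 44_100.0       # CD sampling rate in Hz
F_TONE = 20_000.0   # test tone close to the Nyquist frequency (22,050 Hz)
N_SAMPLES = 4001    # finite record length (assumption; the theorem assumes an infinite one)


def reconstruct(samples, fs, t):
    """Whittaker-Shannon interpolation: x(t) = sum_n x[n] * sinc(fs * t - n)."""
    n = np.arange(len(samples))
    # np.sinc(x) is the normalized sinc, sin(pi * x) / (pi * x).
    return np.sinc(fs * t[:, None] - n[None, :]) @ samples


# Roughly 2.2 samples per cycle: drawn as connected dots this looks "angular".
n = np.arange(N_SAMPLES)
samples = np.sin(2 * np.pi * F_TONE * n / FS)

# Evaluate between the sample instants, in the middle of the record,
# where cutting off the infinite sum matters least.
t = (2000 + np.linspace(0.0, 20.0, 801)) / FS
rebuilt = reconstruct(samples, FS, t)
original = np.sin(2 * np.pi * F_TONE * t)

max_error = float(np.max(np.abs(rebuilt - original)))
print(f"max reconstruction error: {max_error:.2e}")
```

The remaining error comes from truncating the sum, not from the low number of samples per cycle. It shrinks as the record gets longer.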

Under-Sampling & Aliasing

As long as a single wave cycle is represented by more than two samples the world of digital audio reproduction is a happy place. Things change when the sampling rate drops to or even below the Nyquist rate, which is twice the signal frequency. If we sampled a 4,000 Hz sine wave with a sample rate of 8,000 Hz, we would get exactly 2 samples per cycle. If those samples happen to fall on the zero crossings, we would hear: Nothing. The digital representation of the analog sine wave is a flat line. With any other phase the samples are not zero, but amplitude and phase of the original wave can no longer be determined from them.

If there are fewer than 2 samples per cycle we speak of under-sampling. A sound wave is captured by less than two samples per cycle, leading to a situation where the data is not unambiguous any more. Aliasing occurs. Digital data is no longer uniquely related to the original analog sound wave. Whenever the continuous signal’s frequency is above the Nyquist frequency (half the sampling rate), aliasing changes the frequency into something that can be represented in the sampled data, which is not the original sine wave. This leads to distortion reducing the sound quality of the digital signal.
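The two cases above can be reproduced with a few lines of Python. This is a small sketch using only the standard library. The helper names `alias_frequency` and `sample_tone` are made up for this illustration, and the 5,000 Hz tone is an assumed example that does not appear in the text.

```python
import math


def alias_frequency(f_signal, fs):
    """Frequency (Hz) at which a pure tone of f_signal shows up after sampling at fs."""
    # Sampling cannot tell f apart from f + k * fs (k integer) and, for real
    # signals, not from -f either. Fold the frequency into the range 0 .. fs / 2.
    f = math.fmod(f_signal, fs)
    return min(f, fs - f)


def sample_tone(f_signal, fs, count, wave=math.cos):
    """Return `count` samples of a unit-amplitude tone taken at rate fs."""
    return [wave(2 * math.pi * f_signal * n / fs) for n in range(count)]


FS = 8_000.0

# Exactly 2 samples per cycle: a 4,000 Hz sine sampled on its zero crossings.
flat_line = sample_tone(4_000.0, FS, 8, wave=math.sin)
print("4,000 Hz sine at 8,000 Hz:", [round(v, 9) for v in flat_line])

# Fewer than 2 samples per cycle: 5,000 Hz is indistinguishable from 3,000 Hz.
print("alias of 5,000 Hz at 8,000 Hz:", alias_frequency(5_000.0, FS), "Hz")
undersampled = sample_tone(5_000.0, FS, 16)
genuine_3k = sample_tone(3_000.0, FS, 16)
print("same samples:", all(math.isclose(a, b, abs_tol=1e-9)
                           for a, b in zip(undersampled, genuine_3k)))
```

Once the samples are identical there is no way to recover which of the two tones was recorded, which is why the filtering has to happen before sampling.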

The only way to avoid data corruption through aliasing is to make sure that the signal being sampled does not contain frequencies above one-half the sampling rate. When the frequency of the continuous wave is below the Nyquist frequency, the frequency of the sampled data is a unique match. Distortion is avoided. To make sure the Nyquist sampling theorem is satisfied, filters are applied in the analog-digital-conversion process. For example in 44.1/16 music (re)production the Nyquist frequency is 22,050 Hz. To guarantee precise sound quality and to prevent under-sampling the anti-aliasing-filter needs to suppress frequencies above 22,050 Hz.

Why is 44.1 kHz a common Sampling Frequency?

Audio production in CD quality is done in 44.1/16. From what we’ve looked at so far it would have made perfect sense to go for a sampling frequency of just over 40,000 Hz as we can “only” hear up to 20 kHz and we need more than twice that as the sampling rate to avoid aliasing. There is of course a good reason why 40 kHz is not sufficient and 44.1 kHz became widespread.

Unfortunately it is impossible to build a low-pass filter that passes all frequencies below the threshold of 20 kHz untouched and completely cuts off everything above it. An ideal brick-wall filter is acausal: its impulse response extends infinitely in both directions in time, so it cannot be realized. Real filters therefore need a frequency range in which the attenuation gradually increases. This is where 44.1 kHz comes in. To ensure proper sound quality up to 20 kHz an additional transition band is used to implement low-pass-filtering with little or at least acceptable attenuation in the range around human-audible 20 kHz. The additional 2.05 kHz ((44.1 – 40)/2) beyond 20 kHz are used for low-pass-filtering.

In addition to that 44.1 kHz has a distinct footprint in audio and video production history. It reached wider distribution by the end of the 1970s when Sony made it possible to store digital audio on video cassette recorders. This approach converged into the Compact Disc Digital Audio standard which is defined in the now famous Red Book. With the release of its first edition Sony in collaboration with Philips laid the foundation for 2 channel LPCM audio sampled at 44,100 Hz with 16 bit values. The VCR legacy can also be found in a strong compatibility with PAL and NTSC video standards and numeric values. For example 294 active lines in PAL at 50 Hz and 3 samples per line result in a sampling rate of 44,100 Hz (294 * 50 * 3).
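The video arithmetic is easy to verify. In the short sketch below the function name `pcm_adaptor_rate` is hypothetical. The NTSC figures (245 active lines per field at a nominal 60 fields per second) are not taken from this article. They are the commonly cited monochrome NTSC values and are included as an assumption to show that both video standards lead to the same rate.

```python
def pcm_adaptor_rate(active_lines_per_field, fields_per_second, samples_per_line=3):
    """Audio samples per second that fit into the active lines of a video signal."""
    return active_lines_per_field * fields_per_second * samples_per_line


pal_rate = pcm_adaptor_rate(294, 50)    # figures from the text above
ntsc_rate = pcm_adaptor_rate(245, 60)   # assumption: nominal monochrome NTSC figures

print("PAL: ", pal_rate, "Hz")
print("NTSC:", ntsc_rate, "Hz")
```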

A Brief Summary

  • Human hearing is happening in between 20 Hz to 20 kHz.
  • A continuous-time (analog), bandlimited signal can be converted losslessly, unambiguously and therefore reversibly into a discrete-time (digital) signal.
  • To do so the Nyquist sampling criterion needs to be satisfied. Any wave cycle of the analog signal needs to be captured with more than 2 samples.
  • Under-sampling (fewer than 2 samples per wave cycle) leads to ambiguous conversion and thus creates distortion through aliasing.
  • Aliasing can be avoided by limiting the frequency in the analog domain. Low-pass-filters are meant to keep the maximum frequency below the Nyquist frequency. For CD quality the Nyquist frequency is 22,050 Hz.
  • Because an ideal (brick-wall) low-pass filter is acausal and cannot be built, an additional transition band (20–22.05 kHz) is needed to minimize attenuation in the audible frequencies below/around 20 kHz.

Does Upsampling Improve Sound Quality?

So far so good. By now we understand why 44.1/16 is basically a good choice for analog-digital-conversion in the context of home audio. We still haven’t looked into the question whether it makes sense or not to upsample to anything beyond 44.1 kHz in the digital domain. Does it do any good? Does it have a positive impact on sound quality? As announced in the beginning we might not get to a point where we have a crystal clear Yes or No answer.

Upsampling & Reconstruction Filters

Hans Beekhuyzen looked into this topic on his YouTube Channel (which I very much recommend watching and subscribing to). The central idea of why upsampling makes sense to him is based on the observation that filtering processes in analog to digital and digital to analog conversion are error-prone because they are non-trivial. It is a rather complex task to cut off frequencies that must be left out while others should pass through untouched. While he still thinks the Nyquist-Theorem is correct he feels confident that it’s impossible to perfectly bandlimit a signal (which is a precondition for Nyquist).

There’s nothing much we can do in the analog domain. What we can influence is the reconstruction filter in the digital domain. Upsampling the signal by an integer factor (a power of 2) opens up additional headroom, so the filtering can be applied with less hassle and therefore with less distortion. To get the best possible upsampling result sufficient computational power is needed which goes beyond the capabilities of traditional DAC chips. Therefore Hans suggests letting the upsampling be done by dedicated streaming software (read my take on Roon here and here) or high end digital devices.
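To put a number on that headroom, the sketch below estimates how much room the analog reconstruction filter has before the first spectral image begins. It rests on two assumptions that are not spelled out in the text: the audio band ends at 20 kHz, and the digital upsampling filter has already removed the images that sat between the old and the new sampling rate. The function names are made up for this illustration.

```python
import math

F_AUDIO_MAX = 20_000.0   # upper edge of the audible band in Hz
BASE_RATE = 44_100.0     # CD sampling rate in Hz


def first_image_edge(fs, f_max=F_AUDIO_MAX):
    """Lowest frequency of the first spectral image when converting at rate fs."""
    # A sampled spectrum repeats around multiples of fs, so the first
    # image occupies fs - f_max .. fs + f_max.
    return fs - f_max


def transition_octaves(fs, f_max=F_AUDIO_MAX):
    """Width of the filter's transition band (f_max .. first image) in octaves."""
    return math.log2(first_image_edge(fs, f_max) / f_max)


for factor in (1, 2, 4, 8):
    fs = BASE_RATE * factor
    print(f"{factor}x ({fs / 1000:.1f} kHz): first image at "
          f"{first_image_edge(fs) / 1000:.1f} kHz, "
          f"{transition_octaves(fs):.2f} octaves of transition band")
```

At 44.1 kHz the filter has to go from full pass to full stop within roughly a quarter of an octave. After 8x upsampling it has about four octaves for the same job, which is what allows a much gentler slope.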

Another audio evangelist with an impressive industry track record is Paul McGowan. Paul is Founder and CEO of PS Audio and hosts the daily series “Ask Paul”. In one of his episodes he also dug into the topic of upsampling and whether it has any kind of (positive) influence on sound quality.

Strictly speaking upsampling does not add any information compared to the initial data. If done correctly the original data is included one hundred percent in the upsampled data set. On top of that the granularity of the input data has been increased by interpolation. Why would this interpolation improve sound quality? The main challenge that needs to be tackled is once more the implementation of an ideal brick wall filter. The central idea is once more that upsampling is opening up a much wider transition band in which filtering can be applied:

I can make much gentler filters by taking the sample rate way out by upsampling. That fact alone which isn’t really giving me more information will make for better sounding digital audio because the filters are so much easier to design and so much gentler to the audio band.
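That upsampling adds samples but no information can be illustrated with a small NumPy sketch. It uses FFT zero-padding as the interpolation method. This is an assumption chosen for brevity (it treats the input as one period of a periodic signal) and is not how a DAC or streaming software necessarily does it. The function name `upsample_fft` and the test signal are made up for this illustration.

```python
import numpy as np


def upsample_fft(x, factor):
    """Upsample a real signal by an integer factor via zero-padding its spectrum."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    if n % 2 == 0:
        # The old Nyquist bin becomes an ordinary bin at the new rate and is
        # then counted twice (positive and negative frequency), so halve it.
        spectrum[-1] *= 0.5
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[:len(spectrum)] = spectrum   # everything above the old band stays zero
    # irfft normalizes by the new length, so rescale to keep the amplitude.
    return np.fft.irfft(padded, n=n * factor) * factor


n = np.arange(64)
x = np.sin(2 * np.pi * 3 * n / 64) + 0.5 * np.cos(2 * np.pi * 10 * n / 64)

y = upsample_fft(x, 4)

# Every 4th output sample is an original sample; the rest is interpolated.
print("length:", len(x), "->", len(y))
print("original samples preserved:", np.allclose(y[::4], x))
```

The spectrum of `y` is empty above the original band, which is exactly the extra room the reconstruction filter benefits from.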

PCM, DSD and PDM

Everything that we touched on so far is based on the concept of Pulse Code Modulation (PCM) and the variation of the sampling rate within PCM. In another episode Paul is adding an additional thought that is somehow compatible with upsampling. He suggests that an upsampling process followed by an additional conversion from pulse code modulation to Pulse Density Modulation (PDM) could also result in better sound quality. Once more the reason for the better sound cannot be found in any additional data that has been magically created in the upsampling and conversion process. Instead it is the improved overall “smoothness” and simplicity of the signal conversion. From Paul’s point of view there is nothing in the digital domain that comes closer to analog than DSD: “Converting PCM to DSD before we change it over to analog is a great practice because when we do that you are already almost at analog.”
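The idea behind pulse density modulation can be sketched with a first-order delta-sigma modulator. This is a deliberately simplified illustration and not the conversion used in any PS Audio product. Real DSD encoders use higher-order modulators running at 64 times 44.1 kHz (2.8224 MHz) on an already upsampled signal. The function names and the constant test input are assumptions.

```python
def delta_sigma_first_order(samples):
    """Turn PCM samples in the range -1..1 into a stream of +1/-1 pulses."""
    integrator = 0.0
    previous_bit = 0.0
    bits = []
    for sample in samples:
        # Accumulate the difference between the input and the last output pulse.
        integrator += sample - previous_bit
        bit = 1 if integrator >= 0.0 else -1
        bits.append(bit)
        previous_bit = bit
    return bits


def pulse_density(bits):
    """Average value of the pulse stream, i.e. what a simple low-pass filter recovers."""
    return sum(bits) / len(bits)


# Assumed input: an (already upsampled) constant level of 0.5.
stream = delta_sigma_first_order([0.5] * 1000)
print("first pulses:", stream[:12])
print("pulse density:", pulse_density(stream))
```

The signal level is carried by how densely the +1 pulses occur, so a plain analog low-pass filter is enough to turn the stream back into a voltage. This is the sense in which such a stream is described as being close to analog.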

Conclusion

So here we are 2,000+ words later. A lot has been written about the basic principles of music (re)production: converting a continuous-time signal into digital and then back again. It seems like the fundamental findings of Nyquist, Shannon, Kotelnikov and Küpfmüller still hold true today. So CD resolution is delivering perfect sound quality for humans. There is no doubt about it. No discussion needed. No upsampling needed.

When taking a closer look at ADC and DAC processes it becomes obvious that the postulation of a bandlimited signal is associated with major challenges. Building and applying appropriate filters is crucial and far away from being trivial. At this point it might really be the case that upsampling can be a helpful tool to improve the quality of digital audio. As always it is almost impossible to have an isolated look at a single parameter that accounts for 100% of the perceived difference. This seems to be especially true in audio. In the end you have to trust your own ears. At least there are some very comprehensible approaches and explanations why upsampling could lead to better sound quality.

Photo by Realbigtaco on wikimedia |CC-BY-SA-3.0
