Monday 22 April 2013

Dither and Noise Shaping

People often ask what dithering does.  Most seem to know that it involves adding random noise to digital music for the apparently contradictory purpose of making it sound better, but don’t know how it accomplishes that.  On the other hand, very few people understand what Noise Shaping is and what it actually does – or that it is in reality a form of dithering.  Since neither concept is particularly difficult to grasp, I thought you might appreciate a short post on the subject.  I warn you, though, there are going to be some NUMBERS involved, so you might want to have a pencil and a piece of paper to hand.

Suppose we encode a music signal as a PCM data stream (read my earlier post “So you think you understand Digital Audio?” if you are unsure how that works).  Each PCM “sample” represents the magnitude of the musical waveform at the instant it is measured, and its value is stored as that of the nearest “quantized level”.  These “quantized levels” are the limited set of values that the stored data can take, and so there will be some sort of error associated with quantization.  This “quantization error” is the difference between the actual signal value and the stored “quantized” value.  For example, suppose the value of the signal is 0.5004 and the two closest quantization levels are 0.500 and 0.501.  If we store the PCM sample as 0.500 then the associated quantization error is +0.0004.  If we had chosen to store the PCM sample as 0.501 then the associated quantization error would have been -0.0006.  Since the former is less of an error than the latter, it seems obvious that the former is the more accurate representation of the original waveform.  Are you with me so far?
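
If you would rather let a computer do the arithmetic, here is a minimal Python sketch of the example above.  It assumes the two candidate levels are 0.001 apart, as in the example, and it takes the error to be the actual value minus the stored value.  The function names are made up for illustration, and the `decimal` module is used only so that the printed figures are not blurred by binary floating-point rounding.

```python
from decimal import Decimal, ROUND_FLOOR

STEP = Decimal("0.001")  # spacing between quantization levels (taken from the example)

def neighbouring_levels(value, step=STEP):
    """Return the quantization level at or just below value, and the next one up."""
    lower = (value / step).to_integral_value(rounding=ROUND_FLOOR) * step
    return lower, lower + step

def quantization_error(value, stored):
    """Quantization error = actual signal value minus stored (quantized) value."""
    return value - stored

signal_value = Decimal("0.5004")
lower, upper = neighbouring_levels(signal_value)
err_lower = quantization_error(signal_value, lower)
err_upper = quantization_error(signal_value, upper)

print("stored as", lower, "-> error", err_lower)
print("stored as", upper, "-> error", err_upper)

# The nearest level is simply the one with the smaller absolute error.
nearest = lower if abs(err_lower) <= abs(err_upper) else upper
print("nearest level:", nearest)
```

The smaller of the two errors in absolute terms identifies the nearest level, which is what a plain quantizer would store.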

One way to look at the PCM encoding process is to think of it as storing the exact music signal, plus an added error signal, comprising all of the quantization errors.  The quantization errors are the Noise or Distortion introduced by the quantization process.  The distinction between noise and distortion is critically important here.  The difference between the two is that distortion is related to the underlying signal (the term we use is “Correlated”), whereas noise is not.

I am going to go out of my way here to give a specific numerical example because it is quite important to grasp the notion of Correlation.  I am going to give a whole load of numbers, and it would be best if you had a pencil and paper handy to sketch them out in rough graphical form.  Suppose we have a musical waveform which is a triangle pattern (it ramps up and then back down again), repeating the sequence:
0.3000, 0.4002, 0.5004, 0.6006, 0.7008, 0.8010, 0.7008, 0.6006, 0.5004, 0.4002, 0.3000 …
Now, let’s suppose that our quantization levels are equally spaced 0.001 apart.  Therefore the signal will be quantized to the following repeating sequence:
0.300, 0.400, 0.500, 0.601, 0.701, 0.801, 0.701, 0.601, 0.500, 0.400, 0.300 …
The resultant quantization errors will therefore comprise this repeating sequence:
0.0000, +0.0002, +0.0004, -0.0004, -0.0002, 0.0000, -0.0002, -0.0004, +0.0004, +0.0002, 0.0000 …
If you plot these repeating sequences on a graph, you will see that the sequence of Quantization Errors forms a pattern that is intriguingly similar to the original signal, but is not quite the same.  This is an example of a highly correlated quantization error signal.  What we want, ideally, is for the quantization errors to resemble, as closely as possible, a sequence of random numbers.  Totally random numbers represent pure noise, whereas highly correlated numbers represent pure distortion.
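
The sequences above can be reproduced with a few lines of Python.  This is only a sketch: it assumes plain round-to-nearest quantization with levels 0.001 apart, and the helper names are invented for illustration.  It also checks one simple symptom of correlation, namely that with this quantizer the same signal value always produces the same error, so the error pattern has no choice but to repeat in step with the signal.

```python
from decimal import Decimal, ROUND_HALF_EVEN

STEP = Decimal("0.001")  # spacing between quantization levels (from the example)

# The repeating signal sequence from the text.
signal = [Decimal(s) for s in (
    "0.3000", "0.4002", "0.5004", "0.6006", "0.7008", "0.8010",
    "0.7008", "0.6006", "0.5004", "0.4002", "0.3000")]

def quantize_nearest(value, step=STEP):
    """Store the value as the nearest quantization level."""
    return (value / step).to_integral_value(rounding=ROUND_HALF_EVEN) * step

quantized = [quantize_nearest(v) for v in signal]
errors = [v - q for v, q in zip(signal, quantized)]  # actual minus stored

# Collect every distinct error seen for each distinct signal value.
errors_by_value = {}
for v, e in zip(signal, errors):
    errors_by_value.setdefault(v, set()).add(e)

# True when the error is completely determined by the signal value.
error_follows_signal = all(len(seen) == 1 for seen in errors_by_value.values())

print([str(q) for q in quantized])
print([str(e) for e in errors])
print("error determined by signal value:", error_follows_signal)
```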

In reality, any set of real-world quantization error numbers can be broken down into a sum of two components – one component which is reasonably well correlated, and another which is pretty much random.  Psychoacoustically speaking, the ear is far more sensitive to distortion than to noise.  In other words, if we can replace a small amount of distortion with a larger amount of noise, then the result may be perceived as sounding better.

Time to go back to our piece of hypothetical data.  Suppose I take a random selection of samples, and modify them so that we choose not the closest quantization level, but the second-closest.  Here is one example – the signal is now quantized to the following repeating sequence:
0.300, 0.400, 0.501, 0.600, 0.701, 0.801, 0.700, 0.601, 0.500, 0.401, 0.300 …
The resultant quantization errors now comprise this repeating sequence:
0.0000, +0.0002, -0.0006, +0.0006, -0.0002, 0.0000, +0.0008, -0.0004, +0.0004, -0.0008, 0.0000 …

There are three things we can take away from this revised quantization error sequence.  The first is that it no longer looks as though it is related to the original data, so it is no longer correlated, and looks a lot more like noise.  The second is that the overall level of the error has gone up, so we have replaced a certain amount of correlated error with a larger amount of noise.  Third, and this is where we finally get around to the second element of this post, the noise seems to have quite a lot of high-frequency energy associated with it.
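
Each of those three observations can be checked roughly in Python.  The sketch below takes the two quantized sequences straight from the text.  The three measures are my own crude stand-ins rather than anything rigorous: "correlated" is tested by asking whether the same signal value always gives the same error, the overall level is the sum of the squared errors, and the high-frequency content is estimated from how much the error jumps from one sample to the next.

```python
from decimal import Decimal

def as_decimals(*texts):
    return [Decimal(t) for t in texts]

# Sequences copied from the text.
signal = as_decimals("0.3000", "0.4002", "0.5004", "0.6006", "0.7008", "0.8010",
                     "0.7008", "0.6006", "0.5004", "0.4002", "0.3000")
nearest = as_decimals("0.300", "0.400", "0.500", "0.601", "0.701", "0.801",
                      "0.701", "0.601", "0.500", "0.400", "0.300")
revised = as_decimals("0.300", "0.400", "0.501", "0.600", "0.701", "0.801",
                      "0.700", "0.601", "0.500", "0.401", "0.300")

def errors(actual, stored):
    """Quantization errors, actual minus stored."""
    return [a - s for a, s in zip(actual, stored)]

def error_follows_signal(actual, errs):
    """True if each distinct signal value always produces the same error."""
    seen = {}
    for a, e in zip(actual, errs):
        seen.setdefault(a, set()).add(e)
    return all(len(group) == 1 for group in seen.values())

def energy(values):
    """Sum of squares: a simple measure of overall level."""
    return sum(v * v for v in values)

def difference_energy(values):
    """Sum of squared sample-to-sample jumps: a crude high-frequency measure."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))

err_nearest = errors(signal, nearest)
err_revised = errors(signal, revised)

follows_nearest = error_follows_signal(signal, err_nearest)
follows_revised = error_follows_signal(signal, err_revised)
energy_ratio = energy(err_revised) / energy(err_nearest)
hf_share_nearest = difference_energy(err_nearest) / energy(err_nearest)
hf_share_revised = difference_energy(err_revised) / energy(err_revised)

print("error tied to signal value:", follows_nearest, "->", follows_revised)
print("error energy ratio (revised / nearest):", energy_ratio)
print("high-frequency share:", hf_share_nearest, "->", hf_share_revised)
```

With these particular numbers, the revised errors are no longer tied to the signal value, their energy is three times that of the nearest-level errors, and a larger share of that energy sits in the sample-to-sample jumps.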

So here we have the concepts of Dither and Noise Shaping in a nutshell.  By carefully re-quantizing certain selected samples of the music data stream in a pseudo-random way, we can replace distortion with noise.  Likewise, using what amounts to the same technique, we can do something very similar and replace an amount of noise in the portion of the frequency band to which we are most sensitive, with a larger amount of noise in a different frequency band to which we are less sensitive, or which we know can be easily filtered out at some later stage.
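
To make that a little more concrete, here is a hedged Python sketch of one common textbook arrangement.  It is not taken from any particular product, and it differs in detail from the hand-picked re-quantization above: the dither is a small triangular-distribution random number added just before rounding, and the noise shaper is a first-order error-feedback loop that adds the previous sample's error to the next sample.  The function names, the seed and the constant test signal are all assumptions chosen for illustration.

```python
import random

STEP = 0.001  # spacing between quantization levels (from the example)

def tpdf_dither(rng):
    """Triangular-distribution dither, in units of one step, in the range (-1, 1)."""
    return rng.random() - rng.random()

def quantize_plain(samples, step=STEP):
    """Always store the nearest level."""
    return [round(x / step) * step for x in samples]

def quantize_dithered(samples, rng, step=STEP):
    """Add dither before rounding, so the level chosen varies randomly."""
    return [round(x / step + tpdf_dither(rng)) * step for x in samples]

def quantize_noise_shaped(samples, rng, step=STEP):
    """Dithered quantizer with first-order error feedback."""
    out = []
    prev_error = 0.0
    for x in samples:
        target = x + prev_error        # carry the last error forward
        y = round(target / step + tpdf_dither(rng)) * step
        prev_error = target - y        # actual minus stored, as in the text
        out.append(y)
    return out

def mean(values):
    return sum(values) / len(values)

rng = random.Random(2013)              # fixed seed so the run is repeatable
signal = [0.5004] * 10000              # hypothetical constant test signal

mean_plain = mean(quantize_plain(signal))
mean_dithered = mean(quantize_dithered(signal, rng))
mean_shaped = mean(quantize_noise_shaped(signal, rng))

print("plain   :", mean_plain)
print("dithered:", mean_dithered)
print("shaped  :", mean_shaped)
```

A constant input is the simplest case of a signal whose plain quantization error is pure distortion: every sample is stored as 0.500 and the error never varies.  With dither, the individual samples are noisier but their average creeps back towards 0.5004.  With the feedback loop, the errors of successive samples largely cancel one another, which is another way of saying that the noise has been pushed away from the lowest frequencies and up towards the higher ones.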

One thing needs to be borne in mind, though.  Dithering and Noise Shaping operate only on the noise which is being added to the signal as a result of a quantization process, and not on the noise which is already present in the signal.  After the Dithering and Noise Shaping, all of this new noise is now incorporated into the music signal, and is no longer separable.  So you have to be really careful about when you introduce Dither or Noise Shaping into the signal, and how often you do it, because its effects are cumulative.  If you do it too many times, it is easy to end up with an unacceptable amount of high frequency noise.
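
The cumulative effect is easy to demonstrate with a toy experiment.  The sketch below is an illustration under stated assumptions, not a model of any real signal chain: it takes a made-up sine wave, quantizes it with a simple dithered quantizer (triangular dither added before rounding, levels 0.001 apart), and then feeds the result back through the same quantizer several more times, measuring the RMS error against the original after each pass.  No noise shaping is included, so it shows only the build-up of the overall noise level.

```python
import math
import random

STEP = 0.001  # spacing between quantization levels (from the example)

def dithered_quantize(samples, rng, step=STEP):
    """Round to a level after adding triangular dither of up to one step."""
    return [round(x / step + rng.random() - rng.random()) * step
            for x in samples]

def rms_error(reference, processed):
    """Root-mean-square difference between two equal-length sequences."""
    total = sum((r - p) ** 2 for r, p in zip(reference, processed))
    return math.sqrt(total / len(reference))

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical test signal: a slow sine wave centred on 0.5.
original = [0.5 + 0.3 * math.sin(2 * math.pi * n / 97) for n in range(5000)]

processed = original
rms_by_pass = []
for _ in range(8):
    # Each pass quantizes the output of the previous pass, not the original.
    processed = dithered_quantize(processed, rng)
    rms_by_pass.append(rms_error(original, processed))

for number, rms in enumerate(rms_by_pass, start=1):
    print("after pass", number, "RMS error =", round(rms, 6))
```

Each pass adds a fresh helping of noise on top of what is already baked in, so the RMS error keeps climbing with every pass.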

I hope you were able to follow that, and I apologize again for the ugly numbers :)