Wednesday 29 June 2011

Digital audio, clearing the air

Go on the web and google "digital audio" and you will find hundreds of thousands of hits. There is an enormous amount of discussion about the topic. I would say that 90% of it comes from people of the analog era who simply don't understand digital audio. This is a big surprise, as the CD has been on sale since 1982. It seems we are slow learners.

But digital audio is very different from analog, and the conversions from analog to digital and back again are not well understood. Lots of audiophiles talk about "bit perfect", about jitter, and about which is best of USB, Firewire or optical SPDIF interconnects, without understanding the fundamentals. These are minor issues compared to the fundamental choice of CD or HD audio.

So let's have a look at these and bust a few myths. I must comment that I and most audiophiles use Apple Mac computers, as they have a well implemented set of core audio functions that provide very good performance, as we shall see.

Bit depth and volume controls

The Apple Mac "Core Audio" software uses 32bit processing. Any 16bit CD tracks are converted to 32bit, with the 16bit data occupying the MSBs of the 32bit word. If a volume control is applied by shifting the data towards the LSB, then the full 16bit info is retained for up to 16 shifts, or around 96dB. One shift, however, represents a large drop of 6dB in volume (more on this later).

What matters is the conversion, or not, back to 16bit with a simple DAC. If a 16bit DAC is used then the 32bits are converted back to 16 bits and bits 17-32 are thrown away. If the volume has been reduced digitally then the result is going to be a few more '0's at the MSB end, and the LSBs will extend below bit 16, so if a 16bit DAC is now used then these bits below bit 16 will be cut off. If a 32bit DAC is used then there is no loss of resolution down to the full range of the CD capability.
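The bit-shift argument above can be sketched in a few lines of Python. This is an illustration only, not Core Audio's actual code: the function names are made up for this example, and the sample is treated as a plain unsigned integer so the bit patterns are easy to read.

```python
def to_32bit(sample16: int) -> int:
    """Place a 16-bit sample in the top (MSB) half of a 32-bit word."""
    return (sample16 & 0xFFFF) << 16


def shift_volume(word32: int, shifts: int) -> int:
    """Crude volume control: each right shift lowers the level by about 6 dB."""
    return word32 >> shifts


def dac_input(word32: int, dac_bits: int) -> int:
    """Keep only the top dac_bits of the 32-bit word (hypothetical simple DAC)."""
    return word32 >> (32 - dac_bits)


sample = 0b1010_1100_1111_0101             # arbitrary 16-bit sample (0xACF5)
quiet = shift_volume(to_32bit(sample), 4)  # four shifts, about 24 dB quieter

out16 = dac_input(quiet, 16)  # the 4 lowest bits of the sample fall off the end
out24 = dac_input(quiet, 24)  # all 16 original bits still fit

print(f"{out16:016b}")  # 0000101011001111
print(f"{out24:024b}")  # 000010101100111101010000
```

With the 16bit output the last four bits of the sample (0101) are gone. With the 24bit output they are still there, just further down the word.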

The Mac using OS X provides 32bit processing, and provided you hook up a 32bit, or more likely a 24bit DAC, any changes to volume or any effects applied, will be correct. The volume control in iTunes works this way for 16bit recordings (CD rips and the awful lossy AAC tat that Apple iTunes peddles).

On the other hand the iPod, iPhone and iPad (and Apple's Airplay) using iOS software do NOT work well, because iOS supports only 16bit audio processing. So their volume controls or Airplay transmissions always lose resolution when the volume is turned down, which is why they sound so poor at low volumes. If you use a combination of an iPad, AAC compressed files, and the iPad/iOS digital volume control then you get very poor quality. Apple's Airplay is also just a 16bit pipe, so any 24bit tracks are reduced to 16bit. Airplay can carry the 16bit output from an iPad perfectly, but not the 24bits from an HD recording in iTunes. At least Airplay/Apple TV supports 48kHz, so downsampling from 96kHz to 48kHz is relatively simple and accurate. But truncating 24bit to 16bit is not a good idea. Time for Apple to introduce 24bit/48kHz ALAC files as standard in iOS and on iTunes.

Mathematics and bit perfect

Volume is reduced not by simply shifting the data towards the LSB, which would reduce it in large 6dB steps, but by mathematical computation of gain x signal input. To get a smooth volume control the gain number ideally also has to be a 24 or 32bit number, thus requiring a processor or maths unit capable of 24x24 or 32x32 calculations. The Mac OS X software actually uses a floating point 32bit audio representation and fast enough processors, so it does these calculations easily.

Remember here that all the arguments about bit perfect transmission are immediately destroyed by any gain calculation. The bits out are not the same as the bits in, except when the volume control is set at one of the 6dB steps represented by a one bit shift of the data. Multiply a number by a fraction and you will more than likely get a remainder. The only way to get bit perfect is not to use a digital volume control, but an analog one between the DAC and the power amplifier, leaving the digital part of the chain to always run at full resolution.
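The remainder problem can be checked with exact arithmetic. The sketch below is an illustration under stated assumptions, not how any real player works: it takes every 16bit sample value, multiplies it by a gain, and asks whether the exact result lands on a grid with 16 extra bits below the original LSB (the 32bit container described earlier). The function name and the two gains are picked just for this example.

```python
from fractions import Fraction


def scaled_exactly(sample: int, gain: Fraction, extra_bits: int) -> bool:
    """True if sample * gain needs no rounding with extra_bits below the LSB."""
    exact = sample * gain * 2**extra_bits
    return exact.denominator == 1


samples = range(-32768, 32768)     # every signed 16-bit value
half = Fraction(1, 2)              # a one-bit shift, about 6 dB down
seven_tenths = Fraction(7, 10)     # an "in between" volume setting

exact_half = sum(scaled_exactly(s, half, 16) for s in samples)
exact_07 = sum(scaled_exactly(s, seven_tenths, 16) for s in samples)

print(exact_half)  # 65536: every sample survives a pure shift
print(exact_07)    # 13107: only multiples of 5 come out without a remainder
```

So at a gain of 0.7 about four samples in five have to be rounded, however many extra bits you add, because 1/5 has no finite binary expansion.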

Limits and levels

Remember also that normally the analog 0dB level is equal to -12dB (14bit resolution) on a CD, leaving a 12dB overhead up to the full 16bit capability (you cannot get more than all '1's in the 16bit word!). So most music has only 14bit dynamics (pop music is limited or compressed to use up this overhead, and even reaches just 3dB below digital clipping to have the "punch" so essential to the teenage ear). The 12dB overhead leaves an 84dB range for a standard CD linear PCM. For 24bit recordings the 0dB level is set to -18dB, leaving 21bits for recordings, with an 18dB overhead. Clearly 24bit is essential for HiFi systems. 21bits gives a 126dB range, not enough for a full orchestra range of about 135dB, but way better than the 16bit/84dB of the CD.
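The figures above come from the rule of thumb of about 6dB per bit. A minimal sketch of the arithmetic, using the standard formula 20*log10(2^bits) for linear PCM and the overhead values quoted above (the function names are just for this example):

```python
import math


def range_db(bits: int) -> float:
    """Theoretical range of linear PCM: 20*log10(2**bits), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)


def usable_range_db(bits: int, overhead_db: float) -> float:
    """Range left between silence and the 0 dB reference level."""
    return range_db(bits) - overhead_db


for bits, overhead in ((16, 12), (24, 18)):
    total = range_db(bits)
    usable = usable_range_db(bits, overhead)
    print(f"{bits}bit: total {total:.1f} dB, silence-to-0dB {usable:.1f} dB")
```

This gives roughly 96dB and 84dB for the CD, and roughly 144dB and 126dB for 24bit.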

The solution for audiophiles?

1 Use an Apple Mac and OS X which does all audio processing at 32bit floating point
2 Use a 24bit DAC, and Firewire or SPDIF optical connection
3 Adjust the gain of the analog power amplifier so that max DAC output (corresponding to full 24bit digital input level) is equal to the amplifier's max output peak output capability, so the DAC output can never cause it to clip.
4 Use true 24bit HD audio recordings, if you can get them!

On this last point it is worth saying that record companies seem distinctly unwilling to release music at 24bit/96kHz. I can see NO sensible reason why they would not do this. It would stimulate a new interest in music and audio reproduction, and mark a turning point in the offering of higher quality music of a kind that has not happened since the CD was invented. The recent moves to AAC and streaming audio are very retrograde steps for the industry. In today's world of fast broadband speeds, delivering tracks in 24/96 using lossless FLAC or ALAC compression is a no-brainer. A typical track is about 80-120MB, or by my calculations just 160secs to download across a 5Mbps ADSL line.
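The download figure is easy to check. The sketch below takes a 100MB file (the middle of the 80-120MB range) and a 5Mbps line, and also works out the uncompressed size of a 24/96 stereo track for comparison. The 4 minute track length is an assumption chosen for the example, and protocol overhead on the line is ignored.

```python
def download_seconds(size_megabytes: float, line_mbps: float) -> float:
    """Transfer time, taking 1 MB = 8 megabits and ignoring protocol overhead."""
    return size_megabytes * 8 / line_mbps


def pcm_megabytes(seconds: float, sample_rate_hz: int = 96_000,
                  bits: int = 24, channels: int = 2) -> float:
    """Size of uncompressed linear PCM audio in MB (1 MB = 1,000,000 bytes)."""
    return seconds * sample_rate_hz * bits * channels / 8 / 1_000_000


print(download_seconds(100, 5))        # 160.0 seconds for a 100MB file
print(round(pcm_megabytes(240), 2))    # 138.24 MB uncompressed for 4 minutes
```

Lossless compression then brings the uncompressed figure down towards the range quoted above, by an amount that depends on the music.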

Use of available range

Imagine a range of sound from the smallest (silence) to the loudest peaks. This has a certain range. Now where do we fit this range into the capability of a 16bit CD (which has a total range of 96dB) or a 24bit HD recording (with a total range of 144dB)? If we put the bottom of the range at the LSB, then for the CD the highest volume peak can only be 96dB above this level, not enough for classical music. If we use 24bit then we can have up to 144dB, more than enough for any sounds in the world (except perhaps the space shuttle taking off).

We have been accustomed in the analog world to a concept of 0dB as a reference level, with headroom above this for peaks. When it comes to digital recording we have to choose to put the peak of the range of sound at the level where all bits are full (all 1's). It can go no higher; this is the equivalent of analog clipping. But where then do we put 0dB to give an acceptable headroom? The 16bit CD uses a 12dB overhead (only!), but 24bit HD audio uses 18dB (still a bit low but better than CD). Remember that 1bit of a digital signal = 6dB of analog level, so 0dB on a CD corresponds to 14bit and 0dB in HD audio to 21bit. The 14bit of the CD is a rather poor resolution, as it represents too few analog steps in the digitisation and leads to an inaccurate restoration of fast moving audio signals.

Mixing and effects

What we also have to take into account is that recordings are not made with a single microphone yielding 16 or 24bit of audio. Use more inputs or apply effects or EQ and you have to do some digital mathematics to mix the sounds. 24bit x 24bit mathematics can yield more than 24bit results! So audio digital signal processing, whether mixing, equalisation, volume control or anything else, needs more bits. Typically 32bits are used, and in Mac OS X floating point 32bit is the standard for the internal CAF audio format for Core Audio software. High-end studio mixing equipment can run up to 32bit and 768kHz!
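To see why mixing needs more bits, here is a rough sketch for plain signed integer arithmetic (floating point handles this differently, so this is not a description of Core Audio). The helper names are invented for the example.

```python
import math


def product_bits(a_bits: int, b_bits: int) -> int:
    """Safe width for the product of two signed integers of these widths."""
    return a_bits + b_bits


def sum_bits(sample_bits: int, channels: int) -> int:
    """Width needed to add `channels` full-scale signed samples without overflow."""
    return sample_bits + math.ceil(math.log2(channels))


min_24 = -(2 ** 23)            # most negative signed 24-bit value
worst_product = min_24 * min_24

print(product_bits(24, 24))            # 48
print(worst_product.bit_length() + 1)  # 48, counting the sign bit
print(sum_bits(24, 8))                 # 27 bits to sum 8 full-scale channels
```

So a single 24bit x 24bit gain multiply can need up to 48 bits before it is rounded back down, and even just adding eight channels together needs three bits more than one channel does.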

Sample rates, 44.1, 48, 88.2, 96, 176.4, 192...

The other matter that remains is what sample rate to use. Common rates are multiples of either 44.1kHz or 48kHz. (By the way, there is no technical reason for choosing the 44.1kHz rate for a CD other than the historical use of video tape recorders to capture the first digital recordings. It would have been great if the first engineers had chosen 48kHz instead...!) Our objective is to reproduce the sound of the original instrument, not the human perceived sound! The incredible fallacy that we only need the bandwidth the human ear can hear (chosen as 20Hz-20kHz) has led to acceptance of the CD's 44.1kHz sample rate, giving a bandwidth up to around 21kHz (sharp low pass filters must be used before analog to digital conversion in the studio to avoid distortion). Even worse is the second argument, that the human brain cannot distinguish between some sounds, which has led to the psycho-acoustic lossy MP3 and AAC compression disaster, with stupid claims that "AAC is CD quality" and "anyway you can't hear above 15kHz or so".

The fact is some musical instruments generate sounds or harmonics over 90kHz or so, and the simple rattle of a bunch of keys shows a spectrum up to 50-60kHz. Nyquist mathematics says this will need a digital sample rate of 2x the highest frequency, or at least 180kHz. So an obvious choice is 24bit/192kHz recordings. However 24bit/96kHz recordings are more common, and are a fundamental improvement over the 16bit/44.1kHz of the CD, giving a bandwidth up to 45kHz. They capture the dynamics and timbre of instruments and vocals much better, giving a much better listening experience: more breathy openness, more dynamic oomph, etc. And lossless compression avoids the loss of phase information and stereo imaging that lossy formats like AAC suffer from.
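The Nyquist rule is simple enough to put into a few lines. This sketch picks the lowest of the common sample rates that is more than twice a given highest frequency. The list of rates is the one from the heading of this section, and the function names are made up for the example.

```python
STANDARD_RATES_HZ = (44_100, 48_000, 88_200, 96_000, 176_400, 192_000)


def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent in theory."""
    return sample_rate_hz / 2


def lowest_standard_rate(highest_freq_hz: float):
    """Lowest common rate above 2x the highest frequency, or None if none fits."""
    for rate in STANDARD_RATES_HZ:
        if rate > 2 * highest_freq_hz:
            return rate
    return None


print(lowest_standard_rate(20_000))  # 44100
print(lowest_standard_rate(45_000))  # 96000
print(lowest_standard_rate(90_000))  # 192000
print(nyquist_limit_hz(96_000))      # 48000.0
```

The theoretical limit for 96kHz is 48kHz. The usable bandwidth is a little lower in practice because the anti-alias filter needs some room to roll off.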

Of course using a 24bit/96kHz digital system means you need a very good analog power amplifier with a bandwidth of 50kHz or so and sustainable low frequency output, say even down to DC. Anyway the weakest link in the chain is the loudspeakers, and this is where to spend your money.

Even CDs are no good

Clearly the 16bit of the CD, with a limited headroom of only 12dB and a silence-to-0db range of 14bit or only 84dB is not enough to record live music, classical or pop. So all recordings are compressed or their volume adjusted to fit in this 84dB range, or they are limited to avoid hitting the 16bit maximum level possible, pumping up the apparent volume. What is also clear is that digital volume controls in 16bit only systems are a disaster and that AAC or MP3 lossy compression is also a step backwards in audio quality...

HD audio at 24bit/96kHz in losslessly compressed files such as FLAC or Apple's ALAC gets a good deal closer to what we need, giving 18dB overhead and a 144-18 = 126dB silence-to-0dB range. This is enough for most classical and pop music.

You will be amazed at the improvement in audio listening pleasure if you move to HD Audio. Go try it out!

My setup

My setup: Apple MacBook, WiFi router, Apple iPad and iPhone, Apple Airport Express (analog or digital output), Apple TV (digital output only), home built amplifier with quad chip DAC, two Hypex Class D amplifiers, Spendor loudspeakers. All audio is stored on the MacBook in ALAC format files and streamed to the Airport Express or the Apple TV (preferred). I also have a Firewire 24/96 DAC for direct connection to the MacBook for critical listening.
