Classical Music Forum
21 - 39 of 39 Posts

· Registered
Joined
·
208 Posts
Great post. It's not really that complicated, but few who understand it are willing to explain it in non-nerd language. The key thing is to transfer as much original material from the recording session to listeners' ears as possible. As Kuhlau says, with HD space so cheap and fast broadband connections becoming the norm in First World countries, there's increasingly little excuse to use the old lossy MP3 format.
 
G

·
Discussion Starter · #22 ·
As Kuhlau says, with HD space so cheap and fast broadband connections becoming the norm in First World countries, there's increasingly little excuse to use the old lossy MP3 format.
Hi 99, this is what I was trying to say in my non-nerdy way :cool:

Kuhlau, thanks, I am at last beginning to understand; it is something that puzzled me, but being old and doddery I am easily confused :confused:
I suppose MP4 is not a great improvement, so all in all there's a case to be made for purchasing your music from your local dealer, at least for the present.
 
G

·
Discussion Starter · #25 ·
Ha ha, don't think that's the end to it, I still have to wade through all the info that you have given, e.g. the binary system is yesterday's technology?? :eek:
 

· Registered
Joined
·
293 Posts
All computing still reduces to binary at its most basic level, Andante.

Interestingly, there was a response to an article in the letters pages of International Record Review earlier this year which basically gave the same technical 'advice' as I gave here ... yet I only read that back issue today. Nice to have my understanding independently confirmed. :)

FK
 

· Registered
Joined
·
26 Posts
Confusing terms

I think part of the problem with this subject is the use of confusing terms that have different meanings in different contexts -- the worst culprit being 'compression'.

Compression is most often used to mean the control of dynamic range -- specifically the reduction of dynamic range, so that loud sounds aren't as loud as they were, while quieter sounds are louder. The idea is to let you hear the quiet stuff above the background noise of the listening room while the loud bits don't annoy the neighbours.

However, in the context of digital audio codecs, 'compression' refers to the amount of data reduction being applied -- or how much the file size is reduced.

There are four basic ways to achieve a reduction in the data file size. The first is to use a lower sample rate. For example, instead of using the 44.1kHz rate of CD, the telephone system uses an 8kHz sample rate. The advantage is that you immediately have only about one fifth of the data to worry about... but the disadvantage is that the audio bandwidth is reduced to just 3.5kHz instead of 20kHz. Lousy for quality music, but adequate for intelligible speech... which is why it is used.
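The arithmetic behind that trade-off is easy to sketch (the Nyquist limit is the theoretical ceiling; practical telephone filters sit a little below it, hence the 3.5kHz figure above):

```python
def data_ratio(rate_a_hz, rate_b_hz):
    # Fraction of the data left after dropping from one sample
    # rate to another, all else being equal.
    return rate_b_hz / rate_a_hz

def nyquist(rate_hz):
    # Highest frequency a sampled system can theoretically carry.
    return rate_hz / 2

print(round(data_ratio(44_100, 8_000), 2))  # 0.18 -- about one fifth
print(nyquist(8_000))    # 4000.0 Hz ceiling; real filters land near 3.5kHz
print(nyquist(44_100))   # 22050.0 Hz for CD
```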

The second technique is to reduce the wordlength -- how many bits are used to describe the audio amplitude. Studio quality recordings are made with 24 bits per sample which gives a potential dynamic range in excess of 130dB. CDs are made with 16 bits per sample giving a potential dynamic range of over 90dB -- but that reduction in wordlength removes more than 30% of the data. Telephone systems use only 8 bits, reducing the data even more. But clearly, the disadvantage with this approach is a higher and more obvious noise floor, and a lower potential dynamic range.
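For reference, those dynamic range figures come from the usual rule of thumb for an ideal converter: roughly 6dB per bit (6.02N + 1.76dB). A quick sketch:

```python
def dynamic_range_db(bits):
    # Theoretical SNR of an ideal N-bit quantiser: 6.02*N + 1.76 dB.
    return 6.02 * bits + 1.76

for bits in (8, 16, 24):
    print(bits, round(dynamic_range_db(bits), 1))
# 8 -> 49.9dB, 16 -> 98.1dB ("over 90dB"), 24 -> 146.2dB ("in excess of 130dB")
```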

The third option is to remove what is referred to as 'redundant data' and this is the scheme used by loss-less codecs like FLAC and MLP (amongst many others). If you consider the digital encoding of a photograph, each pixel must be described in terms of its colour and brightness, and in a high resolution picture that results in a huge amount of information. A loss-less codec reduces much of that information by avoiding repetition of identical data -- doing away with the 'redundant' information.

So if the sky is a uniform blue, say, rather than saying this is a bright blue pixel, this is a bright blue pixel, this is a bright blue pixel, this is a bright blue pixel... The codec notes it as this is a bright blue pixel, and so are the next 327, this is a less bright blue pixel and so are the next 28... and so on.

In this way, the amount of data required to describe the picture is reduced significantly, but the information and accuracy is preserved absolutely -- which is why it's called loss-less. Audio signals tend to be very cyclical in nature much of the time, and this is largely predictable. The loss-less codecs simply take advantage of this 'predictability' to remove the redundant data and thus reduce the file size. There are obvious limits to how far this can be taken, and that's why loss-less audio codecs typically only reduce file sizes by about 60%.
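The 'bright blue pixel' idea above is essentially run-length encoding, one of the simplest redundancy-removal schemes. A toy sketch (real codecs like FLAC use far more sophisticated prediction, but the principle is the same):

```python
def rle_encode(pixels):
    # Store each value once with a repeat count instead of
    # repeating identical values -- redundancy removal.
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    # Expand the runs back out -- exact reconstruction.
    return [p for p, n in runs for _ in range(n)]

sky = ["bright blue"] * 328 + ["less bright blue"] * 29
runs = rle_encode(sky)
print(runs)  # [['bright blue', 328], ['less bright blue', 29]]
assert rle_decode(runs) == sky  # loss-less: nothing is thrown away
```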

The final option is where it all gets very contentious, and that's to remove 'irrelevant data' -- the elements of a complex sound recording that the human hearing system cannot resolve and detect. Formats like Dolby Digital (AC3), DTS (apt-X), MiniDisc (ATRAC), DAB radio, MP3, MP4 and all the rest are lossy codecs, and they all work (albeit differing slightly in the details) by trying to throw away parts of the sound that their designers think we can't hear and won't notice.

The premise is that the human ear/brain is not a linear sound-analysing system, but is actually a highly reactive and non-linear system which suffers from an effect called 'frequency masking'. Essentially, if I played you a quiet single-pitch tone at, say, 3kHz you would be able to perceive it quite happily. If I then add in a very much louder tone, starting at 100Hz, and gradually increase its frequency, you'll be able to hear and distinguish the two separate tones. But as the louder tone starts to approach about 2kHz the 3kHz tone will seem to disappear, and it won't come back until the loud tone has carried on up to something like 5kHz or more. It's quite a sobering demonstration, actually! But this frequency-masking effect occurs throughout the hearing range and is the underlying principle of the old analogue tape noise-reduction systems like Dolby B and C.

In the case of the ever-popular MP3 format, the basic scheme is to divide the original audio signal into a large number of narrow frequency bands. The signal in each band is then analysed to determine which frequency components can and can't be heard in the presence of all the rest. Those that the system thinks can't be heard are immediately discarded, never to be heard again! Obviously, this process immediately removes quite a lot of data from the signal.

The next stage is to analyse the signal elements that are left to determine how much louder they are than the system noise floor in each frequency band. The wordlength used to describe their amplitudes is then reduced to the minimum possible consistent with retaining a reasonable signal-to-noise ratio. This removes a whole load more data... and that's basically how the stereo signal from a CD player can be reduced from 1411kb/s (2 x 16 x 44100) down to a puny 128kb/s for MP3.
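Those figures check out with a couple of lines of arithmetic:

```python
cd_bitrate = 2 * 16 * 44_100     # channels x bits x sample rate
print(cd_bitrate)                # 1411200 b/s, i.e. 1411.2 kb/s

mp3_bitrate = 128_000
discarded = 1 - mp3_bitrate / cd_bitrate
print(f"{discarded:.1%}")        # 90.9% of the original data gone
```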

To trained ears, the quality loss is obvious, but the majority of iPod listeners seem quite happy with the result, even though over 90% of the original information has been discarded. Clearly, using a higher data rate -- such as 320kb/s -- means that less data has to be thrown away and so the quality loss (and file size reduction) is not as great.

MP3 is a very clever system, but this is an area of technology that is advancing rapidly as the way the ear/brain works becomes better understood, and data processing power and speed improves. So MP3 is already old-hat, and there are better codecs around now (AAC being one of many) -- but the familiarity of MP3 is such that it will remain popular for a long time to come.

If you start with an MP3 file, a large proportion of the data has been thrown away, as I've explained, and that can't be recovered in any way. However, if you convert the MP3 file to some other format (e.g. a WAV file to burn onto an audio CD), the data has to be 'recompiled' into the appropriate format. Which means that instead of describing the data in terms of a series of narrow frequency ranges with variable wordlengths, the signal has to be rebuilt into conventional samples with a fixed wordlength... and in doing that considerably more data is required to describe the remaining audio.

But if you were to analyse that rebuilt signal carefully, you would see narrow frequency bands disappearing and reappearing as the signal changed, and the noise floor would be bouncing up and down in the different bands -- all as a result of the MP3 processing. So the quality remains at the MP3 level, even though the actual file size has increased.

Sorry for the long post -- but hopefully that's helped to explain what is going on in these systems and why they work in the way they do.

Hugh
 
G

·
Discussion Starter · #29 ·
Hugerr
Thank you for taking the time and effort to make such an informative post. I have a greater understanding now of the issues involved; it is to be hoped that this will be perused by our members.
Thanks again Kuhlau for being so patient with me. :)
 

· Registered
Joined
·
160 Posts
Hi Andante and anyone else interested in SACD (Super Audio Compact Disc). This is a high-resolution music disc format and can be bought in many good CD stores, such as Marbecks in Auckland, or online.

Because a Red Book (standard) CD player has a fixed 16 bit / 44.1kHz sampling rate, it cannot play an SACD. SACD has a sampling rate of 2.8224MHz, 64 times the rate of CD, but the bit rate is only 4 times higher. The bit rates should not be compared directly, because SACD does not use Pulse Code Modulation (PCM); it uses a 1-bit Delta-Sigma process called Direct Stream Digital (DSD). Yes, another bloody acronym, don't you just hate them? The advantages claimed for SACD are greater dynamic range, higher frequency response, and better resolution generally.

There are dedicated SACD discs that play only in an SACD player, but the most popular type is called the hybrid, which has the high-resolution DSD layer on the top and a standard Red Book PCM layer on the underside of the disc. This means you can play the disc in a standard CD player at standard 16/44.1, or play it in an SACD player, which will play the top layer in high resolution. This is brilliant for comparing the two resolutions: if you notice a huge upgrade, you would buy SACD where you can; if you notice no difference, there is no motivation to buy anything other than standard CD 16/44.1. The diagram shows the two layers on an SACD.
 

· Registered
Joined
·
122 Posts
Wait a minute... 2.8 megahertz? I'm afraid I can't imagine why that should be necessary. The Nyquist frequency in that case is 1.4MHz (The Nyquist frequency is the highest frequency audio signal that can be reproduced without 'aliasing' and equals half of the sampling rate). Given that the human ear can only detect sounds up to 20 or 30 kHz (the range is usually quoted as 20Hz to 20kHz), it seems to me that 1.4MHz is excessive. (This Nyquist argument is the basic reason why the highest sample rate normally used is 44.1k, giving a highest reproducible frequency (Nyquist frequency) of 22.05kHz)
But perhaps I misunderstand; you say that SACD does not use PCM, but instead "DSD" (I've not heard of it)... does it involve partial samples or something? I certainly can't imagine why one would want 2.8 million samples per second... :confused:
 

· Registered
Joined
·
160 Posts
Hi soundandfury. If you look at the diagrams you will see the huge difference in principle between PCM and DSD: the bit width varies with the amplitude, rather like frequency modulation (FM). Maybe the Nyquist frequency does not apply to such a format; maybe the sampling frequency is required because there is so much data. I do know that there is up to 7 gigabytes of data on an SACD against 700 megabytes on a CD. All that data contains up to 6 (usually 5) channels of high-resolution multichannel audio PLUS 2 channels of high-resolution stereo. You can instruct the player whether you want multichannel or stereo. EACH of the 5-plus-2 channels has a performance equivalent in PCM terms (for comparison purposes only) of 20 bit / 96k, so you can see why all the data. I hope that is some help.
 

· Registered
Joined
·
26 Posts
Wait a minute... 2.8 megahertz? I'm afraid I can't imagine why that should be necessary.
In fact, it isn't high enough! Several of the professional DSD recording systems are now working with a rate of 5.6448MHz (twice as high), and some are arguing for higher rates still.

The issue here is nothing to do with the Nyquist rate as such, and everything to do with the concepts of oversampling and decimation.

CD operates at 44.1kHz and 16 bit wordlengths. However, if you want to improve the dynamic range and require 24 bits, there is a problem. Conventional quantisers aren't sufficiently accurate to allow that. We simply can't make devices with sufficient precision in terms of signal voltages. However, all is not lost. What we can do is trade off precision in terms of signal voltage against precision in time -- our technology is very good at providing precise timing resolution.

If you double the sample rate, you can reduce the wordlength without loss of signal resolution. This interesting concept was exploited by Philips back in the late 1980s and used in their oversampling and Bitstream CD players. By oversampling at a rate of four times but with 14 bit converters, they could achieve the same audio resolution as a conventional system running at 44.1 and 16 bits (better, in fact, because 16 bit converters back then weren't terribly accurate!).
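The trade-off can be put in numbers. With plain oversampling (no noise shaping), filtering back down to the audio band gains about 3dB of SNR -- half a bit -- per doubling of the sample rate; the noise shaping in the Philips converters bought the rest of the gap to 16-bit performance. A sketch, assuming that half-bit-per-octave rule:

```python
import math

def effective_bits(base_bits, oversampling_factor):
    # Plain oversampling spreads quantisation noise over a wider
    # band; filtering to the audio band gains 0.5 bit per doubling.
    return base_bits + 0.5 * math.log2(oversampling_factor)

print(effective_bits(14, 4))   # 15.0 -- noise shaping supplied the rest
print(effective_bits(16, 1))   # 16.0 -- the conventional CD baseline
```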

Subsequently, this idea was developed, and ever higher sample rates were used with lower wordlengths... and the same concept used in reverse for A-D converters too. Sample at stupidly high rates but with very low wordlengths, and then use a decimation process to trade off the high sample rate for greater wordlengths to produce the wanted 44.1/24 bit signal (or whatever).

This is the idea behind the Delta-Sigma converters that are ubiquitous in every modern digital audio device. Typically, these things operate at 5.6MHz but with only three or four bits.

The Sony SACD approach (and the underlying Direct Stream Digital (DSD) technology) is based on exactly the same concept, but instead of bothering to transcode between the 5.6MHz/3 bit delta-sigma format and the 44.1kHz/24 bit format, they leave the information at the high sample rate, thus avoiding all the transcoding stages (decimation and oversampling). This is argued to avoid the potential quality loss through the transcoding processes, but primarily it reduces costs.

So, the DSD system produces a digital signal which runs with a sample rate of 2.8MHz and only one bit, but which has broadly the same information capacity as a 96kHz, 24bit signal.
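Comparing the raw bit rates per channel shows why the information capacities are broadly similar (this sketch ignores DSD's noise shaping, which is what concentrates its resolution in the audio band):

```python
dsd_rate = 2_822_400 * 1   # 2.8224MHz x 1 bit   = 2822400 b/s per channel
pcm_rate = 96_000 * 24     # 96kHz x 24 bits     = 2304000 b/s per channel
print(dsd_rate, pcm_rate)  # same order of magnitude of raw data
```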

It's not all quite as rosy as Sony would make out, though, not least because signal processing a one-bit signal is very complicated. The noise floor is also a problem -- it rises enormously above 20kHz, and although the Sony marketing claims a bandwidth of 100kHz, the signal-to-noise ratio up there falls to very low levels. We can't hear it, of course, but it can cause stability and intermodulation problems in some cases.

Hugh
 

· Registered
Joined
·
122 Posts
Thanks bongos and hugerr, I think it makes sense now (mostly). So DSD is a sort of PWM then? But with lengths that are discrete, and the 'quantum' is 1/2.8MHz? Also by that diagram it seems that each pulse starts as soon as the last one finishes, so the sample rate is faster for low amplitudes, is that right?
It certainly looks very interesting but I can't pretend to fully understand it. For one thing I can't imagine how you would do any signal processing without transcoding it to PCM and then back to DSD.
Also presumably the lower the peak amplitude, the higher the min. sample rate so the higher the (effective) Nyquist frequency - so for louder signals the Nyquist f is not as high? (Although with a max sample rate of 2.8MHz I suppose that even then it will still be high?)
Also, am I right in thinking that basically DSD is equivalent to a sort of 'unary PCM'?

Sorry for waffling so much but I still don't quite see the point of DSD - I find it hard to believe that the ear can hear anything beyond 44.1/16 anyway. Of course the audio treatment of the room the equipment is in matters far more. Perhaps all the hype is just a way of keeping the hardware companies in business but then I think that about HDTV as well. I dare say I'd have said the same about colour TV and CDs if I'd been around when they were invented.
 

· Registered
Joined
·
26 Posts
So DSD is a sort of PWM then?
No, it is straight PCM, but because it only uses one bit it can be drawn to make it look like it works in a similar way to PWM.

It is actually a form of delta modulation -- there is no absolute amplitude value, just a record of whether the current sample is larger or smaller than the preceding one. The hope is that by making that evaluation 2.8 million times a second, the analogue audio waveform can be traced accurately. Many have argued this is not actually the case, and that's why some DSD recorders now offer a 5.6MHz sample rate.
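A toy version of that tracking idea in Python (my own sketch with an arbitrary step size -- real delta-sigma converters add noise shaping and feedback filtering on top of this):

```python
import math

def delta_modulate(samples, step):
    # 1-bit coding: each output bit says only "go up a step" or
    # "go down a step"; the running estimate traces the waveform.
    estimate, bits, trace = 0.0, [], []
    for x in samples:
        bit = 1 if x > estimate else -1
        estimate += bit * step
        bits.append(bit)
        trace.append(estimate)
    return bits, trace

fs, f = 2_822_400, 1_000                    # DSD-like rate, 1kHz test tone
sine = [math.sin(2 * math.pi * f * n / fs) for n in range(fs // f)]
bits, trace = delta_modulate(sine, step=0.01)
worst = max(abs(a - b) for a, b in zip(sine, trace))
print(worst < 0.05)   # True: at this rate the estimate stays close
```

At a much lower sample rate the same step size would "slope overload" -- the estimate couldn't climb fast enough to follow the waveform -- which is exactly why such a high rate is needed for a 1-bit scheme.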

so the sample rate is faster for low amplitudes, is that right?
No. It is a fixed 2.8224MHz sample rate. With a low frequency signal, the larger-or-smaller decision tends to dither up and down a lot more, simply because the rate of change is slow. With high frequency signals, the larger/smaller decision tends to run in the same direction for much longer periods of time, simply because the rate of change of the audio signal is faster. When drawn the way it has been, it can be confusing.

I can't imagine how you would do any signal processing without transcoding it to PCM and then back to DSD.
Most systems do transcode from single bit DSD to multibit PCM and back again.

Also presumably the lower the peak amplitude, the higher the min. sample rate so the higher the (effective) Nyquist frequency - so for louder signals the Nyquist f is not as high?
No, the sample rate doesn't change, and there is no change of Nyquist frequency versus amplitude. I think the poor diagram is confusing you...

Also, am I right in thinking that basically DSD is equivalent to a sort of 'unary PCM'?
Yes.

Sorry for waffling so much but I still don't quite see the point of DSD
It's cheap. That's always been the point of digital.

DSD bypasses the need for decimation and oversampling stages. It makes A-D and D-A chips simpler and cheaper. And it gives Sony something to market as a fantastic new music format to help replace the revenues it lost when the patents on the CD format expired.

I find it hard to believe that the ear can hear anything beyond 44.1/16 anyway
The dynamic range of the ear is much greater than the theoretical 96dB range of a 16 bit system. 24 bits make far more sense in that regard. The theoretical 22kHz bandwidth of a 44.1kHz sampled system should be adequate -- and in high end converters it is -- but in the past poor design sometimes meant that low cost 44.1 converters didn't/don't perform to the theoretical capabilities. In those cases switching to converting at 96kHz often sounded better, and with the greater capabilities and storage of modern equipment there are some very sound technical arguments for recording source material at 24/96.

However, the CD format remains perfectly viable for domestic applications, and its quality capabilities exceed the requirements of the majority of listeners.

hugh
 

· Registered
Joined
·
122 Posts
OK thanks, now I understand (I do get the idea of delta modulation, in fact that seems eminently sensible to me)
As for this business of dynamic range, presumably the ear can't hear a sound at say 20dB and another at say 90dB simultaneously... so maybe some sort of 'drifting point' representation, where a couple of bit patterns are reserved for 'move point left' and 'move point right' would mean that a) you could have lower bitrates for a given quality and b) you wouldn't need such wide ADCs/DACs...? Wouldn't that make for a better system? (I'm going to code a quick PoC just to see what kind of effect that has, whether it's practicable etc)

[edit]Having bashed together a quick test program in BASIC, it appears that at, say, an 8-bit wordlength with two values reserved for 'shift left' and 'shift right', a 'drifting point' reduces the noise floor by about 10dB (from -60dB to -70dB). The test signal I was using was the first 24s of the 3rd mvt of Hummel's trumpet concerto.[/edit]
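For anyone curious, here's roughly how such a 'drifting point' scheme might look in Python. This is my own sketch of the idea, not soundandfury's BASIC program; the names (SHIFT_UP, SHIFT_DOWN, the drift thresholds) are all hypothetical choices: two of the 256 codes move a gain exponent, and the remaining 254 quantise the sample at the current gain.

```python
import math

SHIFT_UP, SHIFT_DOWN = 254, 255   # reserved codes that move the 'point'
LEVELS = 254                      # remaining codes span +/- full scale
HALF = LEVELS // 2                # code HALF (127) represents zero

def encode(samples):
    exp, out = 0, []
    for x in samples:
        # Drift the exponent until x sits comfortably in range.
        while abs(x) > 2.0 ** exp and exp < 15:
            out.append(SHIFT_UP); exp += 1
        while abs(x) < 2.0 ** exp / 4 and exp > -15:
            out.append(SHIFT_DOWN); exp -= 1
        out.append(round(x / 2.0 ** exp * (HALF - 1)) + HALF)
    return out

def decode(codes):
    exp, out = 0, []
    for c in codes:
        if c == SHIFT_UP:
            exp += 1
        elif c == SHIFT_DOWN:
            exp -= 1
        else:
            out.append((c - HALF) / (HALF - 1) * 2.0 ** exp)
    return out

sig = [math.sin(2 * math.pi * 440 * n / 48_000) for n in range(480)]
err = max(abs(a - b) for a, b in zip(sig, decode(encode(sig))))
print(err < 0.01)   # True: quiet passages get proportionally finer steps
```

Because the quantisation step shrinks with the exponent, quiet passages are coded more finely than a fixed 8-bit grid would allow, at the cost of a few shift codes whenever the level changes -- broadly the effect the BASIC test measured.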
 