Confusing terms
I think part of the problem with this subject is the use of confusing terms that have different meanings in different contexts -- the worst culprit being 'compression'.
Compression is most often used to mean the control of dynamic range -- specifically the reduction of dynamic range, so that loud sounds aren't as loud as they were, while quieter sounds are louder. The idea is to let you hear the quiet stuff above the background noise of the listening room while the loud bits don't annoy the neighbours.
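To pin that first meaning down, here's a minimal sketch (in Python) of a static compressor gain curve. The threshold and ratio figures are purely illustrative choices, not values from any particular unit:

```python
def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compressor curve: signal levels above the threshold are
    turned down by the ratio; levels below it pass unchanged.
    (Illustrative numbers only -- real units add attack, release
    and make-up gain, which is what lifts the quiet stuff.)"""
    if level_db <= threshold_db:
        return 0.0                      # below threshold: no gain change
    over = level_db - threshold_db      # dB of overshoot above the threshold
    return -(over - over / ratio)       # 4:1 squeezes 16dB over into 4dB

print(compressor_gain_db(-4.0))   # -12.0: a loud peak is pulled down
print(compressor_gain_db(-30.0))  # 0.0: quiet material passes untouched
```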
However, in the context of digital audio codecs, 'compression' refers to the amount of data reduction being applied -- or how much the file size is reduced.
There are four basic ways to achieve a reduction in the data file size. The first is to use a lower sample rate. For example, instead of using the 44.1kHz rate of CD, the telephone system uses an 8kHz sample rate. The advantage is that you immediately have only about one fifth of the data to worry about... but the disadvantage is that the audio bandwidth is reduced to just 3.5kHz instead of 20kHz. Lousy for quality music, but adequate for intelligible speech... which is why it is used.
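As a quick sanity check on those figures, the arithmetic in Python:

```python
cd_rate, phone_rate = 44_100, 8_000       # samples per second

# One second of mono audio, same wordlength, at each rate:
print(phone_rate / cd_rate)               # ~0.18 -- about one fifth of the data

# Nyquist: the highest representable frequency is half the sample rate;
# telephone filtering trims that further, to roughly 3.5kHz.
print(cd_rate / 2, phone_rate / 2)        # 22050.0 4000.0
```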
The second technique is to reduce the wordlength -- how many bits are used to describe the audio amplitude. Studio-quality recordings are made with 24 bits per sample, which gives a potential dynamic range in excess of 130dB. CDs are made with 16 bits per sample, giving a potential dynamic range of over 90dB -- but that reduction in wordlength removes more than 30% of the data. Telephone systems use only 8 bits, reducing the data even more. But clearly, the disadvantage of this approach is a higher and more obvious noise floor, and a lower potential dynamic range.
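Those dynamic-range figures follow from the standard rule of thumb that every bit buys roughly 6dB (6.02N + 1.76dB for an ideal quantiser):

```python
def dynamic_range_db(bits):
    # Theoretical SNR of an ideal N-bit quantiser
    return 6.02 * bits + 1.76

for bits in (24, 16, 8):
    print(bits, round(dynamic_range_db(bits), 1))   # 146.2, 98.1, 49.9

# Dropping 24-bit masters to 16-bit CD discards a third of the sample data:
print(1 - 16 / 24)   # ~0.33
```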
The third option is to remove what is referred to as 'redundant data' and this is the scheme used by loss-less codecs like FLAC and MLP (amongst many others). If you consider the digital encoding of a photograph, each pixel must be described in terms of its colour and brightness, and in a high resolution picture that results in a huge amount of information. A loss-less codec reduces much of that information by avoiding repetition of identical data -- doing away with the 'redundant' information.
So if the sky is a uniform blue, say, rather than recording 'this is a bright blue pixel, this is a bright blue pixel, this is a bright blue pixel...' over and over, the codec notes it as 'this is a bright blue pixel, and so are the next 327; this is a less bright blue pixel, and so are the next 28'... and so on.
In this way, the amount of data required to describe the picture is reduced significantly, but the information and accuracy are preserved absolutely -- which is why it's called loss-less. Audio signals tend to be very cyclical in nature much of the time, and this is largely predictable. The loss-less codecs simply take advantage of this 'predictability' to remove the redundant data and thus reduce the file size. There are obvious limits to how far this can be taken, and that's why loss-less audio codecs typically only reduce file sizes by about 60%.
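The blue-sky example above is classic run-length encoding, and a toy version shows why 'loss-less' is the right word -- the decoded data is bit-for-bit identical to the input. (Real audio codecs like FLAC use linear prediction rather than literal run-lengths, but the principle of not repeating what's predictable is the same.)

```python
def rle_encode(pixels):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1            # same as the last pixel: extend the run
        else:
            runs.append([p, 1])         # new value: start a new run
    return runs

def rle_decode(runs):
    return [p for p, n in runs for _ in range(n)]

sky = ['bright blue'] * 328 + ['less bright blue'] * 29
encoded = rle_encode(sky)
print(encoded)                       # [['bright blue', 328], ['less bright blue', 29]]
assert rle_decode(encoded) == sky    # decodes back exactly: nothing was lost
```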
The final option is where it all gets very contentious, and that's to remove 'irrelevant data' -- the elements of a complex sound recording that the human hearing system cannot resolve and detect. Formats like Dolby Digital (AC3), DTS (apt-X), MiniDisc (ATRAC), DAB radio, MP3, MP4 and all the rest are lossy codecs, and they all work (albeit differing slightly in the details) by trying to throw away parts of the sound that their designers think we can't hear and won't notice.
The premise is that the human ear/brain is not a linear sound-analysing system, but is actually a highly reactive and non-linear system which suffers from an effect called 'frequency masking'. Essentially, if I played you a quiet single-pitch tone at, say, 3kHz, you would be able to perceive it quite happily. If I then add in a very much louder tone, starting at 100Hz, and gradually increase its frequency, you'll be able to hear and distinguish the two separate tones. But as the louder tone starts to approach about 2kHz the 3kHz tone will seem to disappear, and it won't come back until the loud tone has carried on up to something like 5kHz or more. It's quite a sobering demonstration, actually! But this 'frequency-masking' effect occurs throughout the hearing range, and it is the underlying principle of the old analogue tape noise-reduction systems like Dolby B and C.
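If you'd like to hear the masking demo for yourself, here's a rough sketch that writes it to a WAV file using only the Python standard library: a steady, quiet 3kHz tone plus a much louder tone sweeping from 100Hz up to 6kHz. The levels and sweep range are my own choices, not a calibrated experiment:

```python
import math, struct, wave

RATE, SECS = 44_100, 20
frames = bytearray()
phase = 0.0
for i in range(RATE * SECS):
    t = i / RATE
    quiet = 0.05 * math.sin(2 * math.pi * 3000 * t)   # steady quiet 3kHz tone
    f = 100 * (60 ** (t / SECS))                      # log sweep: 100Hz -> 6kHz
    phase += 2 * math.pi * f / RATE                   # accumulate sweep phase
    loud = 0.5 * math.sin(phase)                      # much louder sweeping tone
    sample = max(-1.0, min(1.0, quiet + loud))
    frames += struct.pack('<h', int(sample * 32767))  # 16-bit PCM

with wave.open('masking_demo.wav', 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(RATE)
    w.writeframes(bytes(frames))
```

Listen for the 3kHz tone vanishing as the sweep passes through the 2-5kHz region, then reappearing above it.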
In the case of the ever-popular MP3 format, the basic scheme is to divide the original audio signal into a large number of narrow frequency bands. The signal in each band is then analysed to determine which frequency components can and can't be heard in the presence of all the rest. Those that the system thinks can't be heard are immediately discarded, never to be heard again! Obviously, this process immediately removes quite a lot of data from the signal.
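A real MP3 encoder uses a hybrid filterbank driven by a full psychoacoustic model, but the 'analyse the bands, discard what's judged inaudible' step can be caricatured in a few lines. This is a deliberately crude stand-in -- one global threshold 60dB below the loudest component -- not the actual MP3 algorithm:

```python
import numpy as np

def crude_discard(block, floor_db=-60.0):
    """Toy stand-in for the masking step: transform a block of samples
    and zero every frequency component more than floor_db below the
    loudest one. Real encoders use per-band masking thresholds and a
    proper filterbank, not one global floor."""
    spectrum = np.fft.rfft(block)
    mags = np.abs(spectrum)
    keep = mags >= mags.max() * 10 ** (floor_db / 20)
    spectrum[~keep] = 0                 # 'never to be heard again'
    return np.fft.irfft(spectrum, len(block)), keep.mean()

t = np.arange(1152) / 44_100            # 1152 samples = one MP3 frame's worth
rng = np.random.default_rng(0)
block = np.sin(2 * np.pi * 440 * t) + 0.001 * rng.standard_normal(1152)
_, kept = crude_discard(block)
print(f"kept {kept:.0%} of the frequency components")
```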
The next stage is to analyse the signal elements that are left to determine how much louder they are than the system noise floor in each frequency band. The wordlength used to describe their amplitudes is then reduced to the minimum possible consistent with retaining a reasonable signal-to-noise ratio. This removes a whole load more data... and that's basically how the stereo signal from a CD player can be reduced from 1411kb/s (2x16x44100) down to a puny 128kb/s for MP3.
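Those bit-rate sums check out directly:

```python
channels, bits, rate = 2, 16, 44_100
cd_bitrate = channels * bits * rate           # bits per second off a CD
print(cd_bitrate)                             # 1411200 -> '1411kb/s'

mp3_bitrate = 128_000
print(f"{1 - mp3_bitrate / cd_bitrate:.1%}")  # 90.9% of the data discarded
```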
To trained ears, the quality loss is obvious, but the majority of iPod listeners seem quite happy with the result, even though over 90% of the original information has been discarded. Clearly, using a higher data rate -- such as 320kb/s -- means that less data has to be thrown away, and so the quality loss (and file size reduction) is not as great.
MP3 is a very clever system, but this is an area of technology that is advancing rapidly as the way the ear/brain works becomes better understood, and as data processing power and speed improve. So MP3 is already old-hat, and there are better codecs around now (AAC being one of many) -- but the familiarity of MP3 is such that it will remain popular for a long time to come.
If you start with an MP3 file, a large proportion of the data has been thrown away, as I've explained, and that can't be recovered in any way. However, if you convert the MP3 file to some other format (e.g. a WAV file to burn onto an audio CD), the data has to be 'recompiled' into the appropriate format. Which means that instead of describing the data in terms of a series of narrow frequency ranges with variable wordlengths, the signal has to be rebuilt into conventional samples with a fixed wordlength... and in doing that, considerably more data is required to describe the remaining audio.
But if you were to analyse that rebuilt signal carefully, you would see narrow frequency bands disappearing and reappearing as the signal changed, and the noise floor would be bouncing up and down in the different bands -- all as a result of the MP3 processing. So the quality remains at the MP3 level, even though the actual file size has increased.
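In practice that conversion is just a decode; for example with the widely used ffmpeg command-line tool (wrapped in Python here; the filenames are placeholders). The file balloons, but the quality doesn't improve:

```python
import os, subprocess

# Decode an MP3 back to PCM WAV with the ffmpeg command-line tool
# (assumes ffmpeg is installed; 'song.mp3' is a placeholder name).
subprocess.run(['ffmpeg', '-y', '-i', 'song.mp3', 'song.wav'], check=True)

# The WAV comes out roughly ten times the size of a 128kb/s MP3,
# yet contains no more musical information than the MP3 did.
for f in ('song.mp3', 'song.wav'):
    print(f, os.path.getsize(f), 'bytes')
```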
Sorry for the long post -- but hopefully that's helped to explain what is going on in these systems and why they work in the way they do.
Hugh