Notes by ff123
Listening tests by many
NOTICE: The listening tests performed with Lame 3.88 are for an alpha version, and are not necessarily representative of the Lame 3.88 beta version.
Addendum, Mar 30, 2001: Added results of EarGuy's Digital Ear
I prepared a short test for high-frequency "ringing" in several mp3 encoders, using a male vocal sample that a Czech member of the Lame mail list made a year or so ago. This shortened version of the sample was forwarded to me by Hans Heijden. I had asked Hans if he knew of another example of the type of artifact heard by bAdDuDeX, which was particularly objectionable to him in music encoded by Lame (see my Ringing in Lame page). The term "ringing" was used by bAdDuDeX to describe a certain type of artifact which DualIP ascribes to on-off switching of higher subbands (see text below).
Here were my instructions for the listening test, which I posted to alt.binaries.sounds.mp3.d and to the r3mix.net forum:
There are four mp3 files encoded at 128 kbit/s. The default low-pass filters were enabled. The encoders are:
1. FastEnc (Cool Edit Pro with MP3 ME plugin, CRC disabled)
2. MP3Enc31 (-qual 9)
3. Lame 3.87RH (-h)
4. Lame 3.88CVS 010104 (-h --nspsytune)
The encoded mp3 files can be found on my Audio Samples Page as eb_andul.zip. The original file is eb_andul_short.flac, and has been compressed using the lossless compressor, FLAC. Please rank the samples and tell me what you think sounds best, in order.
Here is a quick summary ranking by listener:
| Listener | Ranking (best to worst) |
| HansHeijden | 45, 33, 24, 16 |
| bAdDuDeX | 45, 33 = 24, 16 |
| r3mix | 45, 24, 16, 33 |
| DualIP | 45 = 33 = 16, 24 |
| Naoki Shibata | 33 = 45, 24, 16 |
| JohnV | 33, 45, 16, 24 |
| 2Bdecided | 33, 45 = 24 = 16 |
| ff123 | 33, 16, 24, 45 |
| TLO | 24, 16, 33, 45 |
| JuliusBT | 16, 33, 24, 45 |
| Speek | no difference |
| Digital Ear | left ch: 33, 16, 24, 45; right ch: 16, 33, 24, 45 |
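As a rough cross-check of the table above, the human rankings can be tallied into a mean rank per sample. This is only a sketch of one possible summary, not a method used in the original test: the tie handling (tied samples share the average of the positions they occupy) and the function names are my own assumptions, and Speek ("no difference") and the Digital Ear rows are left out.

```python
from collections import defaultdict

# Rankings copied from the summary table (best to worst); "=" marks a tie.
# Speek ("no difference") and the Digital Ear results are deliberately omitted.
RANKINGS = {
    "HansHeijden": "45, 33, 24, 16",
    "bAdDuDeX": "45, 33 = 24, 16",
    "r3mix": "45, 24, 16, 33",
    "DualIP": "45 = 33 = 16, 24",
    "Naoki Shibata": "33 = 45, 24, 16",
    "JohnV": "33, 45, 16, 24",
    "2Bdecided": "33, 45 = 24 = 16",
    "ff123": "33, 16, 24, 45",
    "TLO": "24, 16, 33, 45",
    "JuliusBT": "16, 33, 24, 45",
}


def ranks_from_string(ranking):
    """Map each sample to its rank (1 = best).

    Assumption: tied samples share the mean of the positions they occupy,
    so "45, 33 = 24, 16" gives 33 and 24 a rank of 2.5 each.
    """
    ranks = {}
    position = 1
    for group in ranking.split(","):
        tied = [sample.strip() for sample in group.split("=")]
        shared = position + (len(tied) - 1) / 2
        for sample in tied:
            ranks[sample] = shared
        position += len(tied)
    return ranks


def mean_ranks(rankings):
    """Average each sample's rank over all listeners (lower is better)."""
    totals = defaultdict(float)
    for ranking in rankings.values():
        for sample, rank in ranks_from_string(ranking).items():
            totals[sample] += rank
    return {sample: total / len(rankings) for sample, total in totals.items()}


if __name__ == "__main__":
    ordered = sorted(mean_ranks(RANKINGS).items(), key=lambda kv: kv[1])
    for sample, rank in ordered:
        print(f"sample {sample}: mean rank {rank:.2f}")
```

With only ten listeners and such divergent orderings, a mean rank like this should be read as a convenience summary rather than a verdict.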
The detailed responses follow:
| r3mix: Without having listened to the original, so judging on what sounds best to me: short16: I don't like this one. Has a sharp tone at the 1.6 second mark. Now I listen to the original: ok, I must stop now because I'm starting to hear things that aren't there... 45, 24, 16, 33 is the order I'd pick if I needed to listen to the whole album (assuming it's all like those few seconds). |
| 2Bdecided: 16 - picked out from original in blind test - sibilance has slightly wider stereo spread. 16 has that classic mp3 128 sound (watery). I like 24 because at least it's different - doesn't automatically make me think "mp3" - more "Lucy in the Sky with Diamonds" (slightly phasey). 33 has a bit of that classic mp3 128 sound, but I think it's probably my favourite. 45 would be on a par with 16 or 33 if it weren't for that little blip, which spoils it. I don't really like any of them - can't I say the original is my favourite? OK - I'll take 33. You can't listen too closely to a short sample, because after a while you've heard it so often that you hear things (like Roel said) - however, a whole album of 16 would be obviously mp3, even if no one told you, and you didn't have the original. btw, the original sounds like it's been overly processed with noise reduction - can't tell if it's Sonic Solution, Cedar, or a cassette played on Dolby C though ;-) My ranking would be: 33, 45, 24, 16 BUT it's so close that it's really 33 1st, then the others all joint second. I put 45 second assuming it wouldn't make any further loud blips. However, in reality I guess it would, and, over a whole album I think this would be so irritating as to put it last! 24 would get irritating once you got used to it I think - and 16 is irritating because I _have_ got used to that sound. So 33 is best, and the others are equally bad - sorry! |
| Naoki Shibata: 16 and 24: "sh" voices are slightly watery. My ranking is: But the differences are very small. [in response to bAdDuDeX's comment on background hiss]: Please check if specifying "--athtype 1" improves the background hiss. BTW, if you test CBR mode of latest CVS version of lame with --nspsytune, please use -q1 instead of -h. |
| JuliusBT: my guess is that 24 is the MP3Enc version since it has that sound. I could [...] distinguish the original wav from all of them, and I thought that 45 sounded least OK to me. I would still have rated them: 16/33/24 best My judgement is beclouded now, I'd rank 16 as best, [...] 33 [...] is second. Then 24 and at the bottom 45 (not even counting the glitch!) I could not tell much of a difference between the 4 samples, apart from the 45 one that I thought sounded less perfect than the others. I could not hear much real high frequency content at all (above 16 kc). |
| Speek: I can't tell the difference. For me they are all the same. I don't hear any ringing or other artifacts. |
| HansHeijden (from email messages to me): Ranking turns out to be simply: the higher the number in the filename, the better. This is based on the amount of ringing heard. However, concerning the first 's', 24 and 33 are less distorted. So either 33 or 45 I would prefer... [33 is] almost as 'ringing-free' as [45], and superior with the 's' distortion. |
| DualIP (from a post on alt.binaries.sounds.mp3.d): ff123 wrote: I do hear it on the 33.mp3, and understand the term clearly. It's caused by the on/off switching of higher subbands. Encode some noise stuff to, say, 48 kb/s at 22k sample freq, and this subband switching will be clear to anyone. ringing: overall qual: ***************** After : First of all the .WAV: - The two slishing ssssh sounds at 0:04 and 0:06 are very artificial. It sounds to me like these open up the noise gate for full bandwidth and, even worse, start some echo/reverb that only works in this high frequency region. Very unnatural, since normal echo/reverb works as a low pass filter, and here you only hear the high-freq. part of the sssh decaying. Previously I didn't realize these "flaws" in the original .wav, and this obscured my previous judgement. The ringing is in the echo/reverb of this sssh sound. Seems like the worst encodes make some staircase decay instead of exponential decay out of it (subbands on/off switching). Because 45 sounds dull, it gets rid of some of the HF flange-alike "ssssh echo", improving sound quality at first sight. New judgement: 33, 45, 16, 99 come close, have different flaws making comparison hard. In any order 16: Wider bandwidth at the expense of most obvious ringing: |
| TLO (from a post on alt.binaries.sounds.mp3.d): A deceptively simple clip that presents clear challenges once scrutinized. Clicks suggest a vinyl source (or CD mastered from LP, do you know if that's true?). Wide separation of acoustic piano in one channel, acoustic guitar in the other, and reverb on the sibilant vocal in both promise a good test of 128. Listening carefully I hear details I'd guess are going to cause some problems: At some points, the guitarist's hand on the wound strings creates a clear scraping sound in the right channel, the first at approx. 0.7 secs (string muting almost like a breath intake) and a pair of subtler ones around 4.7 and 5.3 for example. The vocal effect isn't going to be encoder friendly either -- sounds like a good analog plate reverb. The piano is dry and compressed so it shouldn't be an issue (as it can often be at stereo and fuller frequency). Traditional "70s big studio" Cat Stevens production aesthetic here. Ranked and superlatively speaking: 01. Eb_andul_short16.mp3 - Overbright, some artifacts.
Second Given similarities in the first 2 and maybe 3 I'm going to go out on a limb and ID the encoders. These are: Eb_andul_short16.mp3 = LAME All have artifacts better described as a watery chirping rather than ringing. They may be softer on 33 but no more so than an EQ would obscure. The dullness and blatant artifacts in 45 leave little doubt that is MP3Enc: I can hear that garbage from a mile away. 16 and 24 are quite close. 33 I might suspect as nspsytune as it's the odd man out in terms of a slight upper mid smearing or blurriness. Out of these, whatever made 24 is what I'd hope the track would be encoded with, but 16 is alright if I *had* to take a 128. *** I hear upwards of 18 kHz. I hear the higher-pitched watery chirping being called "ringing". I hear artifacts. And in simple point of fact I hear the most artifacts in [45]. Flat out, more artifacts. |
| bAdDuDeX (from a post on alt.binaries.sounds.mp3.d): 16 - Vocals sound too bright, the hiss in the background is totally masked, and there's ringing everywhere. In case you don't know, the hiss in the background is a GOOD thing. It shows that the encoder assumes it to be audible and thus doesn't mask it out. ff123 wrote: Yes, it's in the original. Must have been introduced in the recording process. "Loss of hiss" isn't an artifact in itself, heh. That type of thing pertains to any quiet noises in the background on music, not just hiss. It shows that LAME masks more than the FhG encoders do. If I were to go just by artifact rate then I would have put 33 over 24 (instead of a tie). But you have to factor extra masking into the picture too. Easily first place: 45 ff123 wrote: Just did a quick test and now it has even more hiss than the MP3Enc version. It still has more artifacts though. 99 still doesn't have as much hiss as the WAV but it's closer than 45 now. [...] when using --nspsytune LAME doesn't really have any ringing on this sample. The main thing is that the vocals don't sound as clean as 45. They sound....compressed or something. 45 has much more natural sounding vocals. 24 sounded even more compressed with the vocals (which is a big problem with FastEnc, I hear it all the time). 16 just sounded horrible all-around. That's definitely a file I would delete. [On the term "compressed"]: [responding to TLO's comment on what "ringing"
sounds like]: |
| JohnV: Ok, I'll give my view, although I already know which sample is which, so I consider myself already biased. In my opinion 24 is the worst. This is the sample that in my opinion has the most quite audible artifacts, especially with 's' vocals. But also the quality in whole is the lowest in my opinion. 16 is the second worst. I don't like the 's' vocals and I hear some high frequency vibrations especially with 's' echoes. 45: it sounds somewhat dull and the blip in the beginning before that "spischnich"-word ruins it. It sounds somewhat "thicker" or duller than the original or other samples. 33: In my opinion this is overall the most solid quality sample among these, surely not anywhere near transparent though. For example the 's' just before 4 second position is not very sharp. Not good but uniform quality is what makes this the best, in my opinion. Huh, I can't believe somebody actually thinks 24 sounds best. It's quite badly audibly distorted all the way. I would bet that if the sample was in English language, 24 and 16 would be clearly identified the worst samples. It took me some time (5-6 times each) before I could really hear majority of distortions. The pronunciation is quite odd, and that at least in my case was a bit problematic. |
| ff123: 16: I noticed a spreading of the sibilance in the first "s" plus a very small noise around the same time (ABX = 16 of 16). Ratings from best to worst: 33, 16, 24, 45 |
| I had the d-ear listen to the Eb_andul clips for both
the left and right channel. I placed the average
distortion on a normalized line from left to right with
the leftmost being the best and the rightmost being the
worst (of the test set). Here are the results:
|
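For context on the "ABX = 16 of 16" figure in my own entry above, here is a small sketch of how unlikely such a score is under pure guessing. It assumes each ABX trial is an independent coin flip (probability 0.5 of a correct answer); the function name is hypothetical and not part of any ABX tool.

```python
from math import comb


def abx_p_value(correct, trials):
    """One-sided binomial probability of scoring at least `correct`
    out of `trials` when every answer is a 50/50 guess."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # 16 of 16 correct: only one of the 2**16 equally likely outcomes.
    print(abx_p_value(16, 16))
```

A perfect run of 16 corresponds to 1 chance in 65536 of arising by guessing alone, which is why a 16 of 16 result is taken as solid evidence that the difference was actually heard.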
Which encoder is which? Answers are here.