Timbre

Posts: 3
Registered: Dec 18, 2009

Hi all,

Is there any documentation on how the timbre coefficients are calculated or what they mean?

I would like to be able to get some measure of how perceptually similar two audio snippets are to each other. Would a simple correlation formula on the timbre vectors work?

Gene

Posts: 666
Registered: Sep 08, 2008

Timbre is the quality of a musical note or sound that distinguishes different types of musical instruments or voices. It is a complex notion, also referred to as sound color, texture, or tone quality, and is derived from the shape of a segment’s spectro-temporal surface, independently of pitch and loudness. Our timbre feature is a vector of 12 unbounded values roughly centered around 0. Those values are high-level abstractions of the spectral surface, ordered by degree of importance. The first dimension represents the average loudness of the segment; the second emphasizes its brightness; the third is more closely correlated to the flatness of a sound; the fourth to sounds with a stronger attack; etc. The actual timbre of the segment is best described as a linear combination of these 12 basis functions weighted by the coefficient values: timbre = c1 * b1 + c2 * b2 + ... + c12 * b12, where c1 to c12 represent the 12 coefficients and b1 to b12 the 12 basis functions as displayed below. Timbre vectors are best used in comparison with each other.
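As a rough illustration of the linear combination above, here is a minimal Python sketch. The basis functions in it are small made-up lists, not the actual basis functions (which are not reproduced in this thread), and `reconstruct_timbre` is a hypothetical helper name rather than anything in the API.

```python
def reconstruct_timbre(coefficients, basis_functions):
    """Weighted sum c1*b1 + ... + cN*bN of equally sized basis functions."""
    if len(coefficients) != len(basis_functions):
        raise ValueError("need one coefficient per basis function")
    size = len(basis_functions[0])
    surface = [0.0] * size
    for c, b in zip(coefficients, basis_functions):
        if len(b) != size:
            raise ValueError("basis functions must all have the same size")
        for i, value in enumerate(b):
            surface[i] += c * value
    return surface


# Toy example: 3 placeholder basis functions of 4 points each.
# The real feature uses 12 basis functions; these values are invented.
toy_basis = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.5, -0.5, -1.0],
    [1.0, -1.0, -1.0, 1.0],
]
toy_coefficients = [2.0, 1.0, 0.5]
print(reconstruct_timbre(toy_coefficients, toy_basis))
```

The same loop would apply to 12 coefficients and 12 basis functions once the real basis data is available.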

Paul

Posts: 27
Registered: Mar 29, 2009

Gene, I don't work for EchoNest, but I share your interest in the Timbre vector. If you feel up to it, you might consider taking a look at Tristan Jehan's PhD thesis (he is one of the EchoNest co-founders). There is a link to his thesis on the Publications portion of his MIT home page.

Paul, one thing has been bugging me about the Timbre vector. Your description seems contradictory. On the one hand it says that the Timbre is independent of loudness, yet it also says that the first dimension represents the average loudness. Can you shed some light on the apparent contradiction?

Thanks, Chris

Posts: 3
Registered: Dec 18, 2009

Thanks for the help. Paul, when you say "Timbre vectors are best used in comparison with each other", can you point me to some examples? I am interested in quantifying how perceptually similar two short snippets of audio sound (say, 2 seconds long). Has anyone attempted this? I was thinking of taking the correlation of the two timbre vectors. Since each snippet spans more than one segment, I would either take the 2-D cross-correlation or just average the correlation of the two timbre vectors over every point in time.
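The averaging idea could be sketched roughly as follows in Python. This assumes each snippet is a list of timbre vectors (one per segment) and pairs segments by index, which is itself an assumption since segments of two different snippets are not guaranteed to line up in time. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("vectors must be non-empty and the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def mean_segment_correlation(snippet_a, snippet_b):
    """Average the correlation of timbre vectors paired by segment index."""
    pairs = list(zip(snippet_a, snippet_b))  # truncates to the shorter snippet
    if not pairs:
        raise ValueError("both snippets need at least one segment")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

Whether this tracks perceptual similarity is an open question in this thread; it is only one simple way to turn the idea into a number.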

Best, gene

Posts: 2
Registered: May 25, 2010

Tripping across this discussion about six months late, but just want to acknowledge that I'm interested as well in learning more specifically about how timbre is analyzed by the software. I'll take a look at Tristan's papers, but I'm curious if there's any specific documentation available of the timbral vectors that Analyze uses, and examples of their practical use in comparison of segments.

Posts: 57
Registered: Sep 17, 2008

Josh,

Timbre has been updated since my publications. It is now a projection of the spectro-temporal auditory surface onto a lower dimensionality space of 12 dimensions. Think high-res spectrogram through psychoacoustic filters (inner-outer ear filter, dB compression, cochlea warping, frequency and temporal masking) and its resulting auditory surface converted to 12 "optimal" coefficients on a segment basis. See the thread here, and the basis functions. In a way, timbre coding here is somewhat similar to face coding in a face-recognition system. The similarities in timbre are characterized as distance similarities in those coefficients, which are decreasingly significant.

Answering Chris (sorry for missing that one), it is true that the first coefficient roughly represents the average loudness of the segment as defined by the coding process. It is there for full description / reconstruction of the auditory spectrogram, and because it is hard to fully decouple loudness from timbre in practice. You can take it into account, or not, according to the task.
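A minimal Python sketch of comparing two timbre vectors by distance, with the option to leave out the first (loudness) coefficient. Euclidean distance is an assumption here, since no particular distance measure is named above, and `timbre_distance` is a hypothetical helper.

```python
import math


def timbre_distance(a, b, include_loudness=True):
    """Euclidean distance between two timbre vectors.

    With include_loudness=False the first coefficient, which roughly
    tracks average loudness, is skipped.
    """
    if len(a) != len(b):
        raise ValueError("timbre vectors must have the same length")
    start = 0 if include_loudness else 1
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[start:], b[start:])))
```

Smaller distances mean more similar timbre under this measure. Since the coefficients are decreasingly significant, one could also weight the later dimensions less, but that is left out of this sketch.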

Posts: 2
Registered: May 25, 2010

Ah! The description and the thumbnails on that thread made it click for me. Thanks so much for the pointer.

Posts: 3
Registered: Oct 28, 2010

Is there any correlation between timbre coefficients and the current instrumentation of a song? In particular, I'd like to determine when in a song the singer is singing, and when various instruments are played. For example, the fourth basis vector coefficient corresponds "to sounds with a stronger attack." Do you think this correlates with the presence of drums, in particular a snare? I know that this is currently an unsolved problem in academia, with even the best models getting about 90% accuracy in identifying particular instruments, but as far as I know they don't use a 12-dimensional representation of timbre. If anyone's thought about this, I'd love to hear your thoughts.

Posts: 57
Registered: Sep 17, 2008

I don't believe any system is fairly identifying instruments with 90% success, especially from a mixture of sounds. :) Our timbre coefficients describe the spectral shape of a sound in the time-frequency domain. Guessing from the data that there is a singer mixed with the other underlying sounds isn't trivial. But to answer your question: yes, there's a correlation between the coefficients and the instrumentation, simply because these are computed from the underlying time-frequency signal of the instruments. However, the instrumentation should be regarded as a mixture, with all sorts of inherent variation and expression, and not as separate timbres to be individually identified. For that, you'd need to go deeper and have a lot more prior knowledge than is given here. Unfortunately, source separation and instrument classification aren't yet robust enough. So we describe mixtures instead, which is still very useful. It's more appropriate to compare them with each other and identify general structures or characteristics of those sounds than to try to extract a single, usually ill-defined instrument or voice within the mixture, when you typically don't have a good internal model of that timbre to start with. That said, you may get OK results with fairly simple test cases.

Posts: 9
Registered: Dec 03, 2009

Hi, I have not been working with the API for half a year now, and it seems that the whole get_timbre part of the API is gone! Where can I still get the data? I mean not just a few coefficients from 'analyze', but the pages and pages of information that used to be the result of an inquiry. Thanks already for pointing it out!

Posts: 666
Registered: Sep 08, 2008

Hi mjbroekhuijsen - the get_timbre method is part of V3 of the API which can be found here:

http://developer.echonest.com/docs/v3/

We do recommend that you switch to V4 of the API. V4 returns the analysis data as a JSON file linked from the audio summary of a track. If you look at the example here:

http://developer.echonest.com/docs/v4/track.html#analyze

you will see that one of the returned elements is an audio_summary:

        "audio_summary": {
            "key": 7,
            "analysis_url": "https://echonest-analysis.s3.amazonaws.com:443/TR/TRXXHTJ1294CD8F3B3/3/full.json?Signature=VI8tmWY%2B%2F6eq85H2G7kmb6e4eWI%3D&Expires=1282240751&AWSAccessKeyId=AKIAIAFEHLM3KJ2XMHRA",
            "tempo": 168.46000000000001,
            "mode": 1,
            "time_signature": 4,
            "duration": 120.68526,
            "loudness": -19.140000000000001,
            "danceability": 0.7,
            "energy": 0.9
        },

The JSON file at the analysis_url contains all of the analysis data, including the timbral information.
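A rough Python sketch of pulling the timbre vectors out of that file. It assumes the analysis JSON has a top-level "segments" list whose entries each carry a "timbre" array; that layout is not spelled out in this thread, so check it against a real file. Both function names are hypothetical, and the sample document is a shortened stand-in (real timbre vectors have 12 values).

```python
import json
from urllib.request import urlopen


def load_analysis(analysis_url):
    """Download and parse the analysis JSON (the signed URL expires)."""
    with urlopen(analysis_url) as response:
        return json.load(response)


def timbre_vectors(analysis):
    """Return one timbre vector per segment from a parsed analysis dict.

    Assumes a "segments" list with a "timbre" array in each entry.
    """
    return [segment["timbre"] for segment in analysis.get("segments", [])]


# Shortened stand-in for a downloaded analysis document.
sample = json.loads(
    '{"segments": [{"start": 0.0, "timbre": [1.0, 2.0]},'
    ' {"start": 0.5, "timbre": [3.0, 4.0]}]}'
)
print(timbre_vectors(sample))
```

With a live URL, `timbre_vectors(load_analysis(url))` would give the per-segment vectors in one step.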

Hope this helps

Paul
