How to compare timbres and loudness of two tracks?

Posts: 1
Registered: Aug 13, 2010

How do I compare the timbres of two tracks? Is there a similarity function for this? If not, how can I compare the timbre of two tracks?

The Analyze documentation explains the first four elements of the vector, but what about the last eight?

"Our timbre feature is a vector that includes 12 unbounded values roughly centered around 0. Those values are high level abstractions of the spectral surface, ordered by degree of importance. For completeness however, the first dimension represents the average loudness of the segment; second emphasizes brightness; third is more closely correlated to the flatness of a sound; fourth to sounds with a stronger attack; etc."

In the Track API Methods for analyzing a track, the timbre information doesn't appear. How can I retrieve this information if there is no function for comparing timbres?

I also have trouble understanding the explanation of loudness in the Analyze documentation. What are "loudness_start", "loudness_max_time" and "loudness_max"? In the Track API Methods for analyzing a track, loudness is only a single number. Is this a median of the loudness of the track?

I also need to compare the loudness of two tracks. Is there a function for this? If not, in what range would two tracks be similar, e.g. +-4dB?

Posts: 57
Registered: Sep 17, 2008

Michael,

Timbre is a complex, ill-defined notion, particularly at the track level. We provide 12-dimensional vectors that describe the timbre of single sounds (e.g. a snare drum, a piano chord, etc.). Comparing the timbres of whole tracks isn't trivial, and we don't yet provide a single vector for that purpose. You'll find dozens of academic papers covering that topic. The simplest, but limited, approach is to take the mean and standard deviation of the provided vectors. Beyond that, you have to get into more advanced statistical techniques such as Gaussian mixture models (GMM) or hidden Markov models (HMM).
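To make the mean/standard-deviation idea concrete, here is a minimal sketch. It assumes you have already collected each track's per-segment timbre vectors as lists of numbers; the function names and the choice of Euclidean distance are illustrative assumptions, not part of the API.

```python
import math
from statistics import mean, pstdev

def timbre_summary(segments):
    """Collapse per-segment timbre vectors into one track-level vector:
    the per-dimension means followed by the per-dimension standard deviations.
    With 12-D segment vectors this yields a 24-D summary."""
    columns = list(zip(*segments))  # transpose: one tuple per timbre dimension
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]

def timbre_distance(track_a, track_b):
    """Euclidean distance between two track summaries (assumed metric).
    Smaller values mean the two summaries are closer."""
    return math.dist(timbre_summary(track_a), timbre_summary(track_b))

# Toy data with 2 dimensions instead of 12, just to show the shape of the input.
track_a = [[0, 0], [2, 4]]
track_b = [[3, 0], [5, 4]]
print(timbre_distance(track_a, track_b))
```

Because the timbre values are unbounded and the first dimension tracks loudness, you may want to scale or drop dimensions before measuring distance; that choice depends on your application.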

Regarding the "meaning" of the timbre dimensions: it's up to interpretation. The basis functions were mathematically derived from analyzing hundreds of thousands of sounds. There's no actual description of each dimension in perceptual terms per se. The few examples given are simple observations of the shape of those basis functions.

Loudness on the other hand is given for the entire track (like tempo, or key). It is a single number in dB. You can access more detailed values of loudness for every segment of the track. You get a loudness value at the onset, and at the summit of the attack. The computation of overall loudness is more involved than calculating the median though.
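As a rough illustration of the per-segment values, the sketch below reads the three fields named in the question from one segment. It assumes each segment is a dictionary that also carries a "start" time in seconds and that "loudness_max_time" is an offset from that start; check the Analyze documentation for the exact layout. The helper name is hypothetical.

```python
def attack_profile(segment):
    """Return (peak_time, rise_db) for one segment.

    peak_time: absolute time in seconds at which loudness peaks, assuming
               loudness_max_time is an offset from the segment start.
    rise_db:   how far loudness climbs from the onset to the peak, in dB.
    """
    peak_time = segment["start"] + segment["loudness_max_time"]
    rise_db = segment["loudness_max"] - segment["loudness_start"]
    return peak_time, rise_db

# Made-up segment values for illustration only.
segment = {
    "start": 1.5,
    "loudness_start": -30.0,     # dB at the onset
    "loudness_max_time": 0.125,  # seconds after the onset
    "loudness_max": -12.0,       # dB at the summit of the attack
}
print(attack_profile(segment))
```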

Finally, it is up to you to compare loudness values and define what is similar. There's no real answer to that question. It depends on the use case and the size of your catalog. You could compute the loudness distribution of your catalog to define what works for your application.
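One possible way to use the distribution, sketched under assumptions: treat two tracks as similar when their overall loudness values differ by less than some fraction of the catalog's standard deviation. The factor k and both function names are hypothetical; pick whatever rule suits your data.

```python
from statistics import pstdev

def loudness_tolerance(catalog_db, k=0.5):
    """Tolerance in dB: k times the standard deviation of the catalog's
    track-level loudness values. k is an arbitrary tuning knob."""
    return k * pstdev(catalog_db)

def similar_loudness(a_db, b_db, catalog_db, k=0.5):
    """True when two track loudness values fall within the tolerance."""
    return abs(a_db - b_db) <= loudness_tolerance(catalog_db, k)

# Made-up catalog loudness values in dB.
catalog = [-14.0, -10.0, -6.0]
print(loudness_tolerance(catalog))
print(similar_loudness(-10.0, -9.0, catalog))
print(similar_loudness(-14.0, -6.0, catalog))
```

A tightly clustered catalog gives a narrow tolerance and a varied one gives a wider tolerance, which is why a fixed range such as +-4dB may or may not suit a given collection.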

Hope this helps.

Posts: 5
Registered: Sep 23, 2010

Nonetheless, it would still be nice to have a scientific explanation of what the "basis functions" are and how they interact with each other. An explanation of the twelve graphical representations of timbre in the Analyze documentation would be nice to have as well. How can we use the timbre feature scientifically if we do not understand the precise significance of Echo Nest's timbre measurement?

Posts: 666
Registered: Sep 08, 2008

MaltaCross2, there's more detail in this document:

http://developer.echonest.com/docs/v4/_static/AnalyzeDocumentation_2.2.pdf

And for all the details, feel free to dive into Tristan's thesis:

http://web.media.mit.edu/~tristan/phd/dissertation/index.html
