I would be very keen to get simple data on the presence and nature of vocals in tracks. I appreciate this is not a wholly simple task, but given that there are apps that 'remove' vocals from tracks, and of course voice recognition software, I suspect some rudimentary data could be garnered.
I guess the most important thing would be some identification/description of 'vocal/non-vocal sections' in the track, plus their variance, repetition, etc. This would come in handy for classification, auto mixing, and no doubt plenty of other clever things.
Finally, a totally off-topic question, but I've been away from the nest for a while: the last time I was here, the full XML was available for a track once it had been processed. Is this still the case? For the apps I have in mind, it would be preferable (and less resource-heavy) to simply store/cache the entire analysis and parse it myself than to call separate methods.
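To make "vocal/non-vocal sections" concrete, here is a minimal sketch of one way per-frame vocal probabilities could be turned into sections. It assumes a hypothetical input of (start, duration, p_vocal) frames, which the analysis does not currently provide; the 0.5 threshold and 2-second minimum section length are arbitrary illustrative values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Section:
    start: float  # seconds
    end: float    # seconds
    vocal: bool


def vocal_sections(frames: List[Tuple[float, float, float]],
                   threshold: float = 0.5,
                   min_len: float = 2.0) -> List[Section]:
    """Group hypothetical (start, duration, p_vocal) frames into vocal/non-vocal runs.

    frames must be sorted by start time. Frames with p_vocal >= threshold count
    as vocal. Sections shorter than min_len seconds are absorbed into the
    preceding section to suppress rapid flickering between labels.
    """
    # First pass: collapse consecutive frames with the same label into runs.
    runs: List[Section] = []
    for start, dur, p in frames:
        is_vocal = p >= threshold
        if runs and runs[-1].vocal == is_vocal:
            runs[-1].end = start + dur
        else:
            runs.append(Section(start, start + dur, is_vocal))

    # Second pass: absorb short runs and re-join neighbours that now share a label.
    merged: List[Section] = []
    for run in runs:
        if merged and (run.end - run.start) < min_len:
            merged[-1].end = run.end
        elif merged and merged[-1].vocal == run.vocal:
            merged[-1].end = run.end
        else:
            merged.append(run)
    return merged
```

Absorbing short runs into their predecessor is just one simple smoothing choice; a median filter over the frame probabilities would be another.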
Posted: 2009-07-20 11:13:09
Thanks for the suggestion. We'll certainly add it to the list. Note, however, that vocal segment detection is still very much an open area of research. State-of-the-art research systems achieve about 80% accuracy when classifying segments by whether or not they contain vocals.
As for the analysis XML, we no longer support the full XML. We found that most people only wanted certain parts of the data, so shipping around all of the extra bits was slowing them down. For instance, if you just wanted the tempo of a track, you still had to get 0.5 MB of data.
If you are using the Java or the Python client libraries, they'll do the local caching for you - it should be pretty efficient. If you are doing something that you think is not well supported by our clients or interface, please let us know.
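If you are not using one of the client libraries, the same idea can be approximated with a small local cache. This is a minimal sketch, not how the Java or Python clients actually implement caching: `fetch` is a hypothetical stand-in for whatever call retrieves a track's analysis, and the one-JSON-file-per-track layout under `analysis_cache` is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def cached_analysis(track_id: str,
                    fetch: Callable[[str], Dict[str, Any]],
                    cache_dir: Path = Path("analysis_cache")) -> Dict[str, Any]:
    """Return the analysis for track_id, calling fetch only on a cache miss.

    fetch is a hypothetical function (not part of any real client library)
    that returns the analysis as a JSON-serialisable dict.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{track_id}.json"
    if path.exists():
        # Cache hit: read the stored analysis from disk.
        return json.loads(path.read_text())
    # Cache miss: fetch once and store the result for next time.
    data = fetch(track_id)
    path.write_text(json.dumps(data))
    return data
```

Caching whatever has been fetched keeps repeat lookups local without requiring the full analysis up front.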
Fair enough (80% doesn't sound bad to me). Even a vocal content probability would be useful. Personally, what I define as a vocal/voice is probably easier to detect than what the average definition might entail: I would qualify drum solos and sax solos as vocals, whereas repetitive audio samples of brief length would not qualify. (I suppose I would say there are words, phrases and sentences, the latter only truly qualifying as a voice and the others merely being the voice behaving as an instrument, if that makes sense.) Indeed, thinking about this has made me wonder how close the 'sections' in the API come to identifying potential vocal areas... perhaps the information currently garnered might even act as a good probabilistic guide...
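One way to test that idea would be to average a per-segment vocal probability over each section. A minimal sketch, assuming sections are reduced to (start, duration) pairs and that a hypothetical per-segment p_vocal exists (the analysis does not report one); each segment contributes only the part of its duration that overlaps the section.

```python
from typing import List, Tuple


def section_vocal_probability(sections: List[Tuple[float, float]],
                              segments: List[Tuple[float, float, float]]) -> List[float]:
    """Duration-weighted mean vocal probability for each (start, duration) section.

    segments are (start, duration, p_vocal) triples, where p_vocal is a
    hypothetical per-segment score. Sections with no overlapping segments
    get 0.0.
    """
    result = []
    for sec_start, sec_dur in sections:
        sec_end = sec_start + sec_dur
        weighted, total = 0.0, 0.0
        for seg_start, seg_dur, p in segments:
            # Length of the time interval shared by this segment and section.
            overlap = min(sec_end, seg_start + seg_dur) - max(sec_start, seg_start)
            if overlap > 0:
                weighted += p * overlap
                total += overlap
        result.append(weighted / total if total > 0 else 0.0)
    return result
```

Sections whose averaged probability sits near 0 or 1 would suggest the section boundaries line up with vocal/non-vocal changes; values near 0.5 would suggest they don't.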