Increasing audio sample size - false positives

Posts: 5
Registered: Jul 11, 2012

What is the correct way to increase the audio sample size with both fastingest.py and lookup.py?

I'm getting false positives matching Radio Edits with the Original versions of songs. I can be looking up a "clean" version of a song and get a "dirty" one matched. And vice-versa.

I would like to match as much of the song as possible. Is a 2-minute sample possible? What happens if the track is shorter than the sample size?

I finally got TTyrant and Solr up and running and stable - I just need to solve this last piece. Thanks.

Posts: 197
Registered: Sep 05, 2008

I assume you are talking about Echoprint. Echoprint is not meant to distinguish between radio edit and original, nor between explicit and clean. It attempts to identify a "song", and we do not consider those false positives. If you want radio edit vs. original, you can post-filter by the duration of the match. I don't know of any audio-based fingerprint that can do explicit vs. clean versions; you'll have to rely on other sources for that.
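As a rough illustration of the duration post-filter mentioned above, here is a minimal Python sketch. The candidate records, the `duration` field, the track IDs, and the 2-second tolerance are all assumptions made for the example; they are not Echoprint's actual response format.

```python
def filter_by_duration(query_duration, candidates, tolerance=2.0):
    """Keep candidates whose duration is within `tolerance` seconds of the query file.

    `candidates` is a hypothetical list of dicts with "track_id" and
    "duration" (seconds); adapt the field names to your own metadata store.
    """
    return [
        c for c in candidates
        if abs(c["duration"] - query_duration) <= tolerance
    ]


# Hypothetical candidates for the same "song": the original and a radio edit.
candidates = [
    {"track_id": "TR_ORIGINAL", "duration": 254.0},
    {"track_id": "TR_RADIO_EDIT", "duration": 213.0},
]

# The file being looked up is 212.4 seconds long, so only the radio edit survives.
matches = filter_by_duration(212.4, candidates)
```

The tolerance is a judgment call: encoder padding can shift the duration slightly, so an exact comparison would likely be too strict.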

Please read our FP FAQ: http://notes.variogr.am/post/27796385927/the-audio-fingerprinting-at-the-echo-nest-faq and join the Echoprint group for more discussion: https://groups.google.com/forum/#!forum/echoprint

Posts: 5
Registered: Jul 11, 2012

Thanks for the quick reply.

But is it possible to increase the audio sample size? We were also interested in increasing the accuracy of finding matches in general. We are running at about 85% success on matching .mp3 files to .mpa, meaning that 15% of the time a lookup finds nothing. Is it possible to have the code look at 45 or 60 seconds?

Thanks again.

Posts: 197
Registered: Sep 05, 2008

You can have the codegen compute as much as you want. The default is 30s but that can easily be overridden. Check echoprint-codegen's usage.

Posts: 197
Registered: Sep 05, 2008

I checked & the default duration is actually the whole file, not 30s. So I'm not fully sure what you are asking? The query logic doesn't care how long the codestring is, it will use as much code as you give it.
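For anyone who wants to control the analysed span explicitly, here is a small Python sketch of calling the codegen from a script. It assumes the command-line form `echoprint-codegen <file> [start_offset] [duration]` (both in seconds) and that the binary prints JSON to stdout; check the usage output of your own build before relying on either. The helper names are made up for this example.

```python
import json
import subprocess


def codegen_command(path, start=None, duration=None, binary="echoprint-codegen"):
    """Build the argument list for the codegen (assumed positional arguments)."""
    cmd = [binary, path]
    if start is not None or duration is not None:
        # A duration can only be given after a start offset, so default the start to 0.
        cmd.append(str(start if start is not None else 0))
        if duration is not None:
            cmd.append(str(duration))
    return cmd


def run_codegen(path, start=None, duration=None):
    """Run the codegen and parse its output (assumed to be JSON on stdout)."""
    output = subprocess.check_output(codegen_command(path, start, duration))
    return json.loads(output)
```

With no offset or duration, `codegen_command("song.mp3")` passes only the filename, which corresponds to the whole-file default described above.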

Posts: 138
Registered: Feb 07, 2012

bwhitman: The query logic cuts the code string down to 30 seconds in length. Is this what you mean?

Posts: 197
Registered: Sep 05, 2008

Oh yeah, forgot about that. Isn't the cap 60s, though?

Posts: 138
Registered: Feb 07, 2012

mjlefevre: A success rate of 85% seems pretty low, especially compared with the evaluations I have run, where a typical result is at least 95% for matching the "same" clean audio, i.e., the same song in the sense Brian explained.

The query length is capped at 60 seconds in the open source distribution of the server, which we have found to provide more than enough hash codes for high quality matching. See here. Normally we recommend 20-30 seconds, but the server can use up to 60 seconds to make the match.

Posts: 138
Registered: Feb 07, 2012

bwhitman: Yes, you're right. The server caps the length of the queries, and the cap is 60 seconds (not 30 seconds as I stated earlier - I was thinking of something else).

Posts: 5
Registered: Jul 11, 2012

What happens if you increase the 60 second cutoff? Go past the end of a song?

Posts: 138
Registered: Feb 07, 2012

mjlefevre: If the cutoff is greater than the length of the query (which, for normal use it is anyway, as 60 seconds is greater than the typical 20-to-30-second length of queries) then nothing really gets changed. Its main function is to safeguard against the CPU and IO load exploding from making gigantic queries.
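To make the point about the cutoff concrete, here is a simplified Python sketch of a length cap. It is not the server's actual code: it assumes the query has already been decoded into `(hash_code, offset_seconds)` pairs, whereas the real server works on the encoded code string and its own time units.

```python
def cap_query(codes, max_seconds=60.0):
    """Drop hash codes whose time offset falls at or beyond `max_seconds`.

    `codes` is a hypothetical list of (hash_code, offset_seconds) pairs.
    """
    return [(code, offset) for code, offset in codes if offset < max_seconds]


# A typical ~25-second query lies entirely under the cap, so nothing changes.
short_query = [(101, 0.5), (202, 12.0), (303, 24.9)]
assert cap_query(short_query) == short_query

# A longer query is trimmed; raising the cap would only make the server do more work.
long_query = short_query + [(404, 75.0)]
trimmed = cap_query(long_query)
```

Raising `max_seconds` past the length of the query has no effect on the result, which is why the cap mainly matters as a guard against very large queries.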