|
Posts: 90
Registered: Aug 29, 2008
|
Posted: 2010-10-06 13:12:11
|
Hi all,
When I get top_terms, I receive frequencies associated with those terms.
When I get an artist's terms, I receive frequencies and weights associated with each term for that artist.
Why do I receive a different frequency for a term in the overall context compared to the artist-specific context?
|
|
|
Posts: 914
Registered: Sep 08, 2008
|
Posted: 2010-10-06 13:20:21
|
Term frequencies are normalized, so when you get an artist's term frequencies, you are getting the frequencies normalized for that artist. If 'britpop' is the most frequent term for the artist, it will get a 1.0.
When getting the top_terms, these are normalized against the whole set of terms. Since 'rock' is the most frequently occurring term, its frequency will be 1.0.
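To make the difference concrete, here is a minimal sketch of that kind of normalization. It assumes each frequency is simply the raw count divided by the largest count in the set being considered; the exact formula used by the API isn't stated here, and the function name and the counts below are made up for illustration.

```python
def normalize_by_max(raw_counts):
    """Scale counts so that the most frequent term gets 1.0.

    Assumption: simple max-normalization, chosen to match the
    description above. The real API may compute this differently.
    """
    if not raw_counts:
        return {}
    peak = max(raw_counts.values())
    if peak == 0:
        return {term: 0.0 for term in raw_counts}
    return {term: count / peak for term, count in raw_counts.items()}


# Hypothetical raw counts, for illustration only.
artist_counts = {"britpop": 40, "rock": 30, "indie": 10}
corpus_counts = {"rock": 5000, "indie": 1000, "britpop": 200}

artist_freqs = normalize_by_max(artist_counts)  # normalized within the artist
corpus_freqs = normalize_by_max(corpus_counts)  # normalized within the whole set

print(artist_freqs["britpop"], corpus_freqs["britpop"])
print(artist_freqs["rock"], corpus_freqs["rock"])
```

The same term ends up with a different frequency in each context because the denominator differs: the artist's own most frequent term in one case, the most frequent term overall in the other.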
Hope this helps.
Paul
|
|
|
Posts: 90
Registered: Aug 29, 2008
|
Posted: 2010-10-06 13:34:36
|
Paul,
Thank you, that's very clear.
Is there any way of getting a corpus frequency for a term that's outside of the top 1000? For example, I see that Sufjan Stevens gets a 'folk' term, but that term is outside the top 1000, so there's no overall sense of frequency...
|
|
|
Posts: 914
Registered: Sep 08, 2008
|
Posted: 2010-10-06 13:54:06
|
atl, sorry, we currently only expose the top 1000 terms. However, it is surprising that 'folk' isn't in the top 1000. I think 'folk' may be getting caught in one of the filters that we use to make sure that only musically relevant words are exposed as terms. Let me check on that.
Paul
|
|
|
Posts: 90
Registered: Aug 29, 2008
|
Posted: 2010-10-06 14:14:32
|
Okay, thanks. I'm fairly sure I can find a way around it.
Yes, 'folk' stood out on a spot check, especially considering 'folk metal', 'folk music', 'folk punk', 'folk rock', 'folk-pop', and 'folktronica' were present in the top 1000. :)
|
|
|
Posts: 90
Registered: Aug 29, 2008
|
Posted: 2010-10-06 14:57:04
|
Also, for reference, and to help the knowledge along, "pop" is also missing from the top 1000.
|
|