|
Yes, timbre[0] is in "dB", yet on a positive scale, and after going through a bunch of auditory filters. It's not really possible to accurately convert it back into conventional power from what is provided, although it may be possible to approximate it, given a few arbitrary assumptions in trying to invert the auditory filters. But why exactly would you want to do that anyway? There's no real loss in the pitch description as it is.
The reason pitch values are normalized is so that they represent pitch content only and are therefore comparable with each other. If you returned values without any normalization, you'd represent both pitch and a form of magnitude all at once, even though loudness is already well described in its own way, as meaningfully as possible. You could keep going down this path and ask why there are only 12 pitches. You could just as well describe 88 pitches, and incorporate a bit of timbral information into the mix. But then why quantize them at all, when you could return accurate partials in both time and frequency, directly out of the auditory spectrum?
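To make the normalization point concrete, here is a minimal sketch. It is not the analyzer's actual code: the function name `normalize_chroma`, the sample vectors, and the choice of scaling by the strongest bin are all assumptions made for illustration. It shows that the same pitch content played at two different levels yields the same normalized vector, which is what makes the vectors comparable.

```python
def normalize_chroma(raw):
    """Scale a 12-bin pitch vector so its strongest bin equals 1.0.

    Hypothetical helper: peak normalization is an assumption here,
    not a description of the real analyzer.
    """
    peak = max(raw)
    if peak <= 0:
        # Silent frame: nothing to normalize, return all zeros.
        return [0.0] * len(raw)
    return [v / peak for v in raw]


# Made-up raw energies for one segment (C, E and G are active).
quiet = [0.1, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0]
# The same pitch content, played 8 times louder.
loud = [v * 8 for v in quiet]

quiet_norm = normalize_chroma(quiet)
loud_norm = normalize_chroma(loud)

# After normalization the two vectors describe the same pitch content;
# the level difference is gone and belongs in the loudness description.
same = all(abs(a - b) < 1e-12 for a, b in zip(quiet_norm, loud_norm))
print(same)
```

Without the normalization step, `quiet` and `loud` would look like different pitch content even though only the level changed.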
Sure, eventually, I'd like to make these quantities available to developers via the API. But for the time being, and for many reasons, it is a lot more convenient and digestible to deal with canonical vectors that each represent one simple concept only. Even using timbre[0] for average loudness is quite a bit of a stretch that will likely get revised in the future. Thanks for your input: the more demand there is around a certain feature, the more quickly it'll get updated.
|