Improve detection of fundamental frequency?
Hi,
I'm thinking about how to improve the accuracy of determining the fundamental frequency / pitch / key note of one-shot samples (not in real time).
1) One idea is to track where energy accumulates in the lower range, like on a sonogram (usually the red zones), but I don't understand how fft~ works - is this possible with it?!
2) The second idea is to pre-process the signal with something like squash compression to get a flatter, more stable spectrum; perhaps this will improve the result. (A rough sketch of both ideas follows below.)
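Both ideas are cheap to try offline. A minimal sketch, assuming numpy and soundfile are available; the filename, the tanh drive, and the 20-250 Hz search range are made-up placeholders rather than recommendations:

```python
import numpy as np
import soundfile as sf

x, sr = sf.read("kick.wav")          # hypothetical one-shot sample
if x.ndim > 1:
    x = x.mean(axis=1)               # mix to mono

x = np.tanh(4.0 * x)                 # idea 2: crude "squash" compression

# idea 1: accumulate spectral energy over the whole sample (the red zones
# of the sonogram) and pick the strongest bin in the bass range
n = 1 << 16                          # long FFT: ~0.67 Hz bin spacing at 44.1 kHz
spec = np.abs(np.fft.rfft(x, n))
freqs = np.fft.rfftfreq(n, 1.0 / sr)

band = (freqs >= 20.0) & (freqs <= 250.0)
f0 = freqs[band][np.argmax(spec[band])]
print(f"strongest low-frequency component: {f0:.1f} Hz")
```

This is roughly what fft~ gives you frame by frame in Max, just done once over the whole file.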
It's worth mentioning that for now I've chosen fzero~ because, compared with other objects (external & native), it gives the most accurate results.
Perhaps someone has some good ideas?
f0 analysis is quite a big topic and a lot of research has been done on it. You can read here about 3 different algorithms, but there are a lot more. Depending on the nature of your source sound (monophonic? polyphonic? human voice? ...), you might want to use different algorithms.
When it comes to monophonic human-voice pitch detection, I've had very good results with the yin analysis that you can perform with the MuBu package (both in real time using the pipo module and offline on existing buffers with [mubu.process]).
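For reference, here is the textbook YIN algorithm (de Cheveigné & Kawahara, 2002) in plain numpy. To be clear, this is a sketch of what the yin analysis computes, not MuBu's actual code; real implementations also add parabolic interpolation around the dip for sub-sample accuracy:

```python
import numpy as np

def yin_f0(frame, sr, fmin=40.0, fmax=500.0, threshold=0.1):
    """Return the detected f0 in Hz, or None if the frame looks unvoiced."""
    tau_min = int(sr / fmax)
    tau_max = int(sr / fmin)
    # difference function d(tau)
    d = np.array([np.sum((frame[:-tau] - frame[tau:]) ** 2)
                  for tau in range(1, tau_max + 1)])
    # cumulative mean normalized difference d'(tau)
    cmndf = d * np.arange(1, tau_max + 1) / np.cumsum(d)
    # first dip below the threshold inside the search range
    for tau in range(tau_min, tau_max):
        if cmndf[tau] < threshold:
            # walk down to the local minimum of the dip
            while tau + 1 < tau_max and cmndf[tau + 1] < cmndf[tau]:
                tau += 1
            return sr / (tau + 1)    # +1 because index 0 corresponds to tau=1
    return None
```

Feed it frames longer than sr/fmin samples (e.g. 2048 samples at 44.1 kHz with the defaults above).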
Having an undistorted signal with a good output level helps, so compression might give better results in some circumstances, at the expense of more inaccuracy in energy/volume/velocity analysis, if that information matters to you.
one thing is sure: a plain FFT is not precise enough.
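to put numbers on that, a quick back-of-envelope for a typical analysis size:

```python
# FFT bin spacing is sr / N: at 44.1 kHz with a 2048-point window that is
# ~21.5 Hz per bin, while neighbouring bass notes E1 (41.2 Hz) and F1
# (43.7 Hz) are only ~2.5 Hz apart - they land in the same bin.
sr, N = 44100, 2048
print(sr / N)   # 21.533203125
```

longer windows narrow the bins but smear fast pitch movement in time; interpolating around the peak helps, but autocorrelation-family methods (yin etc.) sidestep the problem.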
other than that, a few random thoughts:
- an offline algo might look quite different from a realtime algo
- don't underestimate the usefulness of frequency shifters and band-pass filters prior to the actual analysis (see the filter sketch after this list) - and, in conjunction with that, a human interacting in the process (can you say "half-automated process"?)
- don't underestimate how much cognition can still differ even from the most clever algorithm. if there is a sound consisting of three partials of 40% 50 Hz, 40% 75 Hz and 20% 100 Hz, your brain might tell you that its "note" is 100 Hz. (a completely made-up example, but you see where it is going)
and what do you want the analysis to do for sounds where your brain tells you the sound event is a "chord"? what about sounds where your brain tells you that the fundamental is changing constantly? or abruptly? or is non-tonal?
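as one concrete take on the band-pass suggestion in the list above, a sketch using scipy; the corner frequencies are arbitrary and would be tuned per material, which is exactly where the human in the half-automated process comes in:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(x, sr, lo=30.0, hi=200.0, order=4):
    """Limit the signal to the register of interest before pitch analysis."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=sr, output="sos")
    # zero-phase filtering: fine for an offline algo, not causal in realtime
    return sosfiltfilt(sos, x)
```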
My task is to make a device that recognizes the first (usually stable) harmonic, since I'm going to use this for single-note bass sounds and kick drums.
A kick drum is going to be tricky, because if you mean an electronic kick drum, the fundamental is, by design, a pitch drop. People used to make them by putting filters into self-oscillation and having the filter envelope do the drop. (I did a show with a friend who used an entire SH-09 for a kick, it was quite something!)
Which is to say... what is the fundamental of a kick? The front? The back? The average spectral centroid?
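To make that concrete, here is a minimal synthetic kick in Python: a sine whose frequency drops exponentially, roughly what a self-oscillating filter with an envelope on its cutoff produces. Every number here is invented for illustration:

```python
import numpy as np

sr = 44100
t = np.arange(int(0.4 * sr)) / sr
freq = 45.0 + (180.0 - 45.0) * np.exp(-t / 0.03)   # 180 Hz front -> 45 Hz tail
phase = 2 * np.pi * np.cumsum(freq) / sr           # integrate freq to get phase
kick = np.sin(phase) * np.exp(-t / 0.12)           # amplitude decay
```

Run any pitch tracker over this and the answer depends entirely on which part of the envelope it happens to look at.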
I would suggest spending some time mucking about with Sonic Visualiser and its plugins, which will help you figure out which algorithm gives you the results you want before going to the trouble of programming it in Max. It's free and really powerful, with tons of advanced music-information-retrieval techniques available as add-on plugins. As in, you could sing the note you think it is and see which algorithm gets you the results you intuitively think of as the pitch of the drum.
Might also be worth reading some acoustics chapters on pitch - for complex sounds, the fundamental frequency doesn't necessarily tell the whole story. We used Rossing's "The Science of Sound", and you can get it used quite cheaply. TL;DR: "damn, this gets complicated in a hurry", lol
hope that helps!
I don't think it's worth making it so complicated. As a rule, at low frequencies the tonality is simply the strongest frequency. In my case, I just need to find the frequency where the most energy accumulates.
What you're showing is the spectral centroid at some time bin, not the fundamental. Real percussion doesn't have a fundamental frequency by definition, because it's non-periodic (i.e., non-harmonic): a specific waveform in the time domain never repeats. A synthesized kick is different - it could park on a discernible pitch of the sine wave, or not. One can twiddle the knobs on an 808 or 909 and make the kick go from a discernible pitch to something for which we could not sing a pitch. The spectral centroid is what you hear as pitch when you have drums where you can say "that one is higher than this one", but if I asked you to sing the note, you could not. (In the acoustics realm we say that the fundamental is a component of pitch, but not the whole story.)
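In code terms, the centroid is just the amplitude-weighted mean frequency of a frame - a sketch:

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Amplitude-weighted mean frequency of one windowed analysis frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    return np.sum(freqs * mag) / np.sum(mag)
```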
The point people are making here is that it's not as simple as you think - an algorithm meant to be used for a harmonic sound may or may not work on a non-periodic sound, especially one with low frequencies where the length of a waveform is in the same ballpark as the length of the amplitude envelope and where the pitch is moving over the course of the sound (like a kick).
If this algorithm works for you, great. But as a programmer with many years' experience, I personally wouldn't want to start coding until I knew the algorithm I was going to implement would give me the results I want.
Pitch extraction from percussion is notoriously complicated (actually, any information retrieval from it is). If something intended for pitched instruments works, you got lucky - but if that's all you need (i.e., you know the samples will always be like the one you tried), then maybe that's fine.
I guess for labelling a big collection of kick-drum samples with their "pitch" you COULD look at the tail only, and compute the spectral centroid on that part alone. Like, say you only scan the last 50% of each sample.
Yes, there'd be noise etc, but tons of kick drums ARE tonal to some degree. You could perhaps use the noisiness parameter as a sort of "confidence" cutoff.
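A sketch of that tail-plus-confidence idea; the 50% tail and the 0.3 flatness cutoff are arbitrary starting points, with spectral flatness standing in for "noisiness" (~0 for tonal material, ~1 for noise):

```python
import numpy as np

def tail_pitch_estimate(x, sr, tail=0.5, flatness_max=0.3):
    """Strongest component of the sample's tail, or None if it's too noisy."""
    seg = x[int(len(x) * (1.0 - tail)):]              # last 50% by default
    mag = np.abs(np.fft.rfft(seg * np.hanning(len(seg))))
    freqs = np.fft.rfftfreq(len(seg), 1.0 / sr)
    # spectral flatness = geometric mean / arithmetic mean of the magnitudes
    flatness = np.exp(np.mean(np.log(mag + 1e-12))) / (np.mean(mag) + 1e-12)
    if flatness > flatness_max:
        return None                                   # too noisy to trust
    return freqs[np.argmax(mag)]
```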
You could leverage some simple FluCoMa to scan full directories? Dump the list to text, etc.
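Outside Max, the same batch workflow is a few lines of Python (the folder and output file names here are hypothetical; in Max, the FluCoMa objects would fill this role):

```python
import pathlib
import soundfile as sf

# label every kick sample in a folder using tail_pitch_estimate() from above
with open("kick_pitches.txt", "w") as out:
    for path in sorted(pathlib.Path("kicks").glob("*.wav")):
        x, sr = sf.read(path)
        if x.ndim > 1:
            x = x.mean(axis=1)
        f0 = tail_pitch_estimate(x, sr)
        out.write(f"{path.name}\t{f0}\n")             # None means: no confident pitch
```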