Separating an audio signal’s pitched and unpitched frequency components

Ben Vining:

Hello everyone,

I'm trying to find a way to separate an audio signal's pitched and unpitched frequency components, so that I can process them separately in real time. Ideally, something that works within an FFT would be amazing.

I've read up on the subject and found lots of very advanced mathematical algorithms that operate on the amplitude values of each frequency bin. What concerns me about that approach is that, as far as I can tell, you would have to write all of your amplitude values to a buffer~ before you can do any analysis, which sounds like a nightmare to write, compute, resynthesize, and rewrite in real time without things getting mucked up.

The end goal is to apply this analysis technique to the modulator input signal of a vocal harmonizer, so that only the pitched portions of the signal get shifted, and I can process the noise portions differently.

So I’m wondering if there is a simpler way to do this, for my desired application. Is it a job for gen~? Does anyone have any experience working with this kind of thing? 😁

double_UG:

Look at the "MuBu for Max" package:
http://ismm.ircam.fr/mubu/

Ben Vining:

This paper proposes finding a signal’s component sinusoids, then determining if each sinusoid’s frequency is within the harmonic series of the sound’s fundamental frequency, to classify each component sinusoid as “pitched” or “unpitched”, then adjusting every individual frequency bin’s amplitude accordingly.

Ross’s suggestion of using vectral~ sounds like it may work — what about using vectral~ to find/track the signal’s component sinusoids as a list of resonant frequencies outside of fft~? Then you just compare each item in the list to the harmonic series for the signal’s fundamental frequency, to assign each sinusoid a 1 or 0, for pitched or unpitched. Then, inside the fft~, each frequency bin receives a 1 or 0 depending on which sinusoid that frequency bin is part of, and each bin’s amplitude is adjusted accordingly.

Or am I on the wrong track here? It seems like the biggest difficulty would be to keep things in sync between the identifying of the sinusoidal components, analyzing all of them, and inside the live fft~ where you need to adjust bin amplitudes in real time.
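Outside of Max, the bin-classification step I'm describing could be sketched roughly like this in Python (the `tolerance` and `n_harmonics` values are arbitrary choices of mine, not from the paper; this is just the harmonic-series comparison, not the full algorithm):

```python
import numpy as np

def harmonic_mask(freqs, f0, n_harmonics=40, tolerance=0.03):
    """Label each FFT bin 1 ("pitched") if its centre frequency lies within
    `tolerance` (relative) of an integer multiple of f0, else 0 ("unpitched")."""
    harmonics = f0 * np.arange(1, n_harmonics + 1)
    mask = np.zeros(len(freqs))
    for i, f in enumerate(freqs):
        if f <= 0:
            continue  # skip the DC bin
        nearest = harmonics[np.argmin(np.abs(harmonics - f))]
        if abs(f - nearest) / nearest <= tolerance:
            mask[i] = 1
    return mask

sr, n_fft = 44100, 1024
freqs = np.fft.rfftfreq(n_fft, 1 / sr)  # centre frequency of each bin
mask = harmonic_mask(freqs, f0=220.0)
# pitched_spectrum   = spectrum * mask
# unpitched_spectrum = spectrum * (1 - mask)
```

Inside a [pfft~] you would do the equivalent per bin with the bin index and the tracked fundamental.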

Jean-Francois Charles:

Note that the technique described in the paper you mention applies only to harmonic signals, not to all "pitched" signals. Pitch is commonly heard even with non-harmonic signals, i.e. signals whose sinusoidal components are not integer multiples of a fundamental frequency. Many bells illustrate this phenomenon.
That being said, you mention [vectral~], which is a "vector based envelope follower", often used as a "per-bin low-pass filter" inside of [pfft~]. I'm not sure how you would use it for your purpose.
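For what it's worth, the "per-bin low-pass filter" behavior can be sketched in Python as a one-pole smoother applied independently to every bin across successive frames (a rough approximation of what [vectral~] does in rampsmooth mode, with a made-up coefficient):

```python
import numpy as np

def vectral_smooth(frames, rampsmooth=0.9):
    """One-pole low-pass filter applied independently to every bin
    across successive spectral (magnitude) frames."""
    out = np.zeros_like(frames)
    state = np.zeros(frames.shape[1])
    for i, frame in enumerate(frames):
        state = rampsmooth * state + (1 - rampsmooth) * frame
        out[i] = state
    return out

mags = np.ones((8, 4))           # eight identical magnitude frames
smoothed = vectral_smooth(mags)  # each bin ramps exponentially toward 1.0
```

This smooths each bin's trajectory over time; it doesn't by itself separate pitched from unpitched content.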
Another idea is to use a denoiser. It will not look at "pitch" but might give you a first approximation of the effect you want to achieve if you are working with recordings where the components of the pitched sounds are greater than the noisy components. Here is an extremely basic denoiser that you could build upon.
Save as denoiser.pfft:

Max Patch
Copy patch and select New From Clipboard in Max.

Main patch:

Max Patch
Copy patch and select New From Clipboard in Max.
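The idea behind such a basic denoiser can be sketched in Python as a per-bin magnitude gate (the threshold is arbitrary and this is a single-frame illustration, not the exact patch):

```python
import numpy as np

def spectral_gate(frame, threshold=0.01):
    """Zero every FFT bin whose magnitude falls below `threshold` times the
    peak magnitude, keeping the phase of the surviving bins intact."""
    spectrum = np.fft.rfft(frame)
    mags = np.abs(spectrum)
    gated = np.where(mags >= threshold * mags.max(), spectrum, 0)
    return np.fft.irfft(gated, n=len(frame))

# A tone buried in light noise: the gate keeps the strong sinusoid bins
# and discards the low-level noise bins.
sr = 44100
rng = np.random.default_rng(0)
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 440 * t) + 0.001 * rng.standard_normal(1024)
clean = spectral_gate(frame, threshold=0.1)
```

As noted above, this only approximates the effect when the pitched components are stronger than the noisy ones.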

Source Audio:

Some time ago I made a 3-voice harmonizer using the IRCAM FTM objects.

Max Patch
Copy patch and select New From Clipboard in Max.

It sounded quite OK for live usage.
It was made with Max 6 and the FTM library 2.6.0 from 2012.
I remember each major FTM update broke some functions,
so it might need some tweaks if a different library version is used.