RNBO Phase Vocoder Troubleshooting


I'm trying to make a real-time phase vocoder in RNBO like the ones in The Phase Vocoder – Part I tutorial. I'm not out of the woods yet, but I want to share what I have so far and get all the feedback I can.

PVOC with polar math

The first example patch from the tutorial ports over reasonably well, with some modifications to compensate for RNBO's lack of framedelta~ and frameaccum~, pictured below:

Abridged RNBO adaptation of a Phase Vocoder that cleanly resynthesizes the original signal provided there is no timestretching

Note that this is a crude workaround. While it can resynthesize the original signal as cleanly as the tutorial patches, it produces unexpected pitch/formant/phase artifacts when timestretched even a little bit, and these artifacts are notably stronger than those in the tutorial. I assume they stem from my framedelta~ and frameaccum~ workarounds. I tinkered with this patch for a while to no avail and eventually decided to try the same thing without polar coordinates.
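For reference, here is a minimal Python sketch of the per-bin behavior a framedelta~ / frameaccum~ replacement needs to reproduce: a difference against the previous frame, followed by a running sum. The class names and the `wrap_phase` helper are hypothetical, and wrapping both results into [-pi, pi) is an assumption of this sketch rather than something taken from the Max objects themselves.

```python
import math


def wrap_phase(p):
    """Wrap a phase value in radians into the range [-pi, pi)."""
    return (p + math.pi) % (2.0 * math.pi) - math.pi


class FrameDelta:
    """Per-bin difference between the current frame and the previous one."""

    def __init__(self, num_bins):
        self.prev = [0.0] * num_bins  # phases of the previous frame

    def process(self, frame):
        out = [wrap_phase(cur - old) for cur, old in zip(frame, self.prev)]
        self.prev = list(frame)  # remember this frame for the next call
        return out


class FrameAccum:
    """Per-bin running sum across frames, wrapped so it stays bounded."""

    def __init__(self, num_bins):
        self.total = [0.0] * num_bins  # accumulated phase per bin

    def process(self, frame):
        self.total = [wrap_phase(t + d) for t, d in zip(self.total, frame)]
        return list(self.total)
```

With no timestretching, running a frame's phases through `FrameDelta` and then `FrameAccum` should return the original phases modulo 2*pi. If a workaround compares bin k against anything other than bin k of the previous frame, that identity breaks, which makes it a quick sanity check.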

PVOC with cartesian math

The tutorial's cartesian PVOC example uses some pink send~ and receive~ pairs in what would be an infinite loop outside the pfft~ setting:

Notice how the pink send and receive pair in the tutorial patch seem to create an infinite loop, yet the pfft~ handles it with no issues.

Sure enough, attempting this scheme in RNBO wouldn't compile at first because of the infinite loop. It managed to compile with a feedback~ thrown in, but the resulting patch only produced harsh clipping. Delaying the feedback loop by fftsize or hopsize created the same clipping, which persisted even with the feedback scaled by factors in the 0.1 to 0.4 range.

This attempted port of the cartesian phase vocoder example only produces harsh, undesirable clipping
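One possible cause of the clipping, not confirmed against the patch, is magnitude build-up in the feedback path. In a cartesian phase vocoder the fed-back frame is only meant to carry the accumulated phase, so it is reduced to unit magnitude before it is reused. The Python sketch below illustrates that idea under stated assumptions: `cartesian_pvoc`, `unit`, and the frame layout (lists of complex bins, with `prev_frames` read one hop earlier than `cur_frames`) are all hypothetical names and choices for illustration.

```python
import cmath


def unit(z):
    """Keep only the phase of z by forcing its magnitude to 1."""
    return cmath.rect(1.0, cmath.phase(z))


def cartesian_pvoc(cur_frames, prev_frames, normalize=True):
    """Accumulate phase with complex multiplies instead of polar math.

    cur_frames / prev_frames: lists of frames, each a list of complex bins.
    The previous output frame is fed back with a one-frame delay.
    """
    num_bins = len(cur_frames[0])
    fed_back = [complex(1.0, 0.0)] * num_bins  # previous output frame
    outputs = []
    for cur, prev in zip(cur_frames, prev_frames):
        out = []
        for c, p, s in zip(cur, prev, fed_back):
            # The phase of c * conj(p) is phase(c) - phase(p).
            delta = c * p.conjugate()
            if normalize:
                # Phase-only feedback, then restore the current magnitude.
                out.append(abs(c) * unit(s) * unit(delta))
            else:
                # Magnitudes multiply on every frame and grow without bound.
                out.append(s * delta)
        outputs.append(out)
        fed_back = out
    return outputs
```

With `normalize=False`, a steady bin of magnitude 2 grows by a factor of 4 on every frame, and a fixed gain below 1 in the loop only changes the growth rate. With `normalize=True`, the output magnitude tracks the input regardless of how many frames have passed.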

I did at least get a cartesian pvoc to cleanly resynthesize the signal, though not without the timestretching artifacts of my other attempt.

This cartesian phase vocoder exhibits the same problems as the polar phase vocoder

TL;DR: I'm stumped.

If I understand correctly, the pfft~ object delays its processing by some multiple of the fftsize so that it may read and write to a frame simultaneously, which results in its odd/lovely ability to accommodate infinite loops. RNBO, on the other hand, does not perform this automatic buffering, which may or may not explain its lack of framedelta~ and frameaccum~ objects.
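To make the "read and write to a frame simultaneously" idea concrete, here is a small Python sketch of a one-frame delay on a sample-serial spectral stream. It assumes each spectral sample arrives together with its bin index; `BinDelay` and `tick` are hypothetical names, and this is not a description of how pfft~ or RNBO's fft~ are actually implemented.

```python
class BinDelay:
    """One-frame delay for a spectral stream processed one bin at a time."""

    def __init__(self, frame_size):
        self.buf = [0.0] * frame_size  # one slot per bin

    def tick(self, index, value):
        """Return the value stored for this bin one frame ago, then overwrite it."""
        old = self.buf[index]     # read first: this is last frame's value
        self.buf[index] = value   # then write: available on the next frame
        return old
```

Because each slot is read before it is overwritten, a feedback path routed through a buffer like this always sees the previous frame's value for the same bin. That is the property a frame-domain feedback loop depends on, and it only holds if the delay equals exactly one frame of the stream being processed.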

To me, this all suggests a subtlety in RNBO's fft~ handling not mentioned in the documentation. Could anyone help me better understand RNBO's fft~ object and how I might account for it in my phase vocoder project? Here are the patchers for reference:

Rnbo-PhaseVocoder_Project[1].maxpat

Thank you for reading this far.

- Sam