For many beginning users, the combination of frequency domain signal processing (using the MSP pfft~ object) and gen~ is a double whammy of anxiety.
There are certainly online resources for those curious about the workings of frequency domain processing, and tutorial resources for the gen~ beginner, but what happens when you try to make sense of them at the same time? In this tutorial, I'm going to try to help you make sense of frequency domain processing in gen~ with an overview and a few examples to encourage you.
The Basics of FFT processing in Max
The standard operating procedure for FFT processing involves a regular rubric:
- An fftin~ object that takes audio input and provides real and imaginary components of the audio signal, along with a bin index output.
- We send that information into our patch where a cartopol~ object converts the real and imaginary inputs into amplitude and phase information.
- (We do some processing here....)
- When we’re done with the processing we’ve got in mind, a poltocar~ object converts our amplitudes/phases back into real/imaginary information.
- We send the result to an fftout~ object and from there on to the outside world.
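As a rough illustration of the two conversions in the list above, here is a minimal Python sketch of the math that cartopol~ and poltocar~ perform on a single bin. The function names simply mirror the Max objects; this is a model of the arithmetic under that assumption, not Max code.

```python
import math

def cartopol(re, im):
    """Convert a real/imaginary pair to (amplitude, phase in radians)."""
    return math.hypot(re, im), math.atan2(im, re)

def poltocar(amp, phase):
    """Convert (amplitude, phase) back to a real/imaginary pair."""
    return amp * math.cos(phase), amp * math.sin(phase)

# Round trip for one bin: any processing would happen between the two calls.
amp, phase = cartopol(3.0, 4.0)
re, im = poltocar(amp, phase)
```

With no processing in between, the round trip hands back the original real and imaginary values (to within floating-point error), which is why an untouched pfft~ patch passes audio through unchanged.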
When we work in the Gen environment, we essentially follow the same procedure, with one exception: we use a gen~ object inside of the part of our patch that the pfft~ object loads.
If you’ve looked at an MSP patch hosted by a pfft~ object in comparison to its gen~-based cousin, they look remarkably similar — the gen~ operators even have similar names to their MSP counterparts (minus the tilde [~] in the object’s name, of course).
But there’s a crucial difference: When we’re doing frequency domain processing using gen~ - as with any other kind of gen~-based processing - we’re operating on one sample at a time rather than blocks of samples as we’re accustomed to in MSP.
Practically speaking, it means that we need to think about things like techniques for counting through stored buffer data when doing frequency domain processing. Many new users find working with pfft~ difficult because it’s signal vector-based. I personally find that working in gen~ is a little easier once you get the hang of it. What you really need to embrace is a single idea: you're counting one sample at a time as you do your processing.
To get a feel for how that works, we’re going to create some gen~ patching loaded by a pfft~ object in MSP. I hope that you’ll find it a little more approachable.
We’ll start by creating an MSP patch that takes live input, writes amplitude and phase information into a buffer, and reads and outputs audio derived from the buffer into which it stores that data, all in the gen~ audio processing domain. Let’s go!
I’ll start by creating a buffer~ object in my top-level Max patch that I'll use to store amplitude and phase data for a sequence of spectral frames. Since I’ll be working in stereo, it’ll be a 4-channel buffer (left channel amplitude, left channel phase, right channel amplitude, and right channel phase).
Note: I could use a data operator inside the gen~ object instead, but I’m using the buffer operator so I have the ability to display the contents of my buffer.
To display my buffer~ contents, I’ll add four waveform~ objects ganged together by connecting their rightmost outlets and inlets. To set the display, all four of the waveform~ objects have their buffername attributes set to the name of my 4-channel buffer (spectral_buffer), and each object specifies the channel of the buffer to display using the chanoffset parameter. Here’s what the display looks like in action:
As you’d expect with FFT processing in Max, we’re working with spectral frames for a given bin size (which, as usual, we set as an argument to the pfft~ object). Each of those frames stores a full spectrum of frequencies from lowest to highest bin. In our example, there are 512 of them.
Here’s the patch hosted by the pfft~ object:
It includes the fftin~ and fftout~ objects we’d expect to see in any pfft~-hosted patcher, but we’re going to be using the gen~ patcher called stereo_spectral_record_and_play to handle the recording and playback. (We load this gen~ patch using the gen~ object’s @gen attribute followed by the name of the .gendsp file we want to load.)
The recording portion of our gen~ patcher is simplicity itself (with one very minor tweak): we take the real and imaginary components provided by the left and right channel fftin~ MSP objects in the hosting pfft~ object, and use the cartopol gen~ operator to convert that input to amplitude and phase pair information, which we’ll be storing in our buffer.
I wanted the act of recording input into my gen~ patcher to be something I could start or restart at will, so this patch includes a really simple + 1 and history operator pair to handle the frame counting chores. The history operator functions as a single-sample delay, so feeding its output through the + 1 operator and back into its input creates a simple counter that advances by one on every sample. Giving the history operator a name (recpos) lets me treat it as a parameter and restart the count by sending the message recpos 0 to the pfft~ object that hosts the gen~ patcher.
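To make the counting idea concrete, here is a small Python model of a named history operator fed by a + 1 operator. The class and method names are hypothetical and exist only for illustration; the one thing borrowed from the patch is the name recpos.

```python
class FrameCounter:
    """Models a named history operator whose output loops back through + 1."""

    def __init__(self):
        self.recpos = 0  # the value held by the history operator

    def set_recpos(self, value):
        """Stands in for sending the message 'recpos 0' from the parent patch."""
        self.recpos = value

    def tick(self):
        """Process one sample: output the held value, then store value + 1."""
        current = self.recpos
        self.recpos = current + 1
        return current

counter = FrameCounter()
first = [counter.tick() for _ in range(4)]
counter.set_recpos(0)  # restart recording from the top of the buffer
after_reset = [counter.tick() for _ in range(2)]
```

Each call to tick() stands for one sample of processing, so the count doubles as the write position for the recording.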
Writing the spectral data into my buffer in gen~ is handled by the poke operator, which takes the name of the buffer we want to write to as an argument (in this case, the 4-channel spectral_buffer buffer~ object). Each of the four poke operators includes an additional argument that specifies the channel of the buffer I want to write to. Channels are numbered from zero, and correspond to left channel amplitude (0), left channel phase (1), right channel amplitude (2), and right channel phase (3) respectively.
The result of this patching is 01_spectral_record_and_play.maxpat - an MSP patch that will record and display FFT information. Whenever I send the message recpos 0 to the pfft~ pfft.record_and_play 1024 2 MSP object in my parent patch, it records as many frames as it can into the buffer~ object - all I need to do to alter the number of those frames is to change the size of the buffer, in fact.
So I now have a buffer that contains a collection of spectral frames for my stereo input, each of which contains amplitude and phase information for each of the 512 bins in my FFT.
Playing that information back in gen~ will - as you might expect - involve using the peek operator to read from my spectral_buffer buffer~ object. There's just one little thing I'm going to need to figure out: How many spectral frames of data have I recorded into my 4-channel buffer~ object, anyway?
What the buffer really contains is as much data (as many spectral frames) as it can hold, and that data is stored as a sequence of 512 values that correspond to the first frame of spectral data I recorded, followed by another set of 512 values, and so on.
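The layout described here can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes recording starts on a frame boundary so that sample 0 of the buffer is bin 0 of the first frame.

```python
FRAME_SIZE = 512  # values per spectral frame in this tutorial's example

def split_index(sample_index, frame_size=FRAME_SIZE):
    """Return (frame number, bin number) for a position in the buffer."""
    return divmod(sample_index, frame_size)

def join_index(frame, bin_index, frame_size=FRAME_SIZE):
    """Return the buffer position of a given bin in a given frame."""
    return frame * frame_size + bin_index
```

The same position is used in all four channels of the buffer, so one index locates the left and right amplitude and phase values for a bin.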
To do that, I’ll need to know how many frames the buffer~ object I’m writing into can hold, too. I’ll also need to have a way to locate the starting point of each frame written into the buffer as well as to be able to count through all 512 data points in order. So I have a little gen~ patching to do.
It’s not really that difficult, but it is important to understand how the spectral playback portion of the patch works. I’ll break the patch down into smaller pieces (and you’ll find that the patches that I’ve included with this tutorial are commented as well, so you won’t need to come back to this tutorial over and over to understand things unless you really want to).
Just as we did when we recorded input, we’re going to use a simple counter to drive our per-sample calculations and operations. The starting point for our counter this time is a gen~ accumulator (+=) operator. When we wrote input data to our 4-channel buffer, we used a pair of operators - + 1 and history - which worked great for counting at a rate of one sample increment per sample.
But in this case, we’re using the accumulator in combination with a named param operator with a default value of 1 as our counter. As you might imagine, the name of the parameter should give you a clue as to what’s up. The value of the playrate parameter is used as the increment for the accumulator, so we can use this to alter the playback rate for our gen~ patcher.
We’ve got a way to count, but there needs to be a limit to the count - the playback index needs to count to the number of spectral frames (that collection of 512 values) and then wrap back to zero. For that, we need to figure out precisely how many spectral frames we’ve stored in our buffer~ object.
The buffer operator’s left outlet outputs the number of samples that the named buffer holds. We can grab the spectral frame size from the second outlet of the helpful fftinfo operator (this is a great object to get to know, since it also outputs other useful FFT info on frame and hop size for the FFT, whether we’re in full-spectrum mode, and the FFT offset). Dividing the size of the buffer by the frame size tells us how many frames will fit into our buffer.
That number needs to be an integer, so we use the floor operator to lop off the stuff to the right of the decimal point. Multiplying that value by the frame size (using the fftinfo operator once again) gives us the number of spectral samples in the buffer, which we send to the right (upper bound) inlet of the wrap object. The result is a counter that gives us a playback index that will count to the number of spectral samples and wrap back to zero.
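Here is a hedged Python sketch of that arithmetic: the floor-and-multiply step that finds the playable length, and an accumulator that wraps at that length. The function names are hypothetical, and playrate is modeled as a plain number added once per sample.

```python
import math

def playable_length(buffer_samples, frame_size):
    """Number of samples covered by whole spectral frames in the buffer."""
    whole_frames = math.floor(buffer_samples / frame_size)
    return whole_frames * frame_size

def advance(position, playrate, upper):
    """One sample of the accumulator followed by a wrap between 0 and upper."""
    return (position + playrate) % upper
```

For example, a 44100-sample buffer with a frame size of 512 holds 86 whole frames, so the playback index wraps at 44032 and the leftover samples at the end of the buffer are never read.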
In order to play through our buffer, we now need a way to fetch values by finding the nearest spectral frame offset in the buffer, and then get the specific position in the buffer to read based on an offset value to that frame index.
We’ll use the standard third-outlet output from the MSP fftin~ object in the pfft.record_and_play patch to give us the bin index. That corresponds to in 3 and in 6 in our gen~ patcher. Since we’re using this just for the bin index and we’re not trying to do anything fancy like play the two audio output channels back at different rates, we’ll just use the outlet of in 3 @comment L_bin operator for the bin index. We’re using that bin index in two places: First, we check to see if we’re at bin 0 using the == 0 operator. A value of 0 means that we’re in the first bin of a new spectrum so we need to update the spectrum index - we update the playback index we just calculated by using the latch operator as a sample and hold.
To get the nearest spectral frame to the output value, we’ll use the round operator. The round operator accepts, as an input, the value whose nearest multiple we want to round to; here that’s the frame size, which we grab once again from the second outlet of the fftinfo operator.
After that, all we need to do to get the address in the buffer to read from is to add that offset value to the current bin index (which, again, we’re grabbing from in 3 @comment L_bin’s left output).
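The three steps just described (latch at bin 0, round to the nearest frame start, add the bin index) can be modeled in Python as follows. This is a sketch under assumptions: the names are hypothetical, a tiny frame size of 4 is used so the output is easy to read, and halfway cases are rounded up, which may differ from what the gen~ round operator does.

```python
import math

def nearest_frame_start(index, frame_size):
    """Round an index to the nearest multiple of frame_size (ties round up)."""
    return math.floor(index / frame_size + 0.5) * frame_size

class Latch:
    """Sample and hold: pass the input when the gate is true, else hold."""

    def __init__(self):
        self.held = 0

    def __call__(self, value, gate):
        if gate:
            self.held = value
        return self.held

def read_addresses(play_indices, bin_indices, frame_size):
    """Compute the buffer read address for each incoming sample."""
    latch = Latch()
    out = []
    for play_index, bin_index in zip(play_indices, bin_indices):
        held = latch(play_index, bin_index == 0)  # update only at bin 0
        out.append(nearest_frame_start(held, frame_size) + bin_index)
    return out

# Two frames of four bins each, with the playback counter running from 9.
addresses = read_addresses([9, 10, 11, 12, 13, 14, 15, 16],
                           [0, 1, 2, 3, 0, 1, 2, 3], 4)
```

Because the playback index is only sampled at bin 0, every bin in a frame is read from the same stored frame, even though the counter keeps moving underneath.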
The rest is easy - we use that buffer address as the sample index from each of the four channels of our buffer~ we wish to read.
Here’s a nice bit of patching that makes for a patch that is both neater and more efficient. You could certainly use four peek operators and set the channel for each read using a constant value as input to the right inlet of the peek operator as shown here (or specify a channel for each read using the @channels attribute for each operator to make sure we’re getting the right amplitude and phase values):
It’s neater and easier to use a single peek operator with the @channels attribute set to 4 (for two pairs of amplitude and phase values), send the four results to a standard pair of poltocar operators (one per channel of audio), and then output those results using out operators.
The result of our patching is a gen~-based patcher that can record stereo audio into a fixed-length buffer~, allow you to start recording at any point, and loop back through the results at a variable rate of speed.
Spectral Processing by The Numbers
Now that we have a sense of what working with spectral data inside a gen~ object looks like, let’s take a look at a couple of simple spectral modifications - zeroing out the output of spectral bins using a threshold value, and spectral bin rotation. Both of these processes are really simple to do once you understand what's going on inside your gen~-based patch, but the results sound way more complex and satisfying.
The 03_spectral_bin_zeroing.maxpat MSP patch operates on a simple principle: we set a threshold value that sets a "noise floor" for the output of each spectral bin in our FFT in turn, and only pass the results for spectral bins whose amplitude exceeds that value.
Now that you've worked through our initial example patch, you'll probably be able to guess where we'll be adding our patching: at the very bottom of our patch, where we grab the amplitude values for each bin and pass the results on to the poltocar operator.
We can add just a few operators to our patch to zero out the bins:
- A single param operator (param thresh 0) initialized to zero that we'll use to set the threshold value
- A > (greater than) operator that uses that parameter value for testing
- A ? (conditional) operator that uses the boolean 0 or 1 result of the > operator to either pass the amplitude value unchanged or set the amplitude output to zero if the amplitude doesn't exceed the threshold.
Passing the amplitude value or zeroing it out is a simple matter of routing the amplitude value to the second (true) input of the ? operator and using a constant operator (0) to feed its right-hand (false) input.
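In Python terms, the > and ? operator pair boils down to a one-line conditional applied to each bin's amplitude. The function name is hypothetical; thresh mirrors the param described above.

```python
def gate_amplitude(amplitude, thresh=0.0):
    """Pass the amplitude if it exceeds the threshold, otherwise output 0."""
    return amplitude if amplitude > thresh else 0.0

# One small frame of amplitudes run through the gate.
frame = [0.02, 0.5, 0.1, 0.9]
gated = [gate_amplitude(a, thresh=0.1) for a in frame]
```

Note that the comparison is strict, so a bin whose amplitude exactly equals the threshold is zeroed along with the quieter ones.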
That's all there is to it. You can now gradually spectrally "submerge" your input on the fly, leaving only a batch of ghostly whistling or plinking....
Bin rotation is another relatively simple operation whose results are pretty cool. As you'll recall, our initial patch does its work by calculating spectral frame indices - the place in the buffer where any given set of 512 amplitude and phase pairs start. Once we have that number, it's merely a matter of taking the bin index value (which is provided courtesy of the third outlet of the fftin~ object in our pfft~-hosted patch) and adding it to the spectral frame index to get the place in the buffer~ where our phase and amplitude pairs live.
The bin index values we get from the fftin~ object count over and over in a loop that wraps at the number of spectral bins. Just pause for a moment and think about what would happen if we modified the current bin index we're reading from to another number - a bin value higher or lower than the one we'd normally be using.
The result is that we'd be grabbing amplitude and phase pairs for higher or lower frequency bins for the spectral frame we're processing. And while the result does resemble a change in pitch, we're actually changing the spectral components for each bin, and the result is (to quote Shakespeare) "something rich and strange."
That's what the 03_spectral_bin_rotation.maxpat MSP patch does.
Let's start with what you already know from the record/playback example:
As before, we use the standard third-outlet output from the MSP fftin~ object in the pfft~-hosted patch to give us the bin index. That corresponds to in 3 and in 6 in our gen~ patcher. We use that bin index in two places:
- We check to see if we’re at bin 0, and - if so - we update the spectrum index by using the latch operator as a sample and hold.
- We add the current bin index to the spectrum index and use the result to grab amplitude and phase values from the buffer operator using a peek operator.
Performing bin rotation is simplicity itself: Instead of feeding the output of the in 3 operator directly to the + operator at the bottom there, we'll add a new param object that specifies the amount of bin rotation we want to perform (and use an argument to set an initial value of zero), add that number to the current bin index, and then add a wrap operator whose upper bound is set to the number of bins (which we get from the second outlet of the fftinfo operator) so that our FFT bin values will wrap at 512.
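The add-then-wrap step can be sketched in Python like this. The function name and the rotation argument are hypothetical stand-ins for the new param and the wrap operator.

```python
def rotated_bin(bin_index, rotation, num_bins):
    """Offset the bin index by the rotation amount and wrap at num_bins."""
    return (bin_index + rotation) % num_bins
```

With 512 bins and a rotation of 5, bin 510 reads from bin 3 of the same frame, and a rotation of -1 sends bin 0 to bin 511. A rotation of zero leaves every bin where it was, which is why zero makes a sensible initial value.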
That's all there is to it. Considering how easy this was to do, the results sound pretty great.
Creating a Spectral Delay
We'll wrap up this introductory tutorial with two more examples of spectral processing in action — a pair of spectral delay patches. Let’s take what we already know and create a simple spectral delay using the gen~ delay operator.
We’ll do some filtering in addition to spectral delay. The 04_simple_spectral_delay.maxpat MSP patch contains a simple example of a spectral delay that also features a little spectral filtering that borrows a page from the Forbidden Planet FFT patch.
Our spectral delay patch uses the contents of a buffer~ object called filters to “filter” our spectral input by scaling the amplitudes of the 512 input frequency bins before we get down to delaying.
Since the buffer~ named filter contains the same number of samples as the number of bins in our FFT, we can use the third outlet of the fftin~ object (which outputs the current FFT bin index) for our filtering. We use a peek operator to read the value stored in the buffer at the current FFT bin index, and use that value to scale the amplitude of each bin by multiplying the amplitude output from the cartopol operator by the peek operator’s output using the humble but efficient * operator.
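A minimal Python model of this per-bin scaling follows. The names are hypothetical, and a plain list stands in for the buffer~ that holds one scaling value per bin.

```python
def filter_frame(amplitudes, filter_table):
    """Scale each bin's amplitude by the table value at the same bin index."""
    return [amp * filter_table[b] for b, amp in enumerate(amplitudes)]
```

A table value of 0 silences a bin, 1 leaves it alone, and anything above 1 boosts it, so drawing a shape into the table amounts to drawing a filter curve across the spectrum.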
Some well-known spectral delay plug-ins let you draw in your own delay time curves. Our simple spectral delay allows two different techniques for setting filtering (and, soon, delaying) values:
We’ve been using the Max waveform~ object in this tutorial to help us display buffer~ contents. But it has another trick, too - we can also draw directly into the display by setting the outmode attribute in the object’s Inspector to mode 4.
In addition, we can make use of the function-based messages to fill our buffer~ object with values. The filter and delay buffer~ objects have a number of message boxes attached to them that you can click on to explore the fill and apply messages (for more on these messages, see the buffer~ object help file’s functions tab).
We can load a buffer with functions using the fill and apply messages to the buffer~ or click and drag to draw into the display to alter the contents of the buffer~ objects we use. It’s the best of both worlds.
Now that we’ve got some interestingly filtered input, let’s add some delay.
To refresh your memory, here’s a simple example of the delay operator in gen~ in action:
The difference between these simple examples and what we’re doing in the frequency domain is that we are setting the amount of delay by some number of spectral frames (that is, the collection of 512 samples that correspond to amplitude or phase data in your buffer~ object), and then performing the delay calculations on each frequency bin associated with that frame.
Each spectral frame contains 512 values in each of the four channels of the buffer~ object, two of which represent the amplitude of the sine wave associated with the bin, and two of which represent the phase of those sine waves.
In terms of our spectral delay, we can set a separate delay time for each of the bins using a second buffer~ object called delays.
To set our simple spectral delay up, we need to translate what we think of as delay time from seconds to an amount that corresponds to a spectral frame value (which should be an integer).
Once again, we use the third outlet of the fftin~ object (by way of in 3) to grab the current FFT bin index and use the peek operator to fetch the associated value from the delays buffer~ object. We multiply that value by the current sample rate (which is a constant we can use in gen~ as part of a calculation), and divide that by the spectral frame size (which we get from the fftinfo operator). The ceil operator (used to guarantee a minimum) and a final multiplication by the spectral frame size (again, from the fftinfo operator) quantize the delay value to the frame size, and that’s the value we use to delay amplitudes and phases per bin.
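The conversion from a delay time to a frame-quantized delay can be sketched in Python as shown here. The function name is hypothetical, and the sketch assumes the values stored in the delays buffer~ are delay times in seconds.

```python
import math

def quantized_delay(delay_seconds, sample_rate, frame_size):
    """Round a delay time up to a whole number of frames, in samples."""
    frames = math.ceil(delay_seconds * sample_rate / frame_size)
    return frames * frame_size
```

For example, half a second at 44100 Hz with a frame size of 512 works out to a little over 43 frames, which ceil rounds up to 44 frames, or 22528 samples. Quantizing this way means a delayed value always lands on the same bin it started from.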
You may be wondering about the use of the vectorsize constant as an argument to the delay operator here. The value of the vectorsize in a pfft~ is the same as the FFT size, so a delay of that number of samples achieves a delay of exactly 1 spectral frame (It can be used a bit like a history operator, but one that operates at the spectral frame size) — it creates a frame delta operator that returns the difference between the current and previous frame (per bin).
The observant reader will also notice two more interesting details in this part of the patch, which I’ll briefly describe.
First, the delay operators are using the @interp none attribute. Turning off the interpolation for the delay operator (which, by default, performs linear interpolation) saves us some CPU, since we’re only ever going to be using integer values for indexing.
There’s one more curious part of this patch - what’s going on with the phase portion of the outlet patch from the cartopol operator? It’s another case where we’re making use of the spectral frame size as a value, but this time it’s being used as a delay value….
The answer lies with the nature of the phase data we’re working with. Phase values are cyclical in the range of -π to π, and they’re understood to wrap when they exceed that range. This isn’t a problem working with amplitude values from the left outlet of the cartopol operator, but we need to be careful to observe that wrapping with phase values. The solution is to calculate and store not the phase values themselves, but rather their deltas - the amount of change from one frame to the next. The first delay operator does just that - it subtracts the previous frame’s phase value from the current one and stores the change in phase rather than the value itself. (In non-gen~-based FFT processing, there’s an MSP object called framedelta~ that does just this.)
After the per-bin delay operations have completed, we add the delta value to the previous output value, which we get courtesy of another delay operator, and use the gen~ phasewrap operator to wrap phase values that fall outside of the range of -π to π (-3.14159 to 3.14159). That value is then delayed by the length of one frame and used the next time around. The result is a smooth and proper set of transitions for the phase values.
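Here is a hedged Python sketch of the delta-and-accumulate idea for a single bin across successive frames. The function names are hypothetical; phasewrap is modeled as wrapping into the range from -π up to (but not including) π, which is an assumption about the exact edge behavior of the gen~ operator.

```python
import math

def phasewrap(x):
    """Wrap a phase in radians into the range [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi

def frame_deltas(phases):
    """Current phase minus the previous frame's phase, for one bin."""
    previous = 0.0
    deltas = []
    for p in phases:
        deltas.append(p - previous)
        previous = p
    return deltas

def accumulate(deltas):
    """Running sum of the deltas, wrapped after every addition."""
    total = 0.0
    out = []
    for d in deltas:
        total = phasewrap(total + d)
        out.append(total)
    return out

# One bin's phase over three frames, including a jump across the wrap point.
phases = [3.0, -3.0, -2.5]
rebuilt = accumulate(frame_deltas(phases))
```

Accumulating the undelayed deltas rebuilds the original phases, wrap point and all. In the delay patch the deltas pass through the per-bin delay before they are accumulated, which is what keeps the phase transitions smooth.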
The combination of filtering, delay, and the ability to load or draw exotic patterns for them yields some interesting results.
Adding a Little Feedback
The final patch in this tutorial is 05_spectral_delay_with_feedback. It takes what we’ve learned and adds the ability to use a drawable/settable buffer~ and gen~ patching to provide feedback to our stereo delay line.
The internals of the gen~ patch for the stereo_spectral_delay_feedback.gendsp file differ very little from the previous gen~ patcher. The filter buffer~ is gone, replaced by a buffer~ object to hold feedback values (with the requisite peek operator driven from in 3 to grab feedback values). We made the patching a little neater by moving the framedelta~ and frameaccum~-like patching to gen patchers, but it’s mostly a case of adding the multiply and feedback path.