Sample accurate audio processing in MAX/MSP and gen~
For an audio-controlled synth I'm interested in "real-time audio processing", or rather "sample-accurate processing". gen~ is supposed to provide this, while the MAX/MSP environment is vector based (i.e. audio samples are first collected in a buffer/vector and only processed once that buffer/vector is full).
Since gen~ is embedded in the MAX/MSP environment I was wondering how "real time" the combined MAX/MSP/gen~ system really is.
(Q1) How are audio samples from the audio interface fed into gen~?
(Q1a) Is there some sort of a direct path between the audio interface and gen~ that bypasses the vector based MAX/MSP environment?
(Q1b) Or does gen~ only provide the processing of individual samples that have been buffered by the MAX/MSP-environment in the first place?
(Q2) What about several gen~ objects that are patched together in a MAX/MSP patcher?
(Q2a) does the second gen~ object get individual samples processed by the first gen~ object immediately, or are the samples buffered by the MAX/MSP environment?
(Q3) What role does the underlying OS play (Windows or macOS)?
Does anybody know where I can find a reference/documentation/best-practice guide that clarifies these questions?
I am not a software engineer, but I know that all audio software always needs a buffer for the audio.
In my experience you can already get effectively real-time performance in Max with modern computers.
With my Motu audio interface and a buffer of 64 samples at 48 kHz I get a latency of 3.43 ms. That is the time sound takes to travel about 1.18 meters (speed of sound: 343 m/s at 20 °C).
Filters and EQs will always have some added delay because of the way they are implemented (sample-based feedback networks etc.). A biquad object, for instance, is already programmed at the sample level internally. This delay amounts to just a few samples.
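As far as I understand, a biquad is basically a per-sample difference equation with feedback; the few samples of added delay come from its one- and two-sample memory terms. A rough C++ sketch of the textbook direct form I (not the actual biquad~ source, just the idea):

```cpp
// Conceptual sketch of a biquad (direct form I), not the actual biquad~ code.
// The "few samples" of inherent delay come from the one- and two-sample
// feedforward/feedback terms in the difference equation.
struct Biquad {
    double a0, a1, a2, b1, b2;       // filter coefficients
    double x1 = 0, x2 = 0;           // previous inputs  x[n-1], x[n-2]
    double y1 = 0, y2 = 0;           // previous outputs y[n-1], y[n-2]

    double process(double x) {
        double y = a0 * x + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2;
        x2 = x1; x1 = x;             // shift the delay line by one sample
        y2 = y1; y1 = y;
        return y;
    }
};
```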
It is an interesting question anyway whether using gen~ would be even more efficient. I hope some other users here can give you better and more detailed answers to your questions :)
even if you make a whole synth in a single gen~, you will still send control data into it using the scheduler, and add at least one vector of MSP buffering when you want to send it to the IO.
(Q1) How are audio samples from the audio interface fed into gen~?
one vector at a time (gen~ then processes single samples from that vector, collects them into another vector, and outputs that vector back into MSP)
(Q1a) Is there some sort of a direct path between the audio interface and gen~ that bypasses the vector based MAX/MSP environment?
no
(Q1b) Or does gen~ only provide the processing of individual samples that have been buffered by the MAX/MSP-environment in the first place?
exactly
(Q2) What about several gen~ objects that are patched together in a MAX/MSP patcher?
same thing: a vector goes into gen~, single samples are processed, a vector comes out of gen~, goes into the next gen~ as a vector, splits into single samples for processing, comes out as another vector into MSP... and all these vector sizes depend on your MSP signal-vector size. basically, whenever you're in MSP, audio must be processed as a vector. since gen~ always, at the very least (unless you're simply using it for code generation), needs to output to dac~ (which lives in the MSP world), its output inevitably must be vectorized again before being sent there.
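to make that concrete, conceptually something like this happens around every gen~ object (a made-up C++ sketch of the idea, not actual Max/gen~ internals):

```cpp
// Made-up sketch of the idea, not actual Max/gen~ source code.
// MSP hands gen~ one signal vector at a time; inside, the compiled gen~
// patch runs once per sample; the results are collected back into a
// vector before they re-enter the vector-based MSP world.
#include <cstddef>

// Stand-in for whatever the compiled gen~ patch does per sample.
static float process_one_sample(float x) { return x * 0.5f; }

void gen_perform(const float* in, float* out, std::size_t vectorSize) {
    for (std::size_t n = 0; n < vectorSize; ++n) {
        out[n] = process_one_sample(in[n]);   // single-sample processing...
    }                                         // ...but only once MSP has
}                                             // delivered a whole vector
```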
(Q2a) does the second gen~ object get individual samples processed by the first gen~ object immediately, or are the samples buffered by the MAX/MSP environment?
buffered
(Q3) What role does the underlying OS play (Windows or macOS)?
the OS simply hosts the entire Max app. i can't give specifics on this, but to the point of your other questions: the OS doesn't do anything special to separate gen~ from Max; rather, i think its main role here is priority threading (audio all runs in a high-priority thread, gen~ along with MSP). to be honest, the full answer could be quite complex (you could also be asking how the OS helps split graphics off to the GPU... the OS could be said to play the most crucial role in giving the developers every functionality of the computer required to make the app). but overall, a majority of Max remains pretty OS-independent: when you code an MSP external in C, it is very much the same process on Windows as on OSX, even if, underneath, slightly different libraries might be used to compile externals on either platform (with a few exceptions if you want to use special libraries specific to a particular OS).
keep in mind, i'm not a developer of Max; these are my best guesses based on using the environment for more than 20 years and seeing how Max gets developed over time, plus writing my own externals for Max in C using the SDK (if i'm wrong, a dev or someone more in the know might come along to elucidate further... but i'm pretty sure about the 'vectorizing for/within MSP no-matter-what' part, so it's likely these are the answers to all your questions :D).
hope i got it right, and that it helps 🍻
Super helpful answer, RAJA, thanks a lot!! It confirms in a very understandable way what I was fearing... ;-)
I conclude two points, the first one rather obvious, the second a bit surprising (to me):
if I want to build a super-low-latency audio app (or at least an app that allows me to control the latency end-to-end), I need to use a dedicated (DSP) system (such as Arduino, audiolino etc.).
Although gen~ allows processing of individual samples, it is caught in the buffered environment of MAX/MSP. Whatever reasons I may have for using gen~, reducing latency is not one of them. The only way to reduce latency in MAX/MSP/gen~ is to reduce the buffer/vector size in the audio settings of Max, or in the audio driver, respectively.
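For reference, the buffering part of the latency is simply the vector size divided by the sample rate (ignoring converter and driver overhead); a quick sanity check:

```cpp
#include <cstdio>

// One-way buffering latency in milliseconds, ignoring converter/driver overhead.
double buffer_latency_ms(int vector_size, double sample_rate) {
    return 1000.0 * vector_size / sample_rate;
}

int main() {
    std::printf("%.2f ms\n", buffer_latency_ms(64, 48000));   // ~1.33 ms
    std::printf("%.2f ms\n", buffer_latency_ms(256, 44100));  // ~5.80 ms
    std::printf("%.2f ms\n", buffer_latency_ms(512, 44100));  // ~11.61 ms
    return 0;
}
```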
If you take all your little gen~ patches, save 'em as .gendsp files, and gang them into a single gen~ external that uses audio inputs at signal rate (which are processed at single-sample rates) rather than param/subparam operators to feed the gen operators inside your patch, I'm a little puzzled as to how latency is going to be an issue... everything inside the gen~ object runs as single-sample calculation, which is always going to be more efficient than a bunch of MSP externals hooked together.
Yep.
For clarification from the OP, the "efficiency" of gen~ is not about latency, nor is gen~ going to reduce your device's IO latency -- that is determined by your soundcard settings just like everything else on your computer.
The efficiency of gen~ is due to the fact that on every edit, the entire patch is JIT compiled to machine code. This means that there are optimizations the compiler can make **across** operators. These are optimizations which can't be made across objects in an MSP patch, thus gen~ implementations may be able to have lower (sometimes significantly lower) CPU usage. I guess that might mean sometimes you could get away with smaller IO vector sizes for the same algorithm.
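As a toy analogy (not what the gen~ compiler literally emits): two chained MSP objects behave roughly like two separate passes over the signal vector with an intermediate buffer, whereas a compiled gen~ patch can fuse them into one per-sample expression that the compiler optimizes as a whole:

```cpp
// Toy analogy only -- not what the gen~ compiler literally emits.
#include <cstddef>

// Two chained MSP objects: roughly two separate passes over the vector,
// with an intermediate buffer in between.
void msp_style(const float* in, float* tmp, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) tmp[i] = in[i] * 0.5f;    // like *~ 0.5
    for (std::size_t i = 0; i < n; ++i) out[i] = tmp[i] + 0.1f;   // like +~ 0.1
}

// Compiled gen~ patch: the same two operators fused into one per-sample
// expression, which the compiler can optimize across operator boundaries.
void gen_style(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * 0.5f + 0.1f;
}
```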
The "sample accurate processing" and single-sample processing of gen~ refers to the fact that the entire contents of a gen~ patch run as a program on each passing sample. This has some significant implications:
1. it makes it possible to edit or create new algorithms with feedback over a single sample (pretty much all filters and many oscillators), which is simply impossible in a vector-based environment (see the sketch after this list),
2. it makes it easier to precisely align events on specific sample frames (for e.g. granular synthesis etc.), including (via codebox) complex if() and for/while() kinds of processes that can be difficult to achieve via patching.
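For instance, a one-pole lowpass needs its own output from exactly one sample ago; in the vector world the shortest feedback path you can patch is a whole signal vector, while per-sample it is trivial. A C-style sketch of what you would express in a gen~ codebox with a History operator (the names here are made up):

```cpp
// C-style sketch of single-sample feedback (what you'd express in a gen~
// codebox using a History operator); the names here are made up.
struct OnePoleLowpass {
    double y1 = 0.0;                       // previous output: y[n-1]

    double process(double x, double g) {   // g in [0, 1): smoothing amount
        double y = x + g * (y1 - x);       // y[n] = x[n] + g*(y[n-1] - x[n])
        y1 = y;                            // feedback over exactly one sample
        return y;
    }
};
```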
Raja already answered but just to reiterate:
Q1: the same way as any other MSP object.
Q1a/b: Not within Max. But see A1 below.
Q2/a: Different gen~ objects in a Max patcher will be buffered just like other MSP objects, but as Gregory says -- if you put all the gen abstractions inside of one gen~ object, they will be unbuffered and single-sample together. They might also be more efficient.
Q3: Not sure what the question means. The OS audio settings are interfaced in Max via the Options / Audio Status menu, and apply to all audio in Max.
Think of gen~ as a special MSP object that has its own single-sample JIT-compiled world inside, but behaves like other MSP objects on the outside.
For your follow-up questions
A1: Yes. But you can design an algorithm in gen~ and export it as C++ code (send the "exportcode" message to gen~), and then you could embed this in another program or dedicated hardware, presumably not running an operating system, for ultra-low latency. This is exactly what Oopsy does, for example, to export code onto the Daisy embedded system, where it can run at blocksizes from 48 down to 1 sample and sample rates up to 96 kHz, which translates to hard real-time latencies ranging from about 1 ms down to less than 100 microseconds (a rough sketch of the embedding side follows below).
A2: Yes.
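To give a rough idea of what embedding exported code looks like: a per-sample routine called from a hard real-time audio callback with a tiny blocksize. The type and function names below are made up for illustration, not the actual API that exportcode generates:

```cpp
// Made-up illustration of embedding exported gen~ code on a device with a
// tiny blocksize -- the names are NOT the actual exportcode API, just a
// sketch of the shape of the thing.
#include <cstddef>

struct MyGenPatch {
    double state = 0.0;                      // whatever state the patch keeps
    float process(float in) {                // one sample in, one sample out
        state = 0.99 * state + 0.01 * in;    // stand-in for the exported DSP
        return static_cast<float>(state);
    }
};

// Audio callback on the embedded device, here with a blocksize of 1:
// at 96 kHz that is roughly a 10 microsecond block period.
void audio_callback(MyGenPatch& patch, const float* in, float* out,
                    std::size_t blocksize) {
    for (std::size_t n = 0; n < blocksize; ++n)
        out[n] = patch.process(in[n]);
}
```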
@gregory taylor: the issue with latency is related to my application. I'm a bass player and I'm experimenting with experimental audio effects in MAX/MSP/gen~ (=second order experiments ;-).
Latency is an issue on bass, particularly with the low notes. One might think that low notes are easy to deal with, because they allow for a lot of processing time (the low B of a 5-string bass is at 31 Hz, which corresponds to 32 ms or about 1423 samples at 44.1 kHz). However, precisely because the frequencies are so low, the percussive attack phase of each note becomes essential for the playing feel and the hearing perception. A latency on the order of 256 or 512 samples (the default setting of most audio drivers, i.e. roughly 6 to 12 ms at 44.1 kHz) kills the playing feel.
Some commercial devices, such as the Boss SY-300 guitar synthesizer, manage "somehow" to keep the latency low enough (although you can still feel it when playing bass). But the SY-300 is not experimental enough for me... ;-)
Thanks Graham, very useful input, too!
Exactly.
Since you talked about a bass guitar, you could consider a programmable stompbox. There are a few options that can support gen~ exported code, such as the Mod Duo (https://www.moddevices.com), the Owl (https://cycling74.com/articles/review-getting-to-know-the-owl-pedal), and the Daisy Petal (https://www.electro-smith.com/daisy/petal ).
I have the Petal and was able to export and run a patch at 96 kHz and a blocksize of 1 sample, which is roughly 10 *microseconds*. Even at 48 samples the latency is 0.5 milliseconds, which I doubt you can perceive.
HTH :-)
I expect you're familiar with discussions like this one, which concern themselves with making pitch detection at low frequencies more efficient, but hey -
https://dsp.stackexchange.com/questions/411/tips-for-improving-pitch-detection
This is one of those interesting things that our neural wetware appears to do for us pretty efficiently, as well. It's fascinating stuff....
neural wetware
ewww, makes it sound so gross: 'wetware' 😝😂 ...i've never heard that term before 🤣 a good one :)
Back in the early days, artificial life research was categorized according to the three kinds of "wares" it worked with: hard, soft, and wet.
Gregory, that article on the hair cells is really interesting.
“A high-frequency cell needs a very rapid excitation to get it to fire. If the excitation is too slow, it will fail. So, basically, the neuron is setting a threshold that says, ‘If my input is not fast enough, I’m not going to fire.’” The high-frequency cell, Burger says, does not want to fire from poor information and miss good information elsewhere. The low-frequency cell, in contrast, “is much more tolerant of slow excitation before it will fire. … These properties, the speed tolerance of excitation between these types of neurons, is extremely statistically robust.”
... This gets to the heart of the team’s central question: Is a high-frequency neuron in the brain determined by its location in the brain or by the type of input it’s receiving? The answer, it seems, is the latter.
I guess working on algorithms in Max for decades has probably changed my cochlear haircells forever.
artificial life research was categorized according to the three kinds of "wares" it worked with: hard, soft, and wet
🤯 interesting!... possibly there's a 'dryware' then?
...💡! perhaps Terraforming? ...no wait.. maybe something more like a 'virus'(they are not living, but then perhaps that would also exclude them from being considered 'artificial life'?...i'd think anything which can alter life directly for the purposes of replicating itself might actually fall within the realms of said research).... hmmm....🤔 ... 💡aha☝️! perhaps a Zombie-Creating Virus would be 'dryware'!
Attention Earthlings! flee for your lives, gen~ will not reduce latency enough to save you from my 'dryware'! 🧟♂️👾👽
(apologies, was too inspiring a set of thoughts to not go off on a tangent 😅
...and damn, a quick search of 'dryware' led me nowhere but to a site about incontinence products 🥸)
@Graham: Thanks again, I knew about moddevices and the Owl, but not about the Petal. That looks awesome! It may not kill my playing feel, but it might kill my budget, though ;-)
@Gregory: I'm aware of the pitch-detection discussions. At least as far as traditional pitch-to-MIDI is concerned, I've never come across a really satisfying solution for bass (I have tested quite a few, although not the new Boss SY-1000). To me, this deplorable situation seems unavoidable, because it simply takes time to count cycles of low frequencies, or to do an FFT, respectively. This is why I'm focusing more on direct processing of the audio signal, incl. real-time (re)synthesis (--> the trigger of this interesting discussion). From a creative perspective, it is not so interesting, either, to just play piano samples with a bass ;-)
However, your second link about the "neural wetware" points in a very interesting direction. Analyzing how the ear is built, and observing that even when you play bass at 31 Hz there is no perceptible latency between your fingers and your ear, it seems obvious that the ear/brain produces "perception results" long before the first cycle of the fundamental frequency has completed.
Therefore my hypothesis: if you want to play a piano or - much cooler: a Minimoog - with your bass, you have to separate the attack phase from the rest of the sound. As soon as the algorithm detects that the player is indeed playing, some output has to be generated to activate the ear/brain. Whether this output is the first part of the sample or a percussive, synthesized sound needs to be explored. (Maybe several sequences of such early signals are required to deal well with the instrument's entire range of notes.) These sounds keep the ear (unconsciously?) busy until a pitch has properly been detected and the 'real' sample/MIDI message can be played in tune.
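Just to make the idea a bit more concrete, the attack-detection part could be as simple as an envelope follower plus a threshold; a naive sketch (all names and constants are guesses):

```cpp
// Naive sketch of the idea: fire a "play something now" trigger on the
// attack, long before any pitch detector could have counted a full cycle
// of a 31 Hz fundamental. All constants here are guesses.
struct AttackTrigger {
    double env = 0.0;                        // envelope follower state
    bool armed = true;

    bool process(double x, double attackCoef = 0.01,
                 double releaseCoef = 0.0005, double threshold = 0.1) {
        double rect = x < 0 ? -x : x;        // full-wave rectify
        double coef = rect > env ? attackCoef : releaseCoef;
        env += coef * (rect - env);          // simple one-pole follower
        if (armed && env > threshold) {      // rising above threshold: attack!
            armed = false;
            return true;                     // trigger the early/percussive sound
        }
        if (env < threshold * 0.5)           // re-arm once the note decays
            armed = true;
        return false;
    }
};
```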
If I only had time for such a research project... ;-)
They say the latency between an external event and conscious awareness is supposed to be on the order of half a second for most stimuli (a result that is hard to swallow as a musician), but non-conscious responses can kick in much faster than that. As far as conscious musical listening goes, though, that's a huge buffer size! It's entirely different, however, if you are the one doing the action that generates the sound, as a whole array of predictive-attention grey matter (and other wetware) is ready to detect the expected sound or deviations from it. That is, as human players we may have privileged inside knowledge of what sound is likely to come a few tens or hundreds of milliseconds after we send the message to move our muscles (even if we aren't conscious of any of this), knowledge that a digital fx box cannot access.