MAXimum Speed

metaphorz

When teaching about the use of Max/MSP to support modeling and simulation
material, I find myself trying to tell students which hardware and software
are used for specific operations. After reading the list for a couple of months,
I have some ideas, but they may need correcting. If anyone knows whether these
assumptions are correct, please let me know. Clarifications are welcome.

1. For event handling (the Max mode of operation with its event scheduling), speed can be enhanced
with multiple cores, since threads can be allocated to cores. Having more cores means that event
scheduling can be faster. Event handling seems the least concern when contrasted with the complexity
of digital audio and video and their need for speed optimizations. Using other languages
(JavaScript, Lua, Java) could also improve speed.

2. For MSP, I am unsure what optimizations (hardware-assisted or software) are being employed.
[gen~] or anything written as an external uses C-based compilation, which is bound to be fast. The question, though, is whether specific hardware is being leveraged. Is either SIMD-style or pipeline-style parallelism employed for digital audio calculations? For our work here, the digital audio is a side effect; the primary purpose is solving ODEs, and [gen~] works very well for this at audio sample rates such as 44.1 kHz. I am aware that Intel has the SSE SIMD extensions. Are these used by [gen~], or is super-fast audio processing solely a result of the efficiency of the C compiler?
JavaScript and Lua presumably support Max operation but not MSP? The assumption is that their "code on the fly" is not fast enough for digital audio (though this turns out not to be true, given the plethora of digital audio in JavaScript on the web).
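To make "solving ODEs at audio rate" concrete for the CS students: a minimal sketch in C (not gen~ code) of one integration step per sample, with dt = 1/44100 s. The damped oscillator, the struct and function names, and the choice of semi-implicit Euler are all assumptions for illustration; in a gen~ patch the state x and v would instead be carried from one sample to the next by feedback objects such as [history].

```c
#include <assert.h>

/* Hypothetical state for a damped harmonic oscillator
 *   x'' = -w^2 x - 2 zeta w x'
 * integrated once per audio sample. */
typedef struct {
    double x;    /* position (used as the audio output) */
    double v;    /* velocity */
    double w;    /* angular frequency in rad/s */
    double zeta; /* damping ratio */
    double dt;   /* time step = 1 / sample rate */
} Oscillator;

void osc_init(Oscillator *o, double freq_hz, double zeta, double sample_rate)
{
    assert(sample_rate > 0.0);
    o->x = 1.0;                          /* initial displacement */
    o->v = 0.0;
    o->w = 6.283185307179586 * freq_hz;  /* 2 * pi * f */
    o->zeta = zeta;
    o->dt = 1.0 / sample_rate;
}

/* Advance one sample with semi-implicit (symplectic) Euler:
 * update the velocity first, then the position using the new velocity. */
double osc_tick(Oscillator *o)
{
    double a = -o->w * o->w * o->x - 2.0 * o->zeta * o->w * o->v;
    o->v += o->dt * a;
    o->x += o->dt * o->v;
    return o->x;
}

/* Fill one output block, as an audio callback would. */
void osc_process(Oscillator *o, double *out, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = osc_tick(o);
}
```

Calling osc_init(&o, 440.0, 0.01, 44100.0) and then osc_process on successive blocks yields a decaying 440 Hz sinusoid. Semi-implicit Euler is used here because plain forward Euler makes an undamped oscillator's amplitude grow steadily.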

3. Related to #2, is there specific hardware to support pipeline parallelism inherent in [gen~]?
I am referring to an operation like this: in [gen~], you have 3 objects patched as X -> Y -> Z.
By treating the patch as a pipeline, X, Y, and Z could execute simultaneously, each on a different
sample (Z finishing one sample while Y handles the next and X the one after that).
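A software sketch of the pipeline idea for the X -> Y -> Z chain, assuming each object is a stateless per-sample function (stage_x, stage_y and stage_z are hypothetical stand-ins, not gen~ operators). Within one tick, the three stages read only values latched on the previous tick, so they do not depend on one another and could in principle run on separate cores; the price is two samples of added latency.

```c
#include <assert.h>

/* Hypothetical stateless stages standing in for objects X, Y and Z. */
static double stage_x(double s) { return s * 2.0; }
static double stage_y(double s) { return s + 1.0; }
static double stage_z(double s) { return s * s; }

/* Sequential reference: each sample passes through X, Y, Z in turn. */
void run_sequential(const double *in, double *out, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = stage_z(stage_y(stage_x(in[i])));
}

/* Pipelined schedule: on tick t, Z works on sample t-2, Y on t-1, X on t.
 * Runs on one thread here only to show the schedule. */
void run_pipelined(const double *in, double *out, int n)
{
    assert(n >= 0);
    double xy = 0.0, yz = 0.0;  /* pipeline registers: X->Y and Y->Z */
    for (int t = 0; t < n + 2; ++t) {
        /* Each stage reads only the register filled on the previous tick,
         * so these three computations are mutually independent. */
        double z_res = stage_z(yz);                     /* sample t-2 */
        double y_res = stage_y(xy);                     /* sample t-1 */
        double x_res = (t < n) ? stage_x(in[t]) : 0.0;  /* sample t   */
        /* Latch results for the next tick; the first two Z results
         * come from the empty pipeline and are discarded. */
        if (t >= 2)
            out[t - 2] = z_res;
        yz = y_res;
        xy = x_res;
    }
}
```

This only works because the stages hold no state between samples; as soon as the chain contains feedback, a stage has to wait for its own previous output.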

4. As for the GPU, the parallelism here seems more self-evident in the forums and the Max
documentation: [jit.gl.X] objects use the GPU (e.g., jit.gl.pix and its shader invocations). I am
unsure whether anyone has a patch that employs SIMD-style audio filter execution using a
one-dimensional GPU buffer. Perhaps this is unnecessary, given the lack of complexity compared with
the 2D or 3D parallelism required (or at least highly desired) for images and video.
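To illustrate why an audio filter could map onto a one-dimensional GPU buffer at all: an FIR filter computes each output sample from inputs only, so every output index is independent, which is the shape of work a GPU kernel with one thread per element handles well. The sketch below is plain C with a loop standing in for the kernel launch; the function names are hypothetical, and it says nothing about the cost of moving data between CPU and GPU.

```c
#include <assert.h>

/* Compute ONE output sample of an FIR filter:
 *   y[n] = sum_{k=0}^{taps-1} h[k] * x[n-k]
 * Samples before the start of the block are treated as zero; a streaming
 * version would keep the last taps-1 inputs from the previous block.
 * It reads only x and h and writes only y[n], so every n is independent. */
static void fir_kernel(int n, const float *x, const float *h, int taps, float *y)
{
    float acc = 0.0f;
    for (int k = 0; k < taps && k <= n; ++k)
        acc += h[k] * x[n - k];
    y[n] = acc;
}

/* CPU stand-in for a kernel launch over the whole buffer:
 * the iterations could run in any order, or all at once. */
void fir_block(const float *x, const float *h, int taps, float *y, int len)
{
    assert(taps > 0 && len >= 0);
    for (int n = 0; n < len; ++n)
        fir_kernel(n, x, h, taps, y);
}
```

Feeding an impulse through fir_block returns the filter taps, which is a quick sanity check for any FIR implementation.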

In summary (assumptions):
A. Max -> multi-core helps, and C-based compilation is fastest
B. MSP -> C-based compilation (externals, gen~) but also pipeline and/or SIMD hardware on the CPU?
C. Jitter -> GPU-based operations (SIMD)

Graham Wakefield

1. I'll let somebody more knowledgeable respond to this, but I believe that Max 7 will bring something interesting in that regard.

2. [gen~] generates lightweight C/C++ code which is compiled to native machine code by an embedded Clang/LLVM compiler. A lot of effort in gen has gone into making sure the generated code is easily optimized. We make sure to lift expressions up to block rate wherever possible, and in some cases we also switch between different algorithms for generated operators, according to what is known about the context.
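"Lifting expressions to block rate" is a form of loop-invariant code motion: anything that depends only on values that cannot change within a block is computed once per block rather than once per sample. A minimal C sketch with a hypothetical phasor; this is not code that gen~ actually emits, and it assumes 0 <= freq < sample_rate so that a single subtraction suffices for wrapping.

```c
#include <assert.h>

/* Naive version: the increment (a division) is recomputed on every
 * sample, even though freq and sample_rate cannot change in the block. */
void phasor_per_sample(double *phase, double freq, double sample_rate,
                       double *out, int n)
{
    for (int i = 0; i < n; ++i) {
        double inc = freq / sample_rate;   /* invariant inside the loop */
        *phase += inc;
        if (*phase >= 1.0) *phase -= 1.0;  /* wrap to [0, 1) */
        out[i] = *phase;
    }
}

/* Lifted version: the invariant expression is evaluated once per block. */
void phasor_block_rate(double *phase, double freq, double sample_rate,
                       double *out, int n)
{
    assert(sample_rate > 0.0);
    const double inc = freq / sample_rate; /* block-rate expression */
    double p = *phase;                     /* keep the state in a local */
    for (int i = 0; i < n; ++i) {
        p += inc;
        if (p >= 1.0) p -= 1.0;
        out[i] = p;
    }
    *phase = p;
}
```

Both functions produce the same samples; the second performs one division per block instead of one per sample. A C compiler may hoist such an expression by itself when it can prove it invariant, while a code generator like gen~ presumably starts with more information, such as which inputs are parameters.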

We are currently investigating hardware-assisted SIMD operations for a future update, but this is not currently in place. AFAIK some MSP objects do have SIMD implementations. I'm probably not the best person to comment on it though. For thread-level task parallelism in MSP you can look at the multi-threading options for [poly~].
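For the SSE question: a minimal sketch of a hand-vectorized inner loop using Intel SSE intrinsics from <xmmintrin.h>, processing four floats per instruction, with a scalar loop for the remainder (and as a fallback where SSE is unavailable). The function gain_mix is hypothetical and is not taken from any MSP object or from gen~.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: four floats per 128-bit register */
#endif

/* out[i] = a[i] * gain + b[i] */
void gain_mix(const float *a, const float *b, float gain, float *out, int n)
{
    assert(n >= 0);
    int i = 0;
#if defined(__SSE__)
    const __m128 g = _mm_set1_ps(gain);       /* broadcast gain to 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);      /* unaligned 4-float loads */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, g), vb));
    }
#endif
    for (; i < n; ++i)                        /* scalar tail or fallback */
        out[i] = a[i] * gain + b[i];
}
```

Compilers such as Clang and GCC can often auto-vectorize a simple loop like the scalar one at higher optimization levels, so explicit intrinsics matter mostly where the compiler cannot prove that vectorizing is safe.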

JS and Lua in Max do not currently support audio processing. Given recent advances in JIT engines for both, this is certainly feasible to explore. I have successfully used LuaJIT for real-time audio processing in a few non-Max projects (including some live coding!). LuaJIT is amazing: up to a point it can compete with C code, but only if you code carefully... and the more carefully you do, the more it echoes what you'd do in C. I've also used browser-based JS audio DSP. Things like asm.js look interesting, but it's all very recent development, and there are still limitations. I personally think that, in the current state of technology, JS/LuaJIT are best suited to situations involving many unexpected variations in control/data flow such as a web server, where any drops in performance aren't critical; whereas the case for audio processing is *usually* more constrained. An embedded compiler for gen~ makes more sense as algorithm changes occur only at edit-rate, and many assumptions can be made about the data streams being processed.

3. Not yet, but we're looking into it. It's tricky though: many typical audio algorithms don't lend themselves to parallelism.
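A minimal C illustration of why many audio algorithms resist parallelism: a gain stage treats each sample independently, while a recursive filter (here a hypothetical one-pole lowpass) needs the previous output before it can compute the next one.

```c
#include <assert.h>

/* Data-parallel: each output depends only on the matching input, so the
 * samples could be split across SIMD lanes, threads or GPU cores. */
void apply_gain(const float *in, float *out, int n, float g)
{
    for (int i = 0; i < n; ++i)
        out[i] = in[i] * g;
}

/* One-pole lowpass: y[i] = (1 - a) * x[i] + a * y[i-1].
 * Each output needs the previous output (a loop-carried dependency),
 * so sample i cannot start before sample i-1 has finished. */
float one_pole(const float *in, float *out, int n, float a, float y_prev)
{
    assert(a >= 0.0f && a < 1.0f);
    for (int i = 0; i < n; ++i) {
        y_prev = (1.0f - a) * in[i] + a * y_prev;
        out[i] = y_prev;
    }
    return y_prev;  /* state carried into the next block */
}
```

With a step input and a = 0.5, one_pole outputs 0.5, 0.75, 0.875, ...; each value is built from the one before it.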

4. I seem to remember that somebody did do something like this at least once, but it's not easy. Audio on the GPU is usually limited by the slow data flow from the GPU back to the CPU, which typically requires a large latency (and buffer) to cover it. For your use case this isn't a problem, but even then I'm not convinced the overhead would be worth it.
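The latency cost is simple arithmetic: buffering N frames at sample rate f_s adds N / f_s seconds. A small C sketch (function names hypothetical) that converts a buffer size to milliseconds and finds the smallest power-of-two buffer covering a given transfer time, which you would have to measure on your own system.

```c
#include <assert.h>

/* Latency added by buffering `frames` samples at `sample_rate` Hz, in ms. */
double buffer_latency_ms(int frames, double sample_rate)
{
    assert(frames >= 0 && sample_rate > 0.0);
    return 1000.0 * (double)frames / sample_rate;
}

/* Smallest power-of-two buffer whose duration covers `transfer_ms`. */
int min_pow2_frames(double transfer_ms, double sample_rate)
{
    assert(transfer_ms >= 0.0 && sample_rate > 0.0);
    double needed = transfer_ms * sample_rate / 1000.0;  /* in samples */
    int frames = 1;
    while (frames < needed)  /* double until the buffer covers the transfer */
        frames *= 2;
    return frames;
}
```

For example, covering a 5 ms transfer (an arbitrary figure, not a measurement) at 44.1 kHz needs 220.5 samples, so min_pow2_frames returns 256, which buffer_latency_ms puts at about 5.8 ms.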

metaphorz

@Graham: This helps considerably, especially for the computer science students taking my class
[the class is a mixture of CS and digital media/artists].