[Max6] Slower send/receive performance

Roth

Last night I was revisiting Mattijs Kneppers' object-oriented tools (https://cycling74.com/tools/kneppers-oo-objects/) for Max and noticed a possible bug, or at the very least an odd performance loss, in Max6.

In Mattijs' oo.method.maxhelp, if you dive into the nested subpatches p more > p advanced > p speed, you will see a test that compares the speed of an oo.method call to using send and receive. The send and receive example seemed much slower than it should have been, so I tried the same example in Max5 and noticed there was a big slowdown in Max6.

Here is an example patch that illustrates this performance hit:

[Max patch; copy it and select New From Clipboard in Max.]

Testing on a MacBook Pro (2.2GHz Core 2 Duo T7500, 4GB RAM) running OS X 10.6.7, this example takes about 250ms in Max5 and about 320ms in Max6. I realize that this 28% slowdown translates into only about 140ns per send/receive, but from what I know about the new stuff in Max6 I can't think of why there would be a slowdown here, so I thought it might be a bug.
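As a quick check of those figures, here is a small Python sketch of the arithmetic. The timings and the 140ns figure come from the post; the number of send/receive calls in the test patch is not stated, so the count below is only what those figures imply.

```python
# Figures quoted in the post above.
max5_ms = 250.0
max6_ms = 320.0
per_message_ns = 140.0

# Relative slowdown of Max6 against Max5.
slowdown_pct = (max6_ms - max5_ms) / max5_ms * 100.0

# Extra time in nanoseconds, and the message count it implies.
extra_ns = (max6_ms - max5_ms) * 1_000_000.0  # ms -> ns
implied_messages = extra_ns / per_message_ns   # inferred, not stated in the post

print(f"slowdown: {slowdown_pct:.0f}%")
print(f"implied send/receive count: {implied_messages:,.0f}")
```

If the test patch really does loop about half a million times, the 70ms difference and the 140ns per-message figure agree with each other.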

AlexHarker

I suspect it may be due to the fact that when running with SIAI on it is now possible for multiple top-level patchers to run schedulers in different threads. Since send/receive are global, they need to lock around the sending action in order to operate correctly in this scenario; otherwise the operation is not threadsafe, and incorrect behaviour might result.

This is totally a guess (only the c74 guys know for sure), but if this is the reason, I'd say that it's a pretty minimal cost for threadsafety, and the consequences of not having threadsafety are far worse than that performance hit.
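To make the guess concrete, here is a minimal Python sketch of a global name registry whose send path takes a lock. It is purely illustrative: the class and method names are hypothetical, and nothing here reflects how Max actually implements send/receive internally.

```python
import threading
from collections import defaultdict


class GlobalBus:
    """Hypothetical global send/receive registry shared by several threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._receivers = defaultdict(list)

    def receive(self, name, callback):
        # Registering a receiver mutates shared state, so it takes the lock.
        with self._lock:
            self._receivers[name].append(callback)

    def send(self, name, value):
        # Every send pays for the lock, even when only one thread is running.
        with self._lock:
            targets = list(self._receivers.get(name, []))
        for callback in targets:
            callback(value)


def demo(messages_per_thread=1000):
    """Send from two threads at once and count what arrives."""
    bus = GlobalBus()
    received = []
    bus.receive("test", received.append)

    def worker():
        for i in range(messages_per_thread):
            bus.send("test", i)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(received)
```

Calling `demo()` returns the number of messages delivered, which should be twice `messages_per_thread`. The point of the sketch is that the lock is acquired on every single send, which is the kind of small fixed cost that would show up in a tight synthetic benchmark.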

I do sometimes (even as an efficiency freak) wonder why people get so into measuring max (rather than msp) performance in these synthetic ways, as I have rarely, if ever, encountered a real-world circumstance where the bottleneck of my patch was in the message domain. The audio domain, however, is often an issue. This is a genuine question in my mind - not just me being difficult. Are you writing really huge patchers sending 1000s of messages every few seconds?

A.

GreaterThanZero

Alex,

I've never tried to benchmark the messaging system, but I did (okay, I still do) have to refactor a project in Max For Live which relied on that for communication between patches.

What I was trying to accomplish at the time...

(much background; hang in there)

The monome arc4 has four encoders and four LED rings (each consisting of 64 LEDs which can individually be set to one of 16 brightness levels). There are several messages you can pass into its serial communication object to address LEDs. Sometimes it's more efficient to change the state of only one LED (as this is a shorter message), or to set a block of adjacent addresses to the same color (also a short, efficient message), but the message that's proved most consistent in my applications broadcasts a 65-integer list (one to identify the ring, 64 to set individual brightness levels for every LED concurrently).
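For illustration, the 65-integer list described above could be assembled like this in Python. The function name is hypothetical, and the ring numbering (0 to 3) and brightness range (0 to 15) are assumptions based on "four rings" and "16 brightness levels"; the actual message format expected by the serial object is not shown in the post.

```python
def ring_message(ring, levels):
    """Build a 65-integer list: the ring index followed by 64 brightness levels.

    Assumes rings are numbered 0-3 and brightness levels run 0-15.
    """
    levels = list(levels)
    if ring not in range(4):
        raise ValueError("ring must be 0, 1, 2 or 3")
    if len(levels) != 64:
        raise ValueError("expected exactly 64 brightness levels")
    if any(level not in range(16) for level in levels):
        raise ValueError("brightness levels must be integers from 0 to 15")
    return [ring] + levels


# Example: light the first quarter of ring 0 at full brightness.
message = ring_message(0, [15] * 16 + [0] * 48)
print(len(message))
```

Sending the whole ring every time costs more per message than addressing a single LED, but every update has the same size and shape, which may be why it behaved most consistently.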

That all works great, in a single self-contained patch:
http://vimeo.com/channels/gtz#22548640

And here, you'll have to use your imagination a bit, but this patch:
http://vimeo.com/channels/gtz#22645870
is using the same values (driven by the encoder) to simultaneously control the LEDs and a [pfft~] object. The audio you're hearing is a simple noise signal, filtered. I was trying to change its audio source. I wanted to extend this into an audio plugin in Max For Live.

My design called for one central patch to communicate with the arc, and up to four satellite patches that would process the audio stream from the track that they sit on. I arranged things so the central patch would broadcast encoder info as it came in, and the satellite patches would report back with LED states to display. Some of that overhead might have been avoidable with a different design, but I'd still have to broadcast huge arrays out to the filters, many times per second, and of course I wanted to keep the satellite patches modular so I could mix and match them with similar components.

Anyway, I've run some experiments broadcasting large amounts of data over UDP, and that seems much more viable. (Also, developments in the "arc reactor" and "pages" apps promise to make that all easier.) So I guess I'm not looking for a solution here. Just popping in with a practical example to clear up your question.
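As a rough idea of what sending one ring update over UDP involves, here is a Python sketch that packs the 65 values into a single datagram and sends it over the loopback interface. The function names and the one-byte-per-value encoding are assumptions made for this example; it does not reproduce the packet format used by Max's [udpsend] and [udpreceive].

```python
import socket
import struct


def encode_ring(ring, levels):
    """Pack a ring index and 64 brightness levels into 65 unsigned bytes."""
    return struct.pack("65B", ring, *levels)


def decode_ring(payload):
    """Unpack a 65-byte datagram back into (ring, levels)."""
    values = struct.unpack("65B", payload)
    return values[0], list(values[1:])


def loopback_round_trip(payload, timeout=1.0):
    """Send one datagram to a local socket and return what was received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # let the OS pick a free port
        receiver.settimeout(timeout)
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(1024)
        return data
```

A call such as `decode_ring(loopback_round_trip(encode_ring(2, [0] * 64)))` should give back the ring index and levels that went in. A whole ring fits comfortably in one small datagram, so the update rate is limited by how often updates are sent rather than by message size.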

AlexHarker

@GreaterThanZero - OK - I think I understand your patch structure.

Two questions:

1 - Did you find send/receive to be too slow?

2 - Are the patches separate M4L devices? My recollection (perhaps incorrect) is that separate devices may not be able to communicate with each other at speed. I am not an M4L user so I might be wrong about this, but I believe that inter-device communication is not the same as communicating between patches in MaxMSP.

From the structure you explain I would be surprised (but again I may be wrong here) if MaxMSP send/receives were a real-world limiting factor in this situation.

It's probably worth saying also that I never really run in SIAI - in that case I can see CPU usage of max messages being a lot more crucial....

A.

GreaterThanZero

1) I did. Probably should have mentioned that.

2) Ooh, good point. It is sending and receiving between several m4l devices, and yes, that is a very different response time to working in Max alone.

I haven't tried running similar experiments within a single m4l device. Would be interesting to help isolate the bottleneck. (Your sends still reach back to global space, and your receives still listen there. If local messaging bypasses all that, I'd expect separate devices to fall out of sync very quickly. I haven't encountered that, so there's a reasonable expectation that local messaging will run slowly in max for live)

Regardless, there could be hundreds of lengthy timing-critical messages per second. "Thousands" would be more of a stretch, but given the right devices to communicate with, I'm sure they'd build up.

Anyway... This isn't so intensive an example as you're looking for after all. Moving on. =)

Roth

"I suspect it may be due to the fact that when running with SIAI on it is now possible for multiple top-level patchers to run schedulers in different threads."

Oh, I forgot about the new multithreaded scheduler. It makes sense that it could be the culprit, but in my testing I had overdrive off (I also tried it on), scheduler in audio interrupt off, and mixer parallel processing off.

"I do sometimes (even as an efficiency freak) wonder why people get so into measuring max (rather than msp) performance in these synthetic ways"

I usually don't. Like you, I'm an efficiency freak who is mostly concerned with the audio domain, and when I used Shark on my patches (before I got into writing my DSP routines in C) it was almost always lots of *~ and line~ that were my bottleneck.

"It's probably worth saying also that I never really run in SIAI - in that case I can see CPU usage of max messages being a lot more crucial...."

I seem to remember having a problem with this one time a few years ago, which maybe I could have fixed with faster messaging, but I didn't really *need* SIAI so I just turned it off (and later made my DSP faster, so all was cool).

"Are you writing really huge patchers sending 1000s of messages every few seconds?"

No, not really. This was just an observation I stumbled upon and thought I'd take a look at further. What I have done is design a modular system for dynamically building huge patchers, with most communication happening using pattr. pattr slowness was one of the reasons I took a break from that project (I was thinking about benchmarking some existing techniques and coming up with some other custom notification system if I had to). You weren't being difficult at all, and I'm glad you asked this question because it made me check the efficiency of pattr, and it looks like for me it has gotten about 20% faster on Max6 :)

Check it out if anyone is so inclined:

[Max patch; copy it and select New From Clipboard in Max.]

broc

"there's a reasonable expectation that local messaging will run slowly in max for live"

No, in M4L you can use local names by prepending "---", e.g. [s ---mystuff] and [r ---mystuff].
With such names messaging runs basically like in Max alone.
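A small Python sketch of the idea behind such local names follows. It assumes the leading "---" is swapped for an identifier unique to each device, so the same patch text yields a different name in every instance; the functions and the registry are hypothetical and only model the naming, not how Max for Live does it internally.

```python
from collections import defaultdict

# Hypothetical stand-in for the global send/receive namespace.
receivers = defaultdict(list)


def resolve(name, device_id):
    """Replace a leading '---' with the device's unique id (assumed behaviour)."""
    if name.startswith("---"):
        return f"{device_id}{name[3:]}"
    return name


def receive(name, device_id, callback):
    """Register a callback under the resolved name."""
    receivers[resolve(name, device_id)].append(callback)


def send(name, device_id, value):
    """Deliver a value to every receiver registered under the resolved name."""
    for callback in receivers.get(resolve(name, device_id), []):
        callback(value)
```

With this model, two devices that both contain [s ---mystuff] and [r ---mystuff] never hear each other, while a plain name such as `shared` still reaches every device.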

shaunbarlow

Hi all,
Just thought I'd bump this old thread and ask for wisdom on using send/receive to communicate lots of real-time data between M4L devices.

Does anyone have advice on optimal methods of sending real time integers and lists between M4L devices?

I'm currently using global send/receives in a project that interfaces the KMI SoftStep foot pedal with Ableton. A little background on the project: the SoftStep is a pedal with 10 buttons, each of which has 4 pressure sensors, 1 in each corner. The "raw data" referred to below is a 5-integer list consisting of the button number followed by the current pressure on each of the sensors.

I have two devices (in a similar vein to GreaterThanZero's project referred to above). The devices are attached:

#1 is the central hub that takes the raw data from the pedal and sends it to a global send object. It then receives processed data from each of the satellite devices (#2) and sends this out as MIDI CC messages to the OS X IAC driver, which pipes them back into Ableton, where they are mapped to individual parameters.

#2 is the satellite device. There are approximately 20 instances being used in the Ableton set that I've built up for live performance (real-time audio processing and looping). Each one receives the raw pedal data and parses it to generate specific parameters (e.g. foot on/off, live pressure, x/y pressure, etc.).
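To give a feel for the parsing each satellite does, here is a hedged Python sketch that turns one 5-integer raw list into a few parameters. The corner order (top-left, top-right, bottom-left, bottom-right), the on/off threshold, and the way x/y are computed are all assumptions made for this example; the attached devices may do it quite differently.

```python
def parse_raw(raw, on_threshold=5):
    """Turn [button, tl, tr, bl, br] into named parameters.

    The corner order and the threshold value are hypothetical.
    """
    button, tl, tr, bl, br = raw
    total = tl + tr + bl + br
    if total > 0:
        # Balance between opposite edges, from -1.0 to 1.0.
        x = ((tr + br) - (tl + bl)) / total
        y = ((tl + tr) - (bl + br)) / total
    else:
        x = y = 0.0
    return {
        "button": button,
        "foot_on": total > on_threshold,
        "pressure": total,
        "x": x,
        "y": y,
    }


# Example: button 3, pressed harder on the right-hand side.
params = parse_raw([3, 10, 30, 10, 30])
print(params)
```

Each of the roughly 20 satellites would repeat work like this for every incoming list, so the cost scales with both the pedal's output rate and the number of devices.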

Structuring the system this way seemed the most reasonable way of allowing a modular system to be built up where each time I wanted to control a new parameter from the foot pedal, it was as easy as dropping a #2 (pun apologies) next to the device that I wanted to control then mapping the midi output appropriately.

THE PROBLEM
The symptom I'm currently trying to get around is a significant lag between what my foot is doing on the pedal and changes in the respective parameter in Live. For example, I have one key assigned so that the pressure along the x-axis controls a ping pong delay's bandwidth and the pressure along the y-axis controls the ping pong delay's center frequency. If I move my foot around on the pedal a lot, a queue of commands seems to form because the messages can't get through quickly enough. This makes it unusable in a live setting.

After all that, I come bearing two questions:
1) Is there a way to scale down the frequency with which the raw numbers are sent? I'm happy to lose some definition if it means that the messages are getting there with as little latency as possible.

2) Am I simply barking up the wrong tree by trying to use global send/receives between M4L devices for real time data such as this? If so, can you recommend another method?

Thanks all for reading this far. Any and all help gratefully appreciated.
Cheers,
S

broc

Ad 1) You could try [speedlim] to reduce the frequency of sending numbers.

Ad 2) In my experience udpsend/udpreceive can handle higher data rates than global send/receive. Both methods introduce some unpredictable latency, but it can be minimized by setting Live's audio buffer as small as possible.
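The rate-limiting idea behind the [speedlim] suggestion can be sketched in Python as follows. This is a simplified model written for this example, not the object's actual implementation: the first value passes straight through, later values arriving within the interval overwrite each other, and only the most recent one is released when the interval ends.

```python
def throttle(events, interval_ms):
    """Limit (time_ms, value) events to at most one output per interval.

    `events` must be sorted by time. Returns a list of (time_ms, value).
    """
    out = []
    next_open = float("-inf")  # earliest time the gate lets a value through
    pending = None             # most recent value held back, if any
    for t, value in events:
        # Release a held value once its interval has elapsed.
        if pending is not None and t >= next_open:
            out.append((next_open, pending))
            next_open += interval_ms
            pending = None
        if t >= next_open:
            out.append((t, value))
            next_open = t + interval_ms
        else:
            pending = value  # overwrite: only the latest value survives
    if pending is not None:
        out.append((next_open, pending))
    return out


# Five pedal readings in 30 ms, limited to one output every 20 ms.
print(throttle([(0, 1), (5, 2), (10, 3), (25, 4), (30, 5)], 20))
# -> [(0, 1), (20, 3), (40, 5)]
```

Intermediate readings are dropped rather than queued, which trades resolution for latency in the way question 1 asks for.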

shaunbarlow

Many thanks, Broc. I'll give both a try.

I'm starting to brainstorm on whether it's better to do all of the heavy parameter processing and MIDI sending (independent of Live) in a standalone Max patch running in the Max runtime, and have the satellite (formerly #2) M4L devices simply send UDP/MIDI/global-send-receive messages to open and close the proverbial organ stops at the right time.

I've been avoiding udpsend/udpreceive up until now through lack of familiarity with them. I guess it's time I seized the day, jumped in the deep end and read the effing manual! Thanks for providing the impetus for me to do so :)

Cheers all,
S