Gen~ efficiency comparison
Is it just me, or does gen~ require a substantially higher CPU load to accomplish tasks equivalent to standard MSP objects? I’ve attached a patch that uses a poly~ from a recent patch I’ve been working on. With poly~ instances set to 64, the no-gen~ version uses 2-3% of my CPU, and the gen~ version uses 12-13%. I’m trying to optimize it, and I used gen~ with the thought that, since it is essentially compiled, it would be ‘more’ CPU efficient. I also used it because its objects all handle 64-bit floating point numbers, which is useful because I’m trying to process unique combinations drawn from a set of up to 127 elements (i.e. up to 170,141,183,460,469,231,731,687,303,715,884,105,728 combinations). My thinking is that it is BECAUSE it processes everything with 64-bit precision that it requires so much more CPU power. Thoughts? Clarifications? Similar experiences? All comments welcome.
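(Editor's aside: a quick sketch of the numbers involved, assuming standard IEEE-754 doubles rather than anything Max-specific. A 64-bit float has a 53-bit significand, so it can represent integers exactly only up to 2**53, while a bitmask over a 127-element set needs 127 bits; 64-bit floats alone would not distinguish all of those combinations.)

```python
# Sketch: IEEE-754 double-precision limits vs. the 2**127 combination count.
# Assumption: this illustrates float behavior in general, not Max internals.
n_combinations = 2 ** 127
print(n_combinations)  # 170141183460469231731687303715884105728

# Integers above 2**53 are no longer exactly representable as doubles:
exact_limit = 2 ** 53
print(float(exact_limit) + 1 == float(exact_limit))  # True: adding 1 is lost
```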
what version of max are you using? there was a bug in poly~ relating to some of these issues in general, which i think is fixed now (6.0.5). also, send~/receive~ have not been supported inside poly~ since max 6, i think.
in every gen~ patch i have ever made it has used considerably less cpu than an msp equivalent, depending on process of course. gen is definitely brilliant in terms of performance for me.
but i do not really know what i am talking about. maybe you have uncovered some other bug? i do get the same bad results with your patch, yes.
I’m using Max 6.0.5. What do you mean, send~ and receive~ aren’t supported in poly~? They have worked as they usually do so far, so that hasn’t been the issue, as far as I can tell. The only thing I can think of is that maybe a specific object in gen~ takes more power than its MSP equivalent, whereas most objects take less. My best guess is trunc. This is actually my first venture into using gen~, so I haven’t experienced this improved performance you mention. Would you mind demonstrating a situation in which the performance is better?
hi. sorry maybe i shouldn’t post late at night half asleep.
anyway, this IS strange indeed. looking at your examples again and testing some edits, it appears that [trunc] IS to blame. however, the difference is so huge (6-7%) that i would consider it a bug. you should report it.
in the meantime, the example patch on performance found here:
is more informative than any example i could show you.
Thanks! That’s exactly what I’m looking for. Although even in the performance patch there are inconsistent results. Running the perf_gen patch (at 44100 sample rate, 256 signal vector size), my CPU hangs around 8%, making sudden leaps to around 17% occasionally. The MSP equivalent, perf_msp, stays around 19% with leaps to 24%. So gen’s the winner, right? Not so much, because perf_gen_biquads stays around 14% with leaps to 24%, and perf_msp_biquads chills out at 11%, with leaps to 15%. Sooooo… Not really sure what’s going on here. Perf_gen uses history, multiplication, and addition objects exclusively, so I suppose those perform better than MSP’s equivalents (history = delay~). Perf_gen_biquads uses history, multiplication, subtraction, and pass (and param), while the other one just uses biquads.
I’m thinking it largely comes down to the type and quantity of objects. In the biquad comparison, the gen version just has more objects in general, even though the processing isn’t drastically different, whereas the other patches have almost exactly, if not exactly, the same number of objects. I wonder if they meant to point that out with that patch.
And yes, I think I will report the trunc situation as a bug, because even though the help patch for trunc~ does say the operation is computationally expensive, a 6-7% difference using gen~ just seems silly.
Ooooook. I just finished an analysis of the cpu usage in a patch I’m making, and I thought I’d share my results. I only analyzed the audio objects that are actually in the patch (and a few others because I was curious), so there are a lot more that I haven’t touched. I’ve included the patch I used for the analysis in case anybody else wants to use it to find the relative cpu usage of an object. (Sample rate = 44100, Signal vector size = 256, Computer specs = Macbook Pro – OS X 10.7.3 – 8 GB ram – 2.5 GHz Intel Core i7)
The way I did the analysis was to make 1000 instances of an object, feed them all a constant value of 1 in an object-appropriate way, and then record the cpu utilization in a histogram for 10 seconds, at 100 millisecond intervals. From there, I extrapolated the range of cpu usage, the percentage it spent the most time on, and the per-instance cpu usage. These are listed in the format:
objectName minCPU-maxCPU, mostFrequentCPU@ratioOfOccurrence | CPUperInstance
You can get an idea of how evenly the CPU values were distributed (i.e. how frequently they changed) by looking at the ratio of occurrence: a ratio of 100/100 means it stayed on the most frequent CPU value the whole time, but if it’s 20/100, the longest cumulative time at any one particular value was 2 seconds; very distributed.
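(Editor's aside: the per-instance figures in the table can be reproduced with a small script along these lines. This is a sketch of the described method, not the actual analysis patch; the function and variable names are illustrative.)

```python
from collections import Counter

def summarize(samples, n_instances=1000):
    """Summarize CPU readings per the method described above:
    range, most frequent value (mode), its occurrence ratio,
    and per-instance cost = mode / number of instances."""
    counts = Counter(samples)
    mode, freq = counts.most_common(1)[0]
    return {
        "range": (min(samples), max(samples)),
        "mode": mode,
        "ratio": f"{freq}/{len(samples)}",
        "per_instance": mode / n_instances,
    }

# e.g. 100 readings (10 s at 100 ms intervals) mostly sitting at 4% CPU,
# shaped like the *~ row of the table:
readings = [4] * 46 + [3] * 30 + [5, 6, 7, 8] * 6
print(summarize(readings))
# {'range': (3, 8), 'mode': 4, 'ratio': '46/100', 'per_instance': 0.004}
```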
Anyway, enough explanation. Here you go!
buffer~ 0, 0@100/100 | 0.
sig~ 2-5, 3@77/100 | 0.003
+~ 3-5, 4@70/100 | 0.004
*~ 3-8, 4@46/100 | 0.004
%~ 4-9, 4@40/100 | 0.004
trunc~ 4-11, 5@60/100 | 0.005
>~ 5-9, 6@53/100 | 0.006
selector~ 5-12, 6@73/100 | 0.006
gen~ 5-15, 6@21/100 | 0.006
phasor~ 7-12, 7@62/100 | 0.007
index~ 8-13, 8@74/100 | 0.008
poke~ 11-17, 12@66/100 | 0.012
gate~ 16-27, 17@52/100 | 0.017
count~ 18-27, 19@86/100 | 0.019
pow~ 24-35, 25@72/100 | 0.025
----GEN~ OBJECTS (gen~ cost included)----
nothing 5-15, 6@21/100 | 0.006
const 5-15, 7@21/100 | 0.007
* 4-14, 7@22/100 | 0.007
+ 5-14, 7@25/100 | 0.007
gate 5-15, 7@25/100 | 0.007
selector 6-15, 8@25/100 | 0.008
trunc 11-32, 12@27/100 | 0.012
peek 13-30, 14@29/100 | 0.014
poke 30-34, 31@49/100 | 0.031
pow 42-66, 44@39/100 | 0.044
exp2 50-68, 52@46/100 | 0.052
% 68-98, 70@41/100 | 0.070
----GEN~ OBJECTS (gen~ cost subtracted)----
nothing 0-0, 0@100/100 | *0.
const 0-0, 0@100/100 | *0.
* 0-0, 0@100/100 | *0.
+ 0-0, 0@100/100 | *0.
gate 0-0, 0@100/100 | *0.
selector 0-0, 0@100/100 | *0.
trunc 6-17, 6@27/100 | 0.006
peek 8-15, 8@29/100 | 0.008
poke 25, 25@100/100 | 0.025
pow 37-51, 38@39/100 | 0.038
exp2 45-53, 46@46/100 | 0.046
% 63-83, 64@41/100 | 0.064
* may just be 0 because I didn’t do the subtraction of gen~’s cost properly
Based on the above analysis, it would seem that the most cost-effective way to patch is to use one gen~ object, and if you can accomplish what you want with just the *, +, gate, and selector objects (other unexplored simple arithmetic objects excluded), then go for it. And if you can at all, stay away from the pow, exp2, and % objects, as these are THE MOST EXPENSIVE objects analyzed. True, this may be because they are 64-bit, but man. As mentioned in previous posts, I was running into CPU problems in my patch, and that would explain why. I think I use two gen~ objects with one pow, one trunc, and one % object each, and that shoots my CPU right up to about 2 percent, which is not good for an abstraction you want to run multiple instances of.
I hope someone finds this information useful, and again, if anybody else wants to do some analysis of other objects with my patch and share, feel free.
Gen *can* result in better performance than using MSP objects, but this can’t be guaranteed in all cases. In general, Gen patchers start to show better performance compared to MSP patching as the number of objects and connections increases. Not only does this amortize the unavoidable overhead of the gen~ object itself, it also allows for much greater compiler optimization within the patch (which simply isn’t possible for MSP patchers).
I have attached a performance testing patch which creates 20 copies of a particular operator, and connects them up randomly. It then generates both MSP and gen~ versions, and alternately hosts them in a poly~ with 32 voices.
The results clearly show that for many operators the difference is quite significant even with only 20 boxes, but some operators have similar performance and a few (here trunc and mod) are worse.
We are focusing our efforts on improving these particular operators for the 6.0.7 release; however, over the last few months we have been making major changes to the internals of Gen, and have been balancing the importance of performance improvements against the addition of major user-requested features.
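(Editor's aside: the random-wiring topology described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the actual Max patch generator; the function name and connection rule are hypothetical.)

```python
import random

def random_patch_edges(n_boxes=20, seed=0):
    """Sketch of a randomly-connected test graph of n_boxes operators.
    Each box after the first takes its input from the output of a
    randomly chosen earlier box, which guarantees an acyclic signal
    graph (a plausible wiring rule; the original patch may differ)."""
    rng = random.Random(seed)
    edges = []
    for dst in range(1, n_boxes):
        src = rng.randrange(dst)  # only earlier boxes: no feedback cycles
        edges.append((src, dst))
    return edges

edges = random_patch_edges()
print(len(edges))  # 19 connections wiring up 20 boxes
print(all(src < dst for src, dst in edges))  # True: the graph is acyclic
```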
: nothing.maxpat: avg 0. max 1.
: test_empty_gen.maxpat: avg 0.8 max 1.
: test_empty_msp.maxpat: avg 0.1 max 1.
: test_add_gen.maxpat: avg 2. max 4.
: test_add_msp.maxpat: avg 6. max 10.
: test_cos_gen.maxpat: avg 25.3 max 37.
: test_cos_msp.maxpat: avg 50. max 67.
: test_delay_gen.maxpat: avg 22.1 max 31.
: test_delay_msp.maxpat: avg 51.1 max 69.
: test_delta_gen.maxpat: avg 4.1 max 8.
: test_delta_msp.maxpat: avg 7.3 max 12.
: test_div2_gen.maxpat: avg 2. max 3.
: test_div2_msp.maxpat: avg 6.1 max 10.
: test_div_gen.maxpat: avg 21.6 max 33.
: test_div_msp.maxpat: avg 23.9 max 38.
: test_gate_gen.maxpat: avg 4.2 max 7.
: test_gate_msp.maxpat: avg 17.2 max 34.
: test_log_gen.maxpat: avg 30.3 max 43.
: test_log_msp.maxpat: avg 60.4 max 79.
: test_maximum_gen.maxpat: avg 1.9 max 5.
: test_maximum_msp.maxpat: avg 6.4 max 12.
: test_mod_gen.maxpat: avg 61.1 max 95.
: test_mod_msp.maxpat: avg 15.5 max 30.
: test_mul_gen.maxpat: avg 2.1 max 4.
: test_mul_msp.maxpat: avg 6.4 max 12.
: test_noise_gen.maxpat: avg 6.5 max 11.
: test_noise_msp.maxpat: avg 10.3 max 21.
: test_phasewrap_gen.maxpat: avg 12.5 max 19.
: test_phasewrap_msp.maxpat: avg 13.8 max 27.
: test_poltocar_gen.maxpat: avg 21.7 max 32.
: test_poltocar_msp.maxpat: avg 100.5 max 100.
: test_sqrt_gen.maxpat: avg 3.2 max 6.
: test_sqrt_msp.maxpat: avg 21.7 max 37.
: test_trunc_gen.maxpat: avg 17.9 max 26.
: test_trunc_msp.maxpat: avg 7.2 max 14.
Tested with Max 6.0.5 (1741622)
Mac OS 10.6.8
2 GHz Intel Core i7, 4GB 1333 MHz RAM.
That’s very helpful, thank you. I installed 6.0.5, and I’m looking for the API reference. I can only find tutorials. Is there a reference, or do I just figure it out from the tutorials?
Links to the Gen reference pages should be there in the ‘?’ tab of the gen~ help file, and also under the ‘vignettes’ tab of the documentation sidebar – are you not seeing them there?
Here in Sacramento, it’s too hot to turn on my i7 and OpenGL accelerator workstation at the moment.
I have only one Max license. I think it’s only legal for me to install it on one machine. Is there a way I could upgrade so I could run it on my laptop too? Thank you for following up so late on a Friday evening )
HI, I’m up) I find good information in the gen~ help *patcher* and the Gen Common Operators Reference *vignette.* And the examples are much more helpful since the last time I updated. Thank you )
Additionally the recipes on the cycling74 website are fun. It just takes some time to learn it all. There’s so much )
jfyi, the MSP objects also do everything in 64-bit resolution, so that would not make a difference.