How to get beyond 50% CPU usage for Jitter

keith@beamfoundation.

Hi,

I'm running Jitter 1.42 and Max 4.57 on a dual core AMD & Win XP.

So I have a couple of QT movies (Photo JPEG, 320 x 240, encoded at 15 fps) playing at 15 fps with some processing, and the two cores are running at about 50% for Max and 49% System Idle.

Now when I add more video processing to the exact same movies my frame rate drops to 4 fps but my CPU usage remains the same Max: 50% and System Idle: 49%.

What limits Max/Jitter from using more of the CPU? Is there any way to open this up to, say, 70%? MSP is turned off (it's a video-only machine) and I have mucked with scheduler settings to no avail.

Thanks,

Keith

Keith McMillen
BEAM Foundation
http://www.beamfoundation.org/
510.502.5310

Joshua Kit Clayton

On Aug 21, 2006, at 5:40 PM, Keith McMillen wrote:

> What limits Max/Jitter from using more of the CPU? Is there any way to
> open this up to, say, 70%? MSP is turned off (it's a video-only machine)
> and I have mucked with scheduler settings to no avail.

How much advantage you get from having multiple cores really depends
on where the bottlenecks in your patch are. There are a few things to
consider:

1. It could be the movie decompression. QT on Windows doesn't
necessarily use multiple threads for decompression.

2. Depending on the Jitter objects in use, they may not use multiple
threads. For example, almost all pointwise operators will use
multiple threads if there are multiple cores (pointwise operators
include things like jit.charmap, jit.clip, jit.op, jit.xfade, etc.).
Most spatial operators, with the exception of jit.convolve, have not
been made multiprocessor aware.

3. It could be memory bandwidth or other resources that are the
bottleneck. (Sometimes multi-core processors do not offer any
advantage if the cores are consistently contending over the same
region of memory.)

4. It might be related to VBL synchronization (try disabling
jit.window @sync).
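A minimal Python sketch of why pointwise operators split so easily across cores: each output cell depends only on the matching input cell, so the rows can be cut into independent bands and handed to separate workers. The clip function and its 0.2/0.8 bounds are hypothetical stand-ins for something like jit.clip, and this is not how Jitter implements its threading. Note that CPython's GIL keeps these threads from actually running in parallel; the point here is the partitioning. A spatial operator such as a rotation reads cells that live in other bands, which is what makes it harder to split this way.

```python
from concurrent.futures import ThreadPoolExecutor


def clip_cell(value, lo=0.2, hi=0.8):
    """Hypothetical pointwise op: the output depends only on this one cell."""
    return min(max(value, lo), hi)


def clip_rows(rows):
    """Apply the pointwise op to every cell of a list of rows."""
    return [[clip_cell(v) for v in row] for row in rows]


def clip_banded(rows, workers=2):
    """Split the matrix into contiguous row bands and clip each band on its own worker."""
    step = max(1, -(-len(rows) // workers))  # ceiling division, never zero
    bands = [rows[i:i + step] for i in range(0, len(rows), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves band order, so re-joining the results is trivial
        results = list(pool.map(clip_rows, bands))
    return [row for band in results for row in band]


if __name__ == "__main__":
    frame = [[((x * y) % 10) / 10 for x in range(8)] for y in range(6)]
    assert clip_banded(frame, workers=2) == clip_rows(frame)
    print("banded result matches serial result")
```

Because no band ever needs to look at another band's cells, the banded result is identical to the serial one regardless of how many workers are used.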

The only thing you can do to improve your patch without changing it
would be to disable @sync. If you can change your patch, you might be
able to speed things up by disabling spatial objects like jit.rota
or jit.repos, which are currently single-threaded and may well be
your bottleneck.

In general I would move all expensive processing to the GPU where
possible if you really want high performance (2-16x speedups), but
you might be able to take better advantage of your dual-core CPU by
running two copies of MaxMSP (e.g. use the runtime side by side with
the standard app), where one patch communicates with the other patch
via jit.net.send/recv on localhost. Since this is a truly separate
process, the OS can dispatch it to the second core.
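With two processes, every frame has to cross a localhost connection. Below is a minimal Python sketch of a length-delimited frame exchange between two endpoints. The 16-byte header (tag, plane count, width, height) is a hypothetical format invented for illustration; it is not the wire format used by jit.net.send/jit.net.recv. socket.socketpair() stands in for a real localhost TCP connection between two Max processes.

```python
import socket
import struct
import threading

# Hypothetical header: 4-byte tag, then planes, width, height as big-endian uint32.
HEADER = struct.Struct("!4sIII")
TAG = b"FRAM"


def send_frame(sock, planes, width, height, payload):
    """Send one char matrix (planes x width x height bytes) with a header."""
    if len(payload) != planes * width * height:
        raise ValueError("payload size does not match matrix dimensions")
    sock.sendall(HEADER.pack(TAG, planes, width, height) + payload)


def recv_exact(sock, count):
    """Read exactly count bytes; a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock):
    """Receive one frame and return (planes, width, height, payload)."""
    tag, planes, width, height = HEADER.unpack(recv_exact(sock, HEADER.size))
    if tag != TAG:
        raise ValueError("unexpected frame tag")
    return planes, width, height, recv_exact(sock, planes * width * height)


def loopback_roundtrip(planes, width, height, payload):
    """Send a frame from one endpoint and read it back on the other.

    The sender runs on its own thread because a full frame can exceed the
    socket buffer, and sendall() would block forever with no reader.
    """
    if len(payload) != planes * width * height:
        raise ValueError("payload size does not match matrix dimensions")
    sender, receiver = socket.socketpair()
    try:
        worker = threading.Thread(
            target=send_frame, args=(sender, planes, width, height, payload))
        worker.start()
        result = recv_frame(receiver)
        worker.join()
        return result
    finally:
        sender.close()
        receiver.close()
```

For scale: a 4-plane char 320 x 240 frame is 4 x 320 x 240 = 307,200 bytes, so at 15 fps that is roughly 4.6 MB/s over localhost.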

Hope this helps.

-Joshua

d17e

Hi Keith,

I've been doing something similar, also on a Windows machine (Intel dual
core though).
Anyhoo, I think you might profit from downloading Jitter 1.5, since the
'What's New' document (found on the Cycling site >
https://cycling74.com/download/jitter15doc.zip ) states that as of
Jitter 1.5 there's full support for multiprocessor systems.
I get reasonable performance and definitely don't get stuck at 50% of my
system's maximum...

Check it out! I think that could do the trick. ( download link:
https://cycling74.com/downloads/jitter )

greets
david


Graham Wakefield

And also try changing the Max Options > Performance Options, such as
slop, poll & low priority throttles, etc.; it made a surprisingly big
difference to the max CPU usage on my setup.


Joshua Kit Clayton

On Aug 22, 2006, at 12:56 AM, david vandenbogaerde wrote:

> I've been doing something similar, also on a Windows machine (Intel dual
> core though).
> Anyhoo, I think you might profit from downloading Jitter 1.5, since
> the 'What's New' document (found on the Cycling site >
> http://cycling74.com/download/jitter15doc.zip ) states that as of
> Jitter 1.5 there's full support for multiprocessor systems.
> I get reasonable performance and definitely don't get stuck at 50%
> of my system's maximum...
>
> Check it out! I think that could do the trick. ( download link:
> https://cycling74.com/downloads/jitter )

Oops. I didn't notice that you were running 1.2.4, Keith. You
definitely need Jitter 1.5 for any MP benefit. However, I'd like to
clarify that there's not exactly "full support for multiprocessor
systems": only the pointwise operators mentioned previously will use
multiple threads (in the future, perhaps more spatial operators than
jit.convolve will exploit multi-core machines).

-Joshua

keith@beamfoundation.

Am already running Jitter 1.52. (Typos am us - not running 1.42 or 1.24)

I can see that the processor load is equally shared between the two cores, both at about 50%. I'm not adding any more QT players, just some more intensive processing of the same matrices. The fps drops (in all patches) but Jitter is not using any more CPU cycles, and the remaining 49% sits there idle, doing me about as much good as an elected official in December. Hmmmm.

"And also try changing the Max Options > Performance Options, such as slop, poll & low priority throttles, etc.; it made a surprisingly big difference to the max CPU usage on my setup."

Any recommended settings? I've already Monte Carlo'ed the page without much luck.

Other suggestions welcome. thx,

Keith

Joshua Kit Clayton

On Aug 22, 2006, at 12:26 PM, Keith McMillen wrote:

> I can see that the processor load is equally shared between the two
> cores, both at about 50%. I'm not adding any more QT players just
> some more intensive processing of the same matrices. The fps drops
> (in all patches) but Jitter is not using any more CPU cycles and
> the rest of the 49% are sitting there idle doing me about as much
> good as an elected official in December. Hmmmm.

I believe it's safe to say that this is one thread running
alternately on two processors (50% of the time on one and 50% of the
time on the other). So I think it's safe to say that single-threaded
objects must be your bottleneck. You won't be able to get around
this without re-architecting your patch somehow. As mentioned
previously, running two copies of MaxMSP is probably a good approach
to take. In general, spawning multiple processes is the typical way
to exploit the processing power of multi-core architectures in the
absence of a multi-threaded application that can do the same at the
thread level.
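The arithmetic behind that diagnosis fits in a few lines of Python: if one always-busy thread runs on core 0 for one time slice and core 1 for the next, each core's meter averages out to 50% even though one core's worth of work is saturated the whole time. The schedules below are made-up examples, not measured data.

```python
def per_core_load(schedule, num_cores=2):
    """Average utilization (%) per core for one always-busy thread.

    schedule lists which core the thread ran on in each time slice.
    """
    if not schedule:
        raise ValueError("schedule must contain at least one time slice")
    busy_slices = [0] * num_cores
    for core in schedule:
        busy_slices[core] += 1
    return [100.0 * busy / len(schedule) for busy in busy_slices]


def machine_load(schedule, num_cores=2):
    """Whole-machine utilization (%): the mean of the per-core figures."""
    return sum(per_core_load(schedule, num_cores)) / num_cores


alternating = [0, 1] * 500   # OS migrates the thread every slice
pinned = [0] * 1000          # same thread pinned to core 0

print(per_core_load(alternating), machine_load(alternating))  # [50.0, 50.0] 50.0
print(per_core_load(pinned), machine_load(pinned))            # [100.0, 0.0] 50.0
```

Either way the machine reads 50%: a single thread can never use more than one core's worth of CPU, however the OS moves it around.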

> "And also try changing the Max Options > Performance Options, such
> as slop, poll & low priority throttles, etc.; it made a surprisingly
> big difference to the max CPU usage on my setup."
>
> Any recommended settings? I've already Monte Carlo'ed the page
> without much luck.

Increasing the queuethrottle is usually what gives people better
performance. The queue throttle corresponds to how many UI elements,
qmetros, etc. are fired per low priority queue servicing. However if
you don't have a lot of UI objects or (q)metros, then you likely
won't see any benefit.
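As a rough mental model only (a toy Python sketch, not Max's actual scheduler code), a throttle of this kind caps how many queued events are fired per servicing pass, so a backlog drains in fewer passes when the throttle is raised. The event names and throttle values are hypothetical.

```python
from collections import deque


def drain_low_priority_queue(events, throttle):
    """Toy model: each servicing pass fires at most `throttle` queued events.

    Returns the list of batches, one per pass.
    """
    if throttle < 1:
        raise ValueError("throttle must be at least 1")
    queue = deque(events)
    passes = []
    while queue:
        batch = [queue.popleft() for _ in range(min(throttle, len(queue)))]
        passes.append(batch)
    return passes


backlog = [f"qmetro-tick-{i}" for i in range(25)]
print(len(drain_low_priority_queue(backlog, throttle=10)))  # -> 3
print(len(drain_low_priority_queue(backlog, throttle=50)))  # -> 1
```

If the backlog is small to begin with (few UI objects or qmetros), raising the throttle changes nothing in this model, which matches the caveat above.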

Of course, I'd lobby again for focusing your development on GPU-based
processing rather than CPU-based processing, including doing the
UYVY->RGBA conversion on the graphics card, as discussed in previous
threads. If you rearchitect your patch to use the GPU, you should be
able to process *much* higher resolutions at >30fps.
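Part of UYVY's appeal is that it packs two pixels into four bytes (U, Y0, V, Y1), half the size of RGBA, so less data crosses the bus before the card expands it. The per-pixel math such a shader would run can be sketched on the CPU in Python. The coefficients are the common BT.601 studio-range ones and are an assumption; they are not taken from Jitter's own conversion shader.

```python
def clamp_byte(x):
    """Round and clamp a float to the 0..255 range of a char plane."""
    return max(0, min(255, int(round(x))))


def yuv_to_rgba(y, u, v):
    """Convert one studio-range Y'CbCr sample to RGBA (BT.601 weights assumed)."""
    c = 1.164 * (y - 16)
    d = u - 128
    e = v - 128
    r = c + 1.596 * e
    g = c - 0.813 * e - 0.391 * d
    b = c + 2.018 * d
    return (clamp_byte(r), clamp_byte(g), clamp_byte(b), 255)


def uyvy_to_rgba(packed):
    """Expand packed UYVY bytes to a list of RGBA tuples (two pixels per 4 bytes)."""
    if len(packed) % 4:
        raise ValueError("UYVY data must be a multiple of 4 bytes")
    out = []
    for i in range(0, len(packed), 4):
        u, y0, v, y1 = packed[i:i + 4]
        # both pixels in a macropixel share the same chroma (U, V)
        out.append(yuv_to_rgba(y0, u, v))
        out.append(yuv_to_rgba(y1, u, v))
    return out


print(uyvy_to_rgba(bytes([128, 16, 128, 235])))  # [(0, 0, 0, 255), (255, 255, 255, 255)]
```

On the GPU the same arithmetic runs once per fragment, in parallel, which is what takes it off the CPU entirely.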

-Joshua

Joshua Kit Clayton

So, offlist, I discovered that one of Keith's bottlenecks was a
modified version of the jit.gl.render-grid.pat example patch. There
are a few things I'd like to share with the list regarding this patch.
First off, it's an old patch (Jitter 1.0), and there are better
ways to do it now. Secondly, there are some things in the patch which
can be sped up. I've included my comments to Keith and an example
patch using newer techniques (jit.gl.texture, jit.gl.mesh, jit.matrix
exprfill), together with some optimizations. OpenGL rendering in
Jitter is single-threaded, so you're not going to get any benefit
from multi-core CPUs there...
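The exprfill messages in the example patch fill one matrix with a signed, normalized vertex grid (snorm, -1 to 1) and another with texture coordinates (norm, 0 to 1, with the vertical axis flipped by 1.-norm[1]). A Python sketch of the values those expressions produce, assuming norm[i] is the cell index divided by (dim - 1) and snorm[i] is 2*norm[i] - 1. z is left at 0.0 here; in the patch the displacement presumably comes from the green (luma) plane.

```python
def norm(index, dim):
    """0..1 across a dimension (assumed to match jit.expr's norm[i])."""
    return index / (dim - 1) if dim > 1 else 0.0


def snorm(index, dim):
    """-1..1 across a dimension (assumed to match jit.expr's snorm[i])."""
    return 2.0 * norm(index, dim) - 1.0


def make_grid(width, height):
    """Return (geom, texcoords) as nested lists indexed [row][column]."""
    geom = [[(snorm(x, width), snorm(y, height), 0.0) for x in range(width)]
            for y in range(height)]
    texcoords = [[(norm(x, width), 1.0 - norm(y, height)) for x in range(width)]
                 for y in range(height)]
    return geom, texcoords


geom, tex = make_grid(3, 2)
print(geom[0][0], tex[0][0])  # (-1.0, -1.0, 0.0) (0.0, 1.0)
print(geom[1][2], tex[1][2])  # (1.0, 1.0, 0.0) (1.0, 0.0)
```

Since the grid never changes, filling it once at load time (the patch triggers the exprfill messages from a loadbang) means it costs nothing per frame.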

1. The first thing in the jit.gl.render-grid.pat patch: if I
run at a high resolution (like 256x256 vertices), I can get a
noticeable speedup by eliminating all the unnecessary jit.pwindows in
the patch, which need to do float->char conversion and/or downsampling.

2. I also got another 10-15% by disabling the interpolation for the
initial jit.matrix, and if you don't need to convert the green color
channel to alpha, you can save some CPU by eliminating this stage
entirely.

3. A cheap RGB->luminance conversion is to just grab the green color
channel (green carries the largest share of luminance, roughly 60-70%
depending on the weighting standard).

4. You can save a few more cycles by eliminating the jit.op @op * for
z displacement, and instead relying on the ob3d scale attribute.

5. Using @unique 1 with jit.qt.movie, you'll use less CPU on redundant
frames (only an issue if your movie framerate is less than the patch
framerate).

6. Using jit.gl.mesh, you may get even better performance.
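To see what the green-only shortcut in item 3 trades away, here is a small Python comparison against a weighted luma. The Rec. 601 and Rec. 709 weights are the standard published coefficients; the sample pixels are made up. On grays the shortcut is exact; on saturated colors it drifts, which may not matter much for a displacement map.

```python
REC601 = (0.299, 0.587, 0.114)
REC709 = (0.2126, 0.7152, 0.0722)


def weighted_luma(r, g, b, weights=REC601):
    """Luma as a weighted sum of the three channels."""
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b


def green_luma(r, g, b):
    """The cheap shortcut: use the green channel alone."""
    return g


for pixel in [(128, 128, 128), (200, 50, 10), (10, 200, 10)]:
    exact = weighted_luma(*pixel)
    cheap = green_luma(*pixel)
    print(pixel, round(exact, 1), cheap, round(cheap - exact, 1))
```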

-Joshua


Mattijs

> Now when I add more video processing to the exact same movies my frame rate drops to 4 fps but my CPU usage remains the same Max: 50% and System Idle: 49%.

This sounds very much like a problem I had. The clue there was that Jitter didn't support multiple processors for the operations I used, but the OS tried to balance CPU load by running the same process alternately on two CPUs. Your CPU meter averages the load and reports 50% on both CPUs, when actually the work switches very rapidly between 100% on the first and 0% on the second, and 0% on the first and 100% on the second.

Hope that helps,
Mattijs

keith@beamfoundation.

Thanks Josh,

These changes did speed up the patch's performance, and when combined with my other Jitter patches (needed for the rest of the localization, reactive processing, etc.) I have doubled the total frame rate from 4 fps to 8-9 fps. CPU still sits at 50%. Interestingly, the variation of jit.gl.render-grid.pat you sent runs at 50 fps (movie rate) with none of the other patches running, at 17% of both CPU cores as reported by XP. Very fast. It just grinds to a halt with the rest of my patch loaded.

Discussions of these observations with Adrian Freed and John Lazzaro make us suspect a cache problem. The memory needed for the rendering, combined with other processes (there are many intermediate matrices before and after the gl.render), exceeds the cache at some point, and this may be the bottleneck keeping the CPU from exceeding 50%. So, according to this theory, running 2 Max/Jitter instances using the same cache would not help. More tests are needed to verify.
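The cache theory can at least be sanity-checked with arithmetic. This Python sketch sums the sizes of the matrices in Joshua's example patch (a 4-plane char movie frame plus several 320 x 240 float32 matrices); Keith's full patch has more intermediate matrices, so this undercounts. The 512 KB per-core L2 figure is an assumption about a dual-core Athlon 64 X2 of that era, so check the actual part.

```python
# Bytes per cell for Jitter matrix types.
BYTES_PER_CELL = {"char": 1, "long": 4, "float32": 4, "float64": 8}

# Assumed per-core L2 size (hypothetical; check your CPU's specs).
L2_BYTES = 512 * 1024


def matrix_bytes(planes, celltype, width, height):
    """Size of one matrix's cell data, ignoring any row padding or headers."""
    return planes * BYTES_PER_CELL[celltype] * width * height


# Matrices in the example patch, all 320 x 240.
example_patch = {
    "jit.qt.movie output": (4, "char"),
    "green/luma jit.matrix": (1, "float32"),
    "geom jit.matrix": (3, "float32"),
    "texcoords jit.matrix": (2, "float32"),
    "jit.pack output": (3, "float32"),
}

total = 0
for name, (planes, celltype) in example_patch.items():
    size = matrix_bytes(planes, celltype, 320, 240)
    total += size
    print(f"{name:24s} {size:>9,d} bytes")
print(f"{'total':24s} {total:>9,d} bytes ({total / L2_BYTES:.1f}x the assumed L2)")
```

Even this reduced set comes to about 3 MB per frame, several times the assumed L2, so every frame streams through main memory regardless of how many Max instances share the machine.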

So this again makes me want to run the equivalent process (exemplified in the jit.gl.render-grid.pat) on the GPU. Can someone point me to the nearest shader that can get us started?

Thanks to all,

Keith


yacine

As suggested before, you could run part of your process in the runtime and the other part in Max.
mxj net.maxhole makes the link between them quite easy on the same computer when only messages
are concerned.
Other solutions are possible for Jitter matrix transfer if you need that.

//yac

> #P toggle 390 366 15 0;
> #P window setfont "Sans Serif" 9.;
> #P window linecount 1;
> #P message 390 389 88 196617 poly_mode $1 $1;
> #P toggle 765 100 15 0;
> #P message 765 128 44 196617 fsaa $1;
> #P toggle 706 100 15 0;
> #P message 706 128 46 196617 sync $1;
> #P hidden newex 309 335 75 196617 loadmess 0.25;
> #P window linecount 2;
> #P comment 387 108 245 196617 use jit.gl.texture for rectangular
> texture dimensions (not limited to power of two);
> #P window linecount 3;
> #P comment 597 55 347 196617 use unique to prevent redundant frames
> , and hence redundant rendering (should also turn down qmetro ,
> just leaving high to demonstrate performance);
> #P toggle 535 27 15 0;
> #P window linecount 1;
> #P message 535 55 53 196617 unique $1;
> #P hidden newex 445 239 48 196617 loadbang;
> #P window linecount 2;
> #P message 577 273 144 196617 exprfill 0 "norm[0]" , exprfill 1 "1.-
> norm[1]" , bang;
> #P window linecount 1;
> #P newex 577 305 190 196617 jit.matrix texcoords 2 float32 320 240;
> #P window linecount 2;
> #P message 398 273 112 196617 exprfill 0 "snorm[0]" , exprfill 1
> "snorm[1]" ,;
> #P window linecount 1;
> #P newex 398 305 168 196617 jit.matrix geom 3 float32 320 240;
> #P flonum 310 361 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P flonum 267 361 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P flonum 225 361 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P newex 196 385 99 196617 pak scale 1. 1. 0.25;
> #P window linecount 2;
> #P comment 27 419 140 196617 use jit.gl.mesh and scale Z with the
> scale attribute;
> #P window linecount 1;
> #P newex 370 139 56 196617 t b l erase;
> #P newex 393 209 192 196617 jit.gl.texture render_grid @name mytex;
> #P newex 171 438 379 196617 jit.gl.mesh render_grid @draw_mode
> tri_grid @texture mytex @color 1. 1. 1. 1.;
> #P flonum 213 29 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P user jit.fpsgui 57 113 60 196617 0;
> #P hidden message 415 51 103 196617 read multimeter.mov;
> #P newex 172 276 50 196617 t b;
> #P hidden message 399 29 14 196617 1;
> #N vpatcher 642 520 1118 872;
> #P inlet 230 84 15 0;
> #P toggle 352 214 15 0;
> #P window setfont "Sans Serif" 9.;
> #P message 352 233 75 196617 auto_rotate $1;
> #P message 315 233 32 196617 reset;
> #P newex 103 101 27 196617 t i i;
> #P flonum 261 213 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P message 261 233 51 196617 radius $1;
> #P flonum 194 213 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P message 194 233 60 196617 tracking $1;
> #P newex 10 254 355 196617 jit.gl.handle render_grid
> @inherit_transform 1 @depth_enable 1 @tracking 8;
> #P outlet 10 284 15 0;
> #P newex 103 56 50 196617 select 27;
> #P newex 103 34 40 196617 key;
> #P newex 120 146 91 196617 prepend fullscreen;
> #P newex 120 167 189 196617 jit.window render_grid @rect 10 50 200
> 200 @depthbuffer 1;
> #P comment 10 218 178 196617 inherit_transform is important here ,
> since we are controlling jit.gl.render;
> #P toggle 103 81 15 0;
> #P fasten 14 0 7 0 357 251 15 251;
> #P fasten 10 0 7 0 266 251 15 251;
> #P fasten 8 0 7 0 199 248 15 248;
> #P fasten 13 0 7 0 320 251 15 251;
> #P connect 7 0 6 0;
> #P connect 4 0 5 0;
> #P connect 5 0 0 0;
> #P connect 0 0 12 0;
> #P connect 12 1 3 0;
> #P connect 16 0 2 0;
> #P connect 3 0 2 0;
> #P connect 9 0 8 0;
> #P connect 11 0 10 0;
> #P connect 15 0 14 0;
> #P pop;
> #P newobj 706 155 115 196617 p window-mouse-rotate;
> #P message 361 49 42 196617 rate $1;
> #P flonum 361 29 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P newex 172 305 211 196617 jit.pack 3 float32 320 240 @out_name geom;
> #P newex 172 210 203 196617 jit.matrix 1 float32 320 240 @planemap 2;
> #P message 235 49 28 196617 read;
> #P message 305 49 27 196617 stop;
> #P message 271 49 31 196617 start;
> #P toggle 172 28 15 0;
> #P newex 172 48 51 196617 qmetro 2;
> #P newex 172 79 153 196617 jit.qt.movie 320 240 @unique 1;
> #P newex 623 193 120 196617 jit.gl.render render_grid;
> #P comment 34 213 100 196617 grab green for luma;
> #P comment 508 242 349 196617 using new exprfill method to generate
> geometry and texture coordinates;
> #P window linecount 5;
> #P comment 828 111 100 196617 disable VBL sync for max framerate and
> benchmarking. turn on FSAA for full scene anti aliasing;
> #P connect 5 0 18 0;
> #P fasten 42 0 20 0 395 419 176 419;
> #P fasten 24 0 20 0 201 413 176 413;
> #P connect 11 0 20 0;
> #P connect 6 0 5 0;
> #P fasten 5 0 4 0 177 75 177 75;
> #P fasten 33 0 4 0 540 74 177 74;
> #P fasten 9 0 4 0 240 75 177 75;
> #P fasten 8 0 4 0 310 75 177 75;
> #P fasten 7 0 4 0 276 75 177 75;
> #P fasten 13 0 4 0 366 75 177 75;
> #P hidden fasten 17 0 4 0 420 72 177 72;
> #P connect 22 1 10 0;
> #P connect 10 0 16 0;
> #P connect 16 0 11 0;
> #P connect 19 0 5 1;
> #P fasten 30 0 20 1 582 426 222 426;
> #P connect 25 0 24 1;
> #P connect 26 0 24 2;
> #P connect 27 0 24 3;
> #P hidden connect 37 0 27 0;
> #P hidden connect 15 0 12 0;
> #P connect 12 0 13 0;
> #P fasten 4 0 22 0 177 102 375 102;
> #P connect 10 0 11 2;
> #P connect 43 0 42 0;
> #P connect 22 1 21 0;
> #P hidden connect 32 0 29 0;
> #P connect 29 0 28 0;
> #P connect 34 0 33 0;
> #P hidden connect 32 0 31 0;
> #P connect 31 0 30 0;
> #P fasten 22 0 3 0 375 174 628 174;
> #P fasten 22 2 3 0 421 168 628 168;
> #P fasten 14 0 3 0 711 177 628 177;
> #P connect 39 0 38 0;
> #P connect 40 0 14 0;
> #P connect 38 0 14 0;
> #P connect 41 0 40 0;
> #P window clipboard copycount 44;

keith@beamfoundation.

Understood, and I have used this method before, splitting MSP out to a separate CPU. But if cache flushing is the limit here, 2 instances will not help. Must run the experiment for real data, although idle speculation is so much easier.

But I still have this $600 video card doing nothing but depreciating...

KMc

Yacine Sebti wrote:
as it was suggested before, you could run a part of your process in runtime and the other in max.
mxj net.maxhole makes the link quite easy between them on the same computer when only messages
are concerned.
other solutions are possible for what concerns jitter matrix transfer if you need that.

//yac

> Thanks Josh,
>
> These changes did speed up the patches performance and when combined with my other Jit patches
> (needed for the rest of the localization, reactive processing, etc...) I have doubled total
> frame rate from 4 fps to 8-9 fps. CPU still sits at 50%. Interstingly, the variation of the
> jit.gl.render-grid.pat you sent, with none of the other patches running is 50 fps (movie rate)
> at 17% of both cores of the CPU as claimed by XP. Very fast. It just grinds to a halt with the
> rest of my patch loaded.
>
> Discussions of these observations with Adrian Freed and John Lazzaro make us suspect a cache
> problem. The mem needed for the renedering combined with other processes ( there are many
> intermediate matrices before and after the gl.render ) exceeds the cache at some point and this
> may be the bottleneck keeping the CPU from exceeding 50%. So running 2 Max/Jitter instances
> using the same cache, according to this theory, would not help. More tests needed to verify.
>
> So this again makes me want to run the equivalent process (exemplified in the
> jit.gl.render-grid.pat) on the GPU. Can someone point me to the nearest shader that can get us
> started?
>
> Thanks to all,
>
> Keith
>
>
> Joshua Kit Clayton wrote:
>
>
> So, offlist, I discovered that one of Keith's bottlenecks was using a
> modified version of the jit.gl.render-grid.pat example patch. There's
> a few things I'd like to share with the list regarding this patch.
> First off it's an old patch (Jitter 1.0), and there's some better
> ways to do it now. Secondly, there are some things in the patch which
> can be sped up. I've included my comments to Keith and an example
> patch using newer techniques (jit.gl.texture, jit.gl.mesh, jit.matrix
> exprfill), together with some optimizations. OpenGL rendering in
> Jitter is single threaded, so you're not going to get any benefit
> from Multicore CPUs there...
>
> 1. the first thing in the jit.gl.render-grid.pat patch is that if I
> run at a high resolution (like 256x256 vertices), I can get a
> noticeable speedup by eliminating all the unnecessary jit.pwindows in
> the patch which need to do float->char conversion, and/or downsampling.
>
> 2. I also got another 10-15% by disabling the interpolation for the
> initial jit.matrix, and if you don't need to convert the green color
> channel to alpha, you can save some CPU, by eliminating this stage
> entirely.
>
> 3. A cheap RGB-> luminance conversion is to just grab the green color
> channel (green is 70% of luminance data)
>
> 4. You can save a few more cycles by eliminating the jit.op @op * for
> z displacement, and instead rely on the ob3d scale method
>
> 5. using @unique 1 to jit.qt.movie you'll use less CPU for redundant
> frames (only an issue if your movie framerate is less than the patch
> framerate).
>
> 6. using jit.gl.mesh, you may get even better performance.
>
> -Joshua
>
>
> #P toggle 390 366 15 0;
> #P window setfont "Sans Serif" 9.;
> #P window linecount 1;
> #P message 390 389 88 196617 poly_mode $1 $1;
> #P toggle 765 100 15 0;
> #P message 765 128 44 196617 fsaa $1;
> #P toggle 706 100 15 0;
> #P message 706 128 46 196617 sync $1;
> #P hidden newex 309 335 75 196617 loadmess 0.25;
> #P window linecount 2;
> #P comment 387 108 245 196617 use jit.gl.texture for rectangular
> texture dimensions (not limited to power of two);
> #P window linecount 3;
> #P comment 597 55 347 196617 use unique to prevent redundant frames
> , and hence redundant rendering (should also turn down qmetro ,
> just leaving high to demonstrate performance);
> #P toggle 535 27 15 0;
> #P window linecount 1;
> #P message 535 55 53 196617 unique $1;
> #P hidden newex 445 239 48 196617 loadbang;
> #P window linecount 2;
> #P message 577 273 144 196617 exprfill 0 "norm[0]" , exprfill 1 "1.-
> norm[1]" , bang;
> #P window linecount 1;
> #P newex 577 305 190 196617 jit.matrix texcoords 2 float32 320 240;
> #P window linecount 2;
> #P message 398 273 112 196617 exprfill 0 "snorm[0]" , exprfill 1
> "snorm[1]" ,;
> #P window linecount 1;
> #P newex 398 305 168 196617 jit.matrix geom 3 float32 320 240;
> #P flonum 310 361 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P flonum 267 361 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P flonum 225 361 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P newex 196 385 99 196617 pak scale 1. 1. 0.25;
> #P window linecount 2;
> #P comment 27 419 140 196617 use jit.gl.mesh and scale Z with the
> scale attribute;
> #P window linecount 1;
> #P newex 370 139 56 196617 t b l erase;
> #P newex 393 209 192 196617 jit.gl.texture render_grid @name mytex;
> #P newex 171 438 379 196617 jit.gl.mesh render_grid @draw_mode
> tri_grid @texture mytex @color 1. 1. 1. 1.;
> #P flonum 213 29 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P user jit.fpsgui 57 113 60 196617 0;
> #P hidden message 415 51 103 196617 read multimeter.mov;
> #P newex 172 276 50 196617 t b;
> #P hidden message 399 29 14 196617 1;
> #N vpatcher 642 520 1118 872;
> #P inlet 230 84 15 0;
> #P toggle 352 214 15 0;
> #P window setfont "Sans Serif" 9.;
> #P message 352 233 75 196617 auto_rotate $1;
> #P message 315 233 32 196617 reset;
> #P newex 103 101 27 196617 t i i;
> #P flonum 261 213 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P message 261 233 51 196617 radius $1;
> #P flonum 194 213 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P message 194 233 60 196617 tracking $1;
> #P newex 10 254 355 196617 jit.gl.handle render_grid
> @inherit_transform 1 @depth_enable 1 @tracking 8;
> #P outlet 10 284 15 0;
> #P newex 103 56 50 196617 select 27;
> #P newex 103 34 40 196617 key;
> #P newex 120 146 91 196617 prepend fullscreen;
> #P newex 120 167 189 196617 jit.window render_grid @rect 10 50 200
> 200 @depthbuffer 1;
> #P comment 10 218 178 196617 inherit_transform is important here ,
> since we are controlling jit.gl.render;
> #P toggle 103 81 15 0;
> #P fasten 14 0 7 0 357 251 15 251;
> #P fasten 10 0 7 0 266 251 15 251;
> #P fasten 8 0 7 0 199 248 15 248;
> #P fasten 13 0 7 0 320 251 15 251;
> #P connect 7 0 6 0;
> #P connect 4 0 5 0;
> #P connect 5 0 0 0;
> #P connect 0 0 12 0;
> #P connect 12 1 3 0;
> #P connect 16 0 2 0;
> #P connect 3 0 2 0;
> #P connect 9 0 8 0;
> #P connect 11 0 10 0;
> #P connect 15 0 14 0;
> #P pop;
> #P newobj 706 155 115 196617 p window-mouse-rotate;
> #P message 361 49 42 196617 rate $1;
> #P flonum 361 29 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
> #P newex 172 305 211 196617 jit.pack 3 float32 320 240 @out_name geom;
> #P newex 172 210 203 196617 jit.matrix 1 float32 320 240 @planemap 2;
> #P message 235 49 28 196617 read;
> #P message 305 49 27 196617 stop;
> #P message 271 49 31 196617 start;
> #P toggle 172 28 15 0;
> #P newex 172 48 51 196617 qmetro 2;
> #P newex 172 79 153 196617 jit.qt.movie 320 240 @unique 1;
> #P newex 623 193 120 196617 jit.gl.render render_grid;
> #P comment 34 213 100 196617 grab green for luma;
> #P comment 508 242 349 196617 using new exprfill method to generate
> geometry and texture coordinates;
> #P window linecount 5;
> #P comment 828 111 100 196617 disable VBL sync for max framerate and
> benchmarking. turn on FSAA for full scene anti aliasing;
> #P connect 5 0 18 0;
> #P fasten 42 0 20 0 395 419 176 419;
> #P fasten 24 0 20 0 201 413 176 413;
> #P connect 11 0 20 0;
> #P connect 6 0 5 0;
> #P fasten 5 0 4 0 177 75 177 75;
> #P fasten 33 0 4 0 540 74 177 74;
> #P fasten 9 0 4 0 240 75 177 75;
> #P fasten 8 0 4 0 310 75 177 75;
> #P fasten 7 0 4 0 276 75 177 75;
> #P fasten 13 0 4 0 366 75 177 75;
> #P hidden fasten 17 0 4 0 420 72 177 72;
> #P connect 22 1 10 0;
> #P connect 10 0 16 0;
> #P connect 16 0 11 0;
> #P connect 19 0 5 1;
> #P fasten 30 0 20 1 582 426 222 426;
> #P connect 25 0 24 1;
> #P connect 26 0 24 2;
> #P connect 27 0 24 3;
> #P hidden connect 37 0 27 0;
> #P hidden connect 15 0 12 0;
> #P connect 12 0 13 0;
> #P fasten 4 0 22 0 177 102 375 102;
> #P connect 10 0 11 2;
> #P connect 43 0 42 0;
> #P connect 22 1 21 0;
> #P hidden connect 32 0 29 0;
> #P connect 29 0 28 0;
> #P connect 34 0 33 0;
> #P hidden connect 32 0 31 0;
> #P connect 31 0 30 0;
> #P fasten 22 0 3 0 375 174 628 174;
> #P fasten 22 2 3 0 421 168 628 168;
> #P fasten 14 0 3 0 711 177 628 177;
> #P connect 39 0 38 0;
> #P connect 40 0 14 0;
> #P connect 38 0 14 0;
> #P connect 41 0 40 0;
> #P window clipboard copycount 44;

Joshua Kit Clayton

On Aug 26, 2006, at 7:13 AM, Keith McMillen wrote:

> These changes did speed up the patches performance and when
> combined with my other Jit patches (needed for the rest of the
> localization, reactive processing, etc...) I have doubled total
> frame rate from 4 fps to 8-9 fps. CPU still sits at 50%.

Please confirm: Is it 1 core at 50% or 2 cores at 50%?

If the latter, you are using 100% of one CPU with some operation that
is essentially single threaded (and thus can't exploit multiple
cores). It's just a single thread alternating between processors, as
has been covered by a few messages in this thread already. There's
nothing you can do aside from changing your patch in general or
running multiple copies of Max to improve performance in this case.
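To make the "1 core or 2 cores at 50%" arithmetic concrete, here is a minimal Python sketch. It assumes (a simplification) that the whole-machine CPU meter shows the plain average of the per-core busy percentages over its sampling interval; the function names are hypothetical and not part of Max or any Windows API. Under that model a single saturated thread can never push a dual core total past 50%, however the OS spreads it across the cores.

```python
def aggregate_cpu_percent(per_core_busy):
    """Whole-machine CPU% as the average of per-core busy percentages.

    Simplified model (assumption): real meters sample over an interval,
    but the average-of-cores relationship is what matters here.
    """
    if not per_core_busy:
        raise ValueError("need at least one core")
    return sum(per_core_busy) / len(per_core_busy)


def single_thread_ceiling(num_cores):
    """Highest total CPU% one thread can produce on num_cores cores.

    A thread runs on at most one core at any instant, so over any
    interval it accounts for at most one core's worth of busy time.
    """
    return 100.0 / num_cores


if __name__ == "__main__":
    # One saturated thread on a dual core machine: meter reads 50.0
    print(aggregate_cpu_percent([100.0, 0.0]))
    # Two saturated threads or processes (e.g. two copies of Max): 100.0
    print(aggregate_cpu_percent([100.0, 100.0]))
    # Ceiling for a single thread on two cores: 50.0
    print(single_thread_ceiling(2))
```

Seen this way, a total that sticks at about 50% on a dual core machine suggests one thread is the bottleneck, and a second copy of Max (a separate process with its own threads) is one way to occupy the other core.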

If the former, something is up. As already mentioned: perhaps you
need to increase your queuethrottle (if you have lots of UI elements), or
you need to disable the @sync attribute for all your window objects.
I don't believe that this is attributable to cache stalls since those
stalls actually would be calculated into the CPU performance meter.

In general, there's probably a lot of improvement you can make to your
patch to speed it up before worrying about using multiple cores.
First, I'd start by removing unnecessary UI objects, especially
jit.pwindow (get rid of anything which is not *absolutely* necessary
to the functioning of your patch). Second, make sure you turn off all
processing which is not required at any point in time (this includes
actually stopping jit.qt.movie files which are not in use). Third, if
you aren't already, use only *one* jit.window object for your entire
patch.

> So this again makes me want to run the equivalent process
> (exemplified in the jit.gl.render-grid.pat) on the GPU. Can someone
> point me to the nearest shader that can get us started?

That's similar to what Ali is attempting. It's actually easier to do
this on the CPU. And to be honest, this doesn't sound like your
bottleneck at all. I'd focus on *pixel* processing on the GPU. Usually
there are far more pixels to deal with than vertices.

-Joshua

keith@beamfoundation.

"Thou shall not worship false idle cycles"

Oh but I do. So:

Moved the texture displacement patch to an instance of Max-Runtime and sent video in and out via
netsend/receive. Result: this patch now runs at full frame rate (15 fps), and the three other
patches running in the normal Max instance also run at full frame rate. Fluid are his ways.

What does the graven image of the CPU meter read? Both cores at 92%. So in fact Josh (and others),
you are right that one thread running at 100% on one core reads like 50% on 2 cores.

So now I'll wire up the rest of the control structure for netsend/receive and see if this is stable.

Thanks,

Keith

Joshua Kit Clayton

On Aug 29, 2006, at 7:28 AM, Keith McMillen wrote:
> Moved the texture displacement patch to an instance of Max-Runtime
> and sent video in and out via netsend/receive. Result is this patch
> now runs at full frame rate (15fps) and the three other patches
> running on the normal Max instance run at full frame rate.
> Fluid are his ways.

Glad this is working for you.

> What does the graven image of the CPU meter read: both cores at
> 92%. So in fact Josh (and others) you are right that one thread
> running at 100% on one core reads like 50% on 2 cores.

The point is that on a dual core machine no thread runs only on one
core, and no one thread can run on two cores simultaneously. The
threads are switched around at the OS's discretion to balance loads
across cores. Hence the CPU meter is an accurate reflection of what is
taking place: one CPU-expensive thread running alternately (and never
at the same time) on two cores.
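The alternation described above can be illustrated with a toy simulation. This is a hedged sketch, not a model of the actual Windows scheduler: it assumes one always-runnable thread that a hypothetical scheduler moves to the next core every time slice (round-robin). Each core ends up busy half the time, while the thread itself never stops running and never occupies both cores in the same slice.

```python
def simulate_migration(num_slices, num_cores=2):
    """Toy model: one CPU-bound thread, migrated round-robin each slice.

    Returns the busy percentage of each core. Round-robin placement is
    an illustrative assumption; a real OS scheduler decides placement
    on its own terms.
    """
    busy_slices = [0] * num_cores
    for t in range(num_slices):
        core = t % num_cores      # the thread occupies exactly one core
        busy_slices[core] += 1    # so only that core is busy this slice
    # The thread ran in every slice, one core at a time.
    assert sum(busy_slices) == num_slices
    return [100.0 * b / num_slices for b in busy_slices]


if __name__ == "__main__":
    per_core = simulate_migration(1000)
    print(per_core)                       # [50.0, 50.0]
    print(sum(per_core) / len(per_core))  # 50.0 on the whole-machine meter
```

The thread is 100% busy, yet each core reports only 50%, which matches the "both cores at 50%" reading discussed in this thread.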

-Joshua

williamshome

Hi Joshua,

It is interesting to hear that there are ways to overcome the "50% limit"
problem.

In order to distinguish "multi-threaded" objects from "single-threaded"
objects, is there a list we can look up?

Also, is it possible to have multiple copies of runtime Max/Jitter running?
Sometimes it is not practical to use non-runtime Max for the job, e.g. in an
exhibition.

yours,
William
