Multicore cracks and pops
Hi all,
I've started experimenting a bit with the multicore support added in Max5 lately.
My patch's CPU usage is relatively high (about 55% of one core on a 3 GHz Core 2 Duo processor). I therefore wanted to make use of the [threadcount] and [parallel] messages to poly~.
My patch is split up into poly~ instances and works well when running on one core. It also runs well on a dual-core machine with [parallel] turned on (takes up about 40% on each core of a 2.4 GHz processor).
Today I tried my patch on a quad-core CPU, splitting the load between 4 cores. This resulted in a variety of artifacts in the sound: cracks, pops, and so on.
Changing the vector sizes does not appear to make any difference.
Sending the message [threadcount 2] to the poly~ object removes the artifacts, however, and everything runs as normal. Naturally, this brings the load down to 2 cores, and the other 2 are just idling.
I'm just wondering: is there a bug when trying to split the load across more than 2 cores? Or am I perhaps missing some parameters?
Perhaps it's easier for Max to lose the sync between the cores the more of them there are?
Any tips are appreciated!
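(For what it's worth, the "losing sync" intuition can be made concrete with a toy model: a vector of audio is clean only if every worker thread meets its deadline, so each extra thread adds another independent chance of a miss. This Python sketch is purely illustrative and says nothing about how Max actually schedules its DSP threads; the 1% miss probability is an arbitrary assumption.)

```python
import random

# Toy model only: NOT how Max schedules DSP threads.
# One audio vector glitches if ANY worker thread misses the deadline,
# so more threads means more independent chances of a miss.

def dropout_rate(n_threads, blocks=100_000, p_late=0.01):
    random.seed(0)  # reproducible runs
    late_blocks = sum(
        any(random.random() < p_late for _ in range(n_threads))
        for _ in range(blocks)
    )
    return late_blocks / blocks

for n in (1, 2, 4):
    print(f"{n} thread(s): {dropout_rate(n):.4f} of blocks glitch")
```

With an independent 1% per-thread miss chance, the glitch rate roughly doubles from 1 to 2 threads and doubles again from 2 to 4, which matches the intuition that more threads make the ensemble harder to keep in sync.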
Make sure that you have 5.0.5.
Best,
ej
On 16 oct. 08, at 14:49, Hans wrote:
> Hi all,
> I've started experimenting a bit with the multicore support added in Max5 lately. [...]
Ah.
Maybe there's something to gain there.
I was using 5.0.4.
Actually, I was using a standalone built with 5.0.4.
I'll try to build with 5.0.5 tomorrow and let you know if there's any difference.
I have now been testing the same patch using 5.0.5, and I keep getting some confusing results...
Here is some test data from both 5.0.4 and 5.0.5, running the same patch on the same quad-core PC at two different vector sizes.
64 samples would be my target vector size for this patch.
5.0.4

256 vector size
  using 1 thread: 61% of one core
  using 2 threads: ~40% on the main core, ~25% on the other
  using 4 threads: ~26% on the main core, ~17% on the others
  (no dropouts in audio at any of these settings, but clicks when
  running 2 or 4 threads; more so on 4 than on 2)

64 vector size
  using 1 thread: ~75% of one core
  (constant dropouts in audio)
  using 2 threads: ~50% on the main core, ~40% on the other
  (some dropouts in audio, and some clicks)
  using 4 threads: ~34% on the main core, ~17% on the others
  (no dropouts in audio, but more clicks)

5.0.5

256 vector size
  using 1 thread: ~70% of one core
  (very few audio dropouts)
  using 2 threads: ~55% on the main core, ~20% on the other
  (no dropouts in audio; thread 2 seems to be jumping from core
  to core, which it did not in 5.0.4)
  using 4 threads: ~41% on the main core, ~10-15% on the others
  (no dropouts in audio)

64 vector size
  using 1 thread: ~94% of one core
  (heavy dropouts in audio; completely distorted signal)
  using 2 threads: ~70% on the main core, ~35% on the other
  (no dropouts in audio)
  using 4 threads: ~55% on the main core, ~20% on the others
  (no dropouts in audio)
It appears that the main thread carries more of the load in 5.0.5 than it did in 5.0.4.
However, this does not account for the ~20% increase in CPU usage when running 1 thread at 64 vector size!
It also appears that 5.0.5 needs a higher vector size than 5.0.4 did to run reliably when using the parallel option.
I ran this test several times, with reboots of the computer in between, and while the CPU usage stayed the same, the audible result differed. I could not find one setting that worked 100% of the time with no audible artifacts.
Going beyond a 128-sample vector size isn't possible for this project, and I suspect I would need to go to 512 for it to run reliably with the current implementation.
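For context on why small vector sizes are so unforgiving: one vector's worth of DSP must complete within vector_size / sample_rate seconds, or the driver underruns and you hear a click or dropout. A quick back-of-envelope calculation (assuming 44.1 kHz; the posts don't state the sample rate):

```python
# Per-vector processing deadline: all voices on all threads must
# finish inside one vector's worth of time, or the audio glitches.
# 44.1 kHz is an assumed sample rate for illustration.

def deadline_ms(vector_size, sample_rate=44_100):
    return 1000.0 * vector_size / sample_rate

for vs in (64, 128, 256, 512):
    print(f"{vs:3d}-sample vector -> {deadline_ms(vs):5.2f} ms to compute it")
```

At 64 samples the whole patch, including any cross-thread synchronization, has under 1.5 ms per vector; at 512 samples it has about 11.6 ms, which is why larger vectors tolerate scheduling jitter so much better.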
I also noticed that [parallel $1] does not seem to work in 5.0.5 if [threadcount] is larger than 1. A bug?
Anyway, sorry about the long post.
I think I might have to stay with Max4 for the time being for this application, as I don't seem to get any advantage from using multiple cores, and the added CPU strain in Max5 isn't worth it if only using one core... for this patch, at least.
Were you able to find any resolution to this issue? I am having similar problems with Max 5.0.5 (and earlier versions too, I think; about to test with 5.0.4 and 5.0.2 when I have a chance) with OS X 10.5.3 on a dual-core and an octo-core machine. I'll try to post some more detailed information later.
Well... no.
It seems like the 5.0.5 update took care of the cracks and pops when using the [parallel] option, but at the cost of a much higher CPU load.
I have put the whole idea of parallelism on hold for now, waiting for updates on this. I am currently using Max 4.6, running the same patch on one core. It works reliably and at a much lower CPU load than the same patch on the same computer under Max 5.0.5.
Waiting to see what the future brings :)
Let me know if you find a fix to it though!
This all sounds about right to us. We're guessing that the testing patch being used does very little processing and has many out~ objects. This case just isn't going to improve any time soon, and you'll need to find another strategy to solve the problem.
The major change in 5.0.5 is that out~ objects no longer accumulate into the same memory. For thread-safety reasons (which, on many newer processors, led to clicks), they need to be accumulated into the final output memory from the main DSP thread. This means that where there isn't much computation inside a single voice, but *many* out~ objects and *many* voices, the accumulation in the main DSP thread will outweigh the CPU that can be offloaded to the other threads. Other situations, in which there is heavy computation within a single voice and a handful of out~ objects, should see only minimal impact from this necessary change.
The other thing which this implementation requires is that the DSP must be restarted for the parallel message to take effect, so that all the out~ objects can be initialized properly.
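The accumulation scheme described above (each worker mixes its voices into a private buffer, and only the main DSP thread sums those buffers into the output) can be sketched in Python. All names here are hypothetical illustrations, not Max internals:

```python
from threading import Thread

VECTOR = 64  # samples per signal vector

def render_voice(voice_id, vector=VECTOR):
    # Stand-in for one poly~ voice's DSP; emits a constant signal.
    return [0.001 * voice_id] * vector

def worker(voice_ids, partial):
    # Each worker accumulates ONLY into its own private buffer,
    # so no locks are needed while the threads run.
    for v in voice_ids:
        out = render_voice(v)
        for i in range(VECTOR):
            partial[i] += out[i]

def process_block(voice_groups):
    partials = [[0.0] * VECTOR for _ in voice_groups]
    threads = [Thread(target=worker, args=(g, p))
               for g, p in zip(voice_groups, partials)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Main-thread mixdown: this serial sum is the cost that grows
    # with the number of out~ objects and voices.
    final = [0.0] * VECTOR
    for p in partials:
        for i in range(VECTOR):
            final[i] += p[i]
    return final

block = process_block([[1, 2], [3, 4]])
print(round(block[0], 6))
```

The sketch shows the trade-off: when render_voice is cheap and there are many voices and buffers, the main thread's mixdown loop dominates; when each voice is expensive, the parallel rendering is where the time goes and the mixdown is negligible.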
Cheers
-A
Quote: Andrew Pask wrote on Mon, 10 November 2008 12:36
----------------------------------------------------
> The major change in 5.0.5 is that out~ objects no longer accumulate into the same memory. For thread safety reasons (which on many newer processors, led to clicks), they need to be accumulated into the final output memory from the main DSP thread.
So does that mean that if I have one of those newer processors, running an older version is not a legit solution because it would be "unsafe"?
> The other thing which this implementation requires is that the DSP must be restarted for the parallel message to take effect, so that all the out~ objects can be initialized properly.
I hadn't been using the "parallel" message, but the @parallel attribute. Does that mean it is worth trying sending the message and cycling DSP? I had given up for now (at least until after my gig on Thursday) and was just trying to be more efficient in my synthesis/use less instances. Is it worth trying the message vs. the attribute when I get home tonight?
>So does that mean that if I have one of those newer processors, running an older version is not a legit solution because it would be "unsafe"?
Pretty much.
>I hadn't been using the "parallel" message, but the @parallel attribute.
Same thing.
-A
Thanks for the quick response! I guess I'll go easy on my poly-FFT for now.
I figured it might be something like that causing the overhead in the main thread.
However, I can't see how it explains the CPU usage increase from 5.0.4 to 5.0.5 when using only 1 thread.
Perhaps I didn't restart my dac when changing the settings, and it was still using some older settings I had tried... I didn't really think about that, since the CPU load changed instantly when I changed the settings while the dac was on.
You are right in thinking the test patch has lots of ins and outs and fairly low computation in each voice. I will see if there's any way of restructuring it to make use of [parallel], but there's no obvious solution.
Thanks for clarifying this, though, I appreciate it!