
Vectorized convolution external on Intel Mac (à la buffir~)

December 13, 2006 | 12:16 pm

Hi,

I coded a convolution external that is quasi-identical to buffir~, but with vector calculation added via Apple's convolution routine from the Accelerate framework.

On a PPC Mac it was boosted by Altivec and performed about 4x better than buffir~.

Now I want to make it work on Intel Macs too.
I thought the external would automatically use the Accelerate framework present on the system (and thus, on an Intel Mac, use Apple's Intel-oriented library), but so far it shows no optimization on my Intel iMac (it even consumes more CPU cycles than the original buffir~…)

Would that mean that the Accelerate framework on Intel Macs is not really vector-optimized? Or, more probably, that something is wrong in my code? :-)

I would greatly appreciate any help or advice that would let me obtain the same performance (or better :-)) as I got on the PPC!
Attached is the CodeWarrior project if needed.

Thanks !

Salvator


January 23, 2007 | 5:03 pm

Anyone ?

Hiring proposals are welcome!
Thanks,

Salvator


January 24, 2007 | 1:07 am

Hi Salvator,

> I coded a convolution external that is quasi identical
> to buffir~, but with added vector calculation from apple
> convolution routine/accelerate framework.
>
> So on PPC mac, it was boosted by using altivec and provided
> 4x better performances than buffir~.

When building your code on vecLib through the Accelerate framework, it will be Altivec-optimized on G4 and G5 (not G3), and SSE-optimized on x86. Automatically, meaning no Altivec statements in your code.
However, 'pure Altivec' code and its 'pure SSE' (SSE2) translation should still be compared to be sure about speed and stability.

> Now I want to make it work on intel mac too.
> I thought that the external would automatically use the
> accelerate framework present on the system (and thus, on
> an intel mac use apple intel oriented library) but so far,
> it show no optimization on my imac intel (it’s even consuming
> more CPU cycles than the original buffir~…)

You should get a boost on both architectures: a bigger one on G4/G5 and a smaller one on x86.
(On a G3, vecLib will run in scalar mode!)

> Would that means that accelerate framework on intel mac
> is not really vector optimisated ?

It is really vectorized, for audio and for image processing.
(I used it for real-time optical processing, and it worked fast!)

> Or more probably that something is wrong in my code ? :-)

Despite the great help of Olaf Matthes, I gave up on translating vDSP code into a MaxMSP external :-(
Either it doesn't work, or it's even slower than the same code built for CoreAudio/AudioUnit.
(I now focus my work on CoreAudio only (and Cocoa), and will see later if something works for MSP.)
I suggest asking for help on Apple's CoreAudio list.

> Attached in the codewarrior project if needed.

Not anymore! Xcode is its name ;-)

Bye,
Philippe


January 24, 2007 | 1:32 am

Many thanks Philippe for the advices !

> When building your code on vecLib through the Accelerate framework, it will be Altivec-optimized on G4 and G5 (not G3), and SSE-optimized on x86. Automatically, meaning no Altivec statements in your code.
> However, 'pure Altivec' code and its 'pure SSE' (SSE2) translation should still be compared to be sure about speed and stability.

Yes, that's what I thought: that it would translate automatically. But actually there is no gain; it's even 15% worse than buffir~,
so I guess something is wrong in my code…

> I suggest to ask for an help at the Apple’s CoreAudio list.

Thanks, I'll give it a shot there.

> Not anymore! Xcode is its name ;-)

Did I say CodeWarrior? Oh, my bad, a typo… it's indeed an Xcode project! :-)
It compiles fine here on both PPC and Intel.
If ever you have time for a quick look at the code…

Salvator


January 24, 2007 | 3:35 am

Salvator wrote on Wed, 24 January 2007 02:32
—————————————————-
> If ever you have time for a quick advice on the code …

The problem is #include "z_dsp.h"
And "z_dsp.h" #includes "z_altivec.h"

And all this Altivec stuff is inside MaxAPI.framework
And we don't need Altivec anymore on i386 CPUs.

Therefore, your code runs in Altivec mode on PPC, but *not* in SSE on i386, despite the #include.
=> The Altivec path leaves your code scalar on i386!!

Sorry Salvator, there's nothing I can do :-(
(I was not able to do anything for my own C++ code!?!)

Could someone else help us out, please?

Kind regards,
Philippe

