
Vectorized convolution external on Intel Mac (à la buffir~)

December 13, 2006 | 12:16 pm

Hi,

I coded a convolution external that is quasi-identical to buffir~, but with vector calculation added via Apple's convolution routine from the Accelerate framework.

On a PPC Mac it was boosted by Altivec and performed about 4x better than buffir~.

Now I want to make it work on Intel Macs too.
I thought the external would automatically use the Accelerate framework present on the system (and thus, on an Intel Mac, use Apple's Intel-oriented library), but so far it shows no optimization on my Intel iMac (it even consumes more CPU cycles than the original buffir~…)

Would that mean that the Accelerate framework on Intel Macs is not really vector-optimized? Or, more probably, that something is wrong in my code? :-)

I would greatly appreciate any help or advice that would let me obtain the same performance (or better :-)) as I got on the PPC!
Attached is the CodeWarrior project if needed.

Thanks !

Salvator


January 23, 2007 | 5:03 pm

Anyone ?

Hiring proposals are welcome!
Thanks,

Salvator


January 24, 2007 | 1:07 am

Hi Salvator,

> I coded a convolution external that is quasi identical
> to buffir~, but with added vector calculation from apple
> convolution routine/accelerate framework.
>
> So on PPC mac, it was boosted by using altivec and provided
> 4x better performances than buffir~.

When building your code on vecLib through the Accelerate framework, it will be Altivec-optimized on G4 and G5 (not G3), and SSE-optimized on x86. Automatically, meaning no Altivec statements in your code.
However, 'pure Altivec' code and its 'pure SSE' (SSE2) translation should still be compared to be sure about speed and stability.

> Now I want to make it work on intel mac too.
> I thought that the external would automatically use the
> accelerate framework present on the system (and thus, on
> an intel mac use apple intel oriented library) but so far,
> it show no optimization on my imac intel (it’s even consuming
> more CPU cycles than the original buffir~…)

You should get a boost on both architectures: a bigger one on G4/G5 and a smaller one on x86.
(On a G3, vecLib will run in scalar mode!)

> Would that means that accelerate framework on intel mac
> is not really vector optimisated ?

It is really vectorized, for audio and for image processing.
(I used it for real-time optical processing, and it worked fast!)

> Or more probably that something is wrong in my code ? :-)

Despite the great help of Olaf Matthes, I gave up on translating vDSP code into a MaxMSP external :-(
Either it doesn't work, or it's even slower than the same code built for CoreAudio/AudioUnit.
(I now focus my work on CoreAudio only (and Cocoa), and will see later if something works for MSP.)
I suggest asking for help on Apple's CoreAudio list.

> Attached in the codewarrior project if needed.

Not anymore! Xcode is its name ;-)

Bye,
Philippe


January 24, 2007 | 1:32 am

Many thanks Philippe for the advices !

> When building your code on vecLib through the Accelerate framework, it will be Altivec-optimized on G4 and G5 (not G3), and SSE-optimized on x86. Automatically, meaning no Altivec statements in your code.
> However, 'pure Altivec' code and its 'pure SSE' (SSE2) translation should still be compared to be sure about speed and stability.

Yes, that's what I thought: that it would translate automatically. But actually there is no gain; it's even 15% worse than buffir~,
so I guess something is wrong in my code…

> I suggest to ask for an help at the Apple’s CoreAudio list.

Thanks, I'll give it a shot there.

> Not anymore! Xcode is its name ;-)

Did I say CodeWarrior? Oh, my bad, a typo… it's indeed an Xcode project! :-)
It compiles fine here on both PPC and Intel.
If ever you have time for a quick look at the code…

Salvator


January 24, 2007 | 3:35 am

Salvator wrote on Wed, 24 January 2007 02:32
—————————————————-
> If ever you have time for a quick advice on the code …

The problem is #include "z_dsp.h"
And "z_dsp.h" #includes "z_altivec.h"

And all this Altivec stuff is inside MaxAPI.framework
And we don't need Altivec anymore on i386 CPUs.

Therefore, your code runs in Altivec mode on PPC, but *not* in SSE on i386, despite the #include.
=> The Altivec path leaves your code scalar on i386!!

Sorry Salvator, there's nothing I can do :-(
(I was not able to do anything for my own C++ code!?!)

Could someone else help us out, please?

Kind regards,
Philippe

