
More efficient than: if (++index >= arraylength) index = 0; ?

September 7, 2006 | 1:40 am

Hi,

Peter, you wrote that James McCartney wrote an article on something
more efficient than:

if (++index >= arraylength) index = 0;

for setting the index back to zero rather than going off the end of
the array. I’m curious to see what his method is. I had a quick look
in the music dsp archives but had no idea what to search for. Can you
post a link to it?

Thanks,

Aengus.


September 7, 2006 | 7:56 am

On 7-Sep-2006, at 3:40, Aengus Martin wrote:

> Peter, you wrote that James McCartney wrote an article on something
> more efficient than:
>
> if (++index >= arraylength) index = 0;

The problem is doing that inside your inner loop.

I had a quick google and didn’t find the article either, I’m afraid.

Remind me next week and I can post some example code showing the
technique. Sorry, but not this week.

-------------- http://www.bek.no/~pcastine/Litter/ --------------
Peter Castine --> Litter Power & Litter Bundle for Jitter
                  (Universal Binaries on the way)
iCE: Sequencing, Recording & Interface Building for Max/MSP
http://www.dspaudio.com/          http://www.castine.de


September 7, 2006 | 4:49 pm

Well, the easy way to do this, if your array length is a power of two,
would be:

say len = 256 (1 << 8)...

++index;
index &= 0x00ff;

or if len = 4096 (1 << 12, the size of a memory page in OS X)
++index;
index &= 0x0fff;

the bitwise-and then store operation is comparatively cheap,
one to two instructions depending on optimization.

one can easily make a mask given a power-of-two length like:

#define make_mask(len) ( ~( (~(len) + 1) ) )

that will work for most of the powers of two (until you start hitting
overflow)
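
so a complete sketch of the trick might look like this (untested; the
write_ring / BUF_LEN names are just mine for illustration):

#include <stddef.h>

/* equivalent to (len - 1); only valid when len is a power of two */
#define make_mask(len) ( ~( (~(len) + 1) ) )

#define BUF_LEN  256                  /* must be a power of two */
#define BUF_MASK make_mask(BUF_LEN)   /* 0x00ff */

/* copy n samples into the circular buffer, wrapping with a mask
   instead of a branch */
static void write_ring(float *buf, size_t *index, const float *in, size_t n)
{
    size_t i = *index;

    while (n-- > 0) {
        buf[i] = *in++;
        i = (i + 1) & BUF_MASK;       /* wrap without an if */
    }
    *index = i;
}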

But… before you go optimizing, have you done any profiling?
Do you know that your bounds check is really slowing you down?

_Mark

On Sep 7, 2006, at 12:56 AM, Peter Castine wrote:

> On 7-Sep-2006, at 3:40, Aengus Martin wrote:
>
>> Peter, you wrote that James McCartney wrote an article on something
>> more efficient than:
>>
>> if (++index >= arraylength) index = 0;
>
> The problem is doing that inside your inner loop.
>
> I had a quick google and didn’t find the article either, I’m afraid.
>
> Remind me next week and I can post some example code showing the
> technique. Sorry, but not this week.


September 10, 2006 | 7:08 pm

Hi Aengus,
I don't think there are any reasonably general ways to do this other than
either avoiding the branch or steering the CPU's branch prediction in the
right direction.
For the sake of maintainability I normally prefer the latter... modern
gcc compilers have a pseudo-function for "branch hints", called
__builtin_expect. You can use it as shown in
http://kratcode.wordpress.com/2006/02/09/likelyunlikely-gcc-extensions/
The compiler knows how CPUs implement branch prediction and will emit
the proper code for this.
If arraylength is >> 0, then your code becomes
if (unlikely(++index >= arraylength)) index = 0;
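
(The likely/unlikely wrappers from that page are just thin shells around
__builtin_expect; roughly, assuming GCC:)

#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)   /* expect x to be true  */
#define unlikely(x) __builtin_expect(!!(x), 0)   /* expect x to be false */
#else
#define likely(x)   (x)                          /* no-op elsewhere */
#define unlikely(x) (x)
#endif

/* advance a circular index, hinting that the wrap is the rare case */
static int advance(int index, int arraylength)
{
    if (unlikely(++index >= arraylength))
        index = 0;
    return index;
}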

best greetings,
Thomas


September 12, 2006 | 11:01 am

On 7-Sep-2006, at 9:56, Peter Castine wrote:
> On 7-Sep-2006, at 3:40, Aengus Martin wrote:
>> Peter, you wrote that James McCartney wrote an article on something
>> more efficient than:
>>
>> if (++index >= arraylength) index = 0;
>
> The problem is doing that inside your inner loop.
>
> Remind me next week and I can post some example code showing the
> technique.

Mark’s trick with bit-masking rather than branching is not what I was
talking about, although it is another approach for power-of-two arrays.

Thomas’ idea of branch hints is OK, but only works for some chips
(granted, the chips you’re writing for with Max/MSP/Jitter externals
have this).

What I’m on about is a general technique that you should have in your
programming vocabulary.

BEFORE you attack this, make sure that your code is correct and don’t
worry about efficiency. I cannot emphasize this enough.

And *of course* you should profile your code to make sure that you
actually do have a performance problem before attacking efficiency
concerns. OTOH, DSP-loops are always a bottleneck because someone is
bound to instantiate 100 copies of your MSP external sooner or later.

So, if you have a perform loop that looks like:

===========================
int   n = vectorSize,
      m = 0;
float *inputVector,
      buf[kArraySize];

// Initialize inputVector and, if necessary, buf.
// then...

while (n-- > 0) {
    m += 1;
    if (m >= kArraySize)
        m = 0;
    buf[m] = *inputVector++;
    // whatever other processing follows...
}

===========================

The code is less efficient than it could be because you have two
branch instructions in your inner loop (the while condition is a
branch, as is the if statement). Ideally you should never have more
than a single branch in your inner loop: the loop condition itself.

The solution is to unwrap the loop into something like the below:

===========================
// Initialize as before, then:

while (n > 0) {                         // Decrement inside loop
    int sampsThisTime = (n > kArraySize) ? kArraySize : n;
    n -= sampsThisTime;

    m = 0;
    while (sampsThisTime-- > 0) {
        buf[m++] = *inputVector++;
        // whatever other processing follows...
    }
}

===========================

The inner loop now only has a single branch instruction for the loop
condition. You have branch overhead for the outer loop, but those
branch instructions will, on average, only be executed once every
kArraySize samples.

Obviously not worth the effort if kArraySize is small but the
assumption is that the buffer is large enough to justify the extra work.

There are zillions of ways to further tweak the code (use an
incremented pointer instead of buf[m], etc.) Left as an exercise for
the reader.
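
For instance, the pointer version (equally untested; it just mirrors the
loop above with a pointer p in place of the index m) might look like:

===========================
while (n > 0) {                         // Decrement inside loop
    int   sampsThisTime = (n > kArraySize) ? kArraySize : n;
    float *p            = buf;          // start at the head of the buffer,
                                        // as m = 0 did above
    n -= sampsThisTime;

    while (sampsThisTime-- > 0) {
        *p++ = *inputVector++;
        // whatever other processing follows...
    }
}
===========================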

I hear lunch is on the table. The code above is untested and hasn’t
even been proof-read. Caveat lector.

-------------- http://www.bek.no/~pcastine/Litter/ --------------
Peter Castine --> Litter Power & Litter Bundle for Jitter
                  (Universal Binaries on the way)
iCE: Sequencing, Recording & Interface Building for Max/MSP
http://www.dspaudio.com/          http://www.castine.de


September 12, 2006 | 7:38 pm

Also, it just occurred to me that you should make very sure that your
circular buffer is at least as large as the DSP vector. I know this
is a silly and simple suggestion, but you might notice nasty clicks
and other artifacts when you overwrite the starting point in your
buffer for the current dsp call.
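
A cheap way to catch that during development (just a sketch, reusing the
vectorSize and kArraySize names from Peter's code) is an assert:

#include <assert.h>

enum { kArraySize = 4096 };            /* illustrative buffer size */

/* the circular buffer must hold at least one full signal vector,
   otherwise a single dsp call overwrites its own starting point */
static void check_buffer_size(int vectorSize)
{
    assert(kArraySize >= vectorSize);
}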

One more thing ;)

If you're using Peter's excellent suggestion to pre-compute the wrap
value, and the element size of your circular buffer is the same as the
element size of the dsp vector, you may as well use memcpy. You may
see up to a 2x improvement that way, as OS X auto-vectorizes memcpy
when possible.

so the code would look like:

while (n > 0) {                         // Decrement inside loop
    int    sampsThisTime;
    size_t newCurBufPos;

    if ((n + curBufPos) >= kArraySize) {
        // we're wrapping
        sampsThisTime = (kArraySize - curBufPos);
        newCurBufPos  = 0;
    }
    else {
        // it all fits
        sampsThisTime = n;
        newCurBufPos  = curBufPos + n;
    }

    memcpy(buf + curBufPos,
           inputVector + curInputPos,
           sampsThisTime * sizeof(*inputVector));

    n           -= sampsThisTime;
    curInputPos += sampsThisTime;
    curBufPos    = newCurBufPos;
}

This way we don't even have an inner loop (though memcpy does);
I think the code will loop at most twice, and if it loops more than
that then you're losing data.
I'm pretty sure this will work, but I ran no tests :)

_Mark

