More efficient than: if (++index >= arraylength) index = 0; ?


    Sep 07 2006 | 1:40 am
    Hi,
    Peter, you wrote that James McCartney wrote an article on something
    more efficient than:
    if (++index >= arraylength) index = 0;
    for setting the index back to zero rather than going off the end of
    the array. I'm curious to see what his method is. I had a quick look
    in the music dsp archives but had no idea what to search for. Can you
    post a link to it?
    Thanks,
    Aengus.

    • Sep 07 2006 | 7:56 am
      On 7-Sep-2006, at 3:40, Aengus Martin wrote:
      > Peter, you wrote that James McCartney wrote an article on something
      > more efficient than:
      >
      > if (++index >= arraylength) index = 0;
      The problem is doing that inside your inner loop.
      I had a quick google and didn't find the article either, I'm afraid.
      Remind me next week and I can post some example code showing the
      technique. Sorry, but not thi week.
      -------------- http://www.bek.no/~pcastine/Litter/ -------------
      Peter Castine +--> Litter Power & Litter Bundle for Jitter
      Universal Binaries on the way
      iCE: Sequencing, Recording &
      Interface Building for |home | chez nous|
      Max/MSP Extremely cool |bei uns | i nostri|
    • Sep 07 2006 | 4:49 pm
      Well, the easy way to do this if your array length is an even power of
      two would be:
      say len = 256 (1 << 7)...
      ++index;
      index &= 0x00ff;
      or if len = 4096 (1 << 13, the size of a memory page in os x)
      ++index;
      index &= 0x0fff;
      the bitwise-and then store operation is comparatively cheap,
      one to two instructions depending on optimization.
      one can easily make a mask given an even power of two length like:
      #define make_mask(len) ( ~( (~(len) + 1) ) )
      that will work for most of the powers of two (until you start hitting
      overflow)
      But... before you go optimizing, have you done any profiling?
      Do you know that your bounds check is really slowing you down?
      _Mark
      On Sep 7, 2006, at 12:56 AM, Peter Castine wrote:
      > On 7-Sep-2006, at 3:40, Aengus Martin wrote:
      >
      >> Peter, you wrote that James McCartney wrote an article on something
      >> more efficient than:
      >>
      >> if (++index >= arraylength) index = 0;
      >
      > The problem is doing that inside your inner loop.
      >
      > I had a quick google and didn't find the article either, I'm afraid.
      >
      > Remind me next week and I can post some example code showing the
      > technique. Sorry, but not thi week.
      >
      > -------------- http://www.bek.no/~pcastine/Litter/ -------------
      > Peter Castine +--> Litter Power & Litter Bundle for Jitter
      > Universal Binaries on the way
      > iCE: Sequencing, Recording &
      > Interface Building for |home | chez nous|
      > Max/MSP Extremely cool |bei uns | i nostri|
      > http://www.dspaudio.com/ http://www.castine.de
      >
      >
    • Sep 10 2006 | 7:08 pm
      Hi Aengus,
      i don't think there are any other resonably general ways for this than
      either to avoid branching or to direct the branch prediction of the cpu
      into the right direction.
      For the sake of maintainability i normally prefer the latter... modern
      gcc compilers have a pseudo-function for "branch hints", called
      __builtin_expect. You can use it as shown in
      http://kratcode.wordpress.com/2006/02/09/likelyunlikely-gcc-extensions/
      The compiler knows how cpus implement branch prediction and will emit
      proper code for this.
      if arraylength is >> 0, then your code will be
      if (unlikely(++index >= arraylength)) index = 0;
      best greetings,
      Thomas
    • Sep 12 2006 | 11:01 am
      On 7-Sep-2006, at 9:56, Peter Castine wrote:
      > On 7-Sep-2006, at 3:40, Aengus Martin wrote:
      >> Peter, you wrote that James McCartney wrote an article on something
      >> more efficient than:
      >>
      >> if (++index >= arraylength) index = 0;
      >
      > The problem is doing that inside your inner loop.
      >
      > Remind me next week and I can post some example code showing the
      > technique.
      Mark's trick with bit-masking rather than branching is not what I was
      talking about, although it is another approach for power-of-two arrays.
      Thomas' idea of branch hints is OK, but only works for some chips
      (granted, the chips you're writing for with Max/MSP/Jitter externals
      have this).
      What I'm on about is a general technique that you should have in your
      programming vocabulary.
      BEFORE you attack this, make sure that your code is correct and don't
      worry about efficiency. I cannot emphasize this enough.
      And *of course* you should profile your code to make sure that you
      actually do have a performance problem before attacking efficiency
      concerns. OTOH, DSP-loops are always a bottleneck because someone is
      bound to instantiate 100 copies of your MSP external sooner or later.
      So, if you have a perform loop that looks like:
      ===========================
      int n = vectorSize,
      m = 0;
      float *inputVector,
      buf[kArraySize];
      // Initialize inputVector and, if necessary, buf.
      // then...
      while (n-- > 0) {
      m += 1;
      if (m >= kArraySize)
      m = 0;
      buf[m] = *inputVector++;
      // whatever other processing follows...
      }
      ===========================
      The code is less efficient than it could be because you have two
      branch instructions in your inner loop (the while condition is a
      branch, as is the if statement). Ideally you will never have more
      than a single branch in your inside loop, the loop condition.
      The solution is to unwrap something like the below:
      ===========================
      // Initialize as before, then:
      while (n > 0) { // Decrement inside loop
      int sampsThisTime = (n > kArraySize) ? kArraySize : n;
      n -= sampsThisTime;
      m = 0;
      while (sampsThisTime-- > 0) {
      buf[m++] = *inputVector++;
      // whatever other processing follows...
      }
      }
      ===========================
      The inner loop now only has a single branch instruction for the loop
      condition. You have branch overhead for the outer loop, but those
      branch instructions will, on average, only be executed once every
      kArraySize samples.
      Obviously not worth the effort if kArraySize is small but the
      assumption is that the buffer is large enough to justify the extra work.
      There are zillions of ways to further tweak the code (use an
      incremented pointer instead of buf[m], etc.) Left as an exercise for
      the reader.
      I hear lunch is on the table. The code above is untested and hasn't
      even been proof-read. Caveat lector.
      -------------- http://www.bek.no/~pcastine/Litter/ -------------
      Peter Castine +--> Litter Power & Litter Bundle for Jitter
      Universal Binaries on the way
      iCE: Sequencing, Recording &
      Interface Building for |home | chez nous|
      Max/MSP Extremely cool |bei uns | i nostri|
    • Sep 12 2006 | 7:38 pm
      Also, it just occurred to me that you should make very sure that your
      circular buffer is at least as large as the DSP vector. I know this
      is a silly and simple suggestion, but you might notice nasty clicks
      and other artifacts when you overwrite the starting point in your
      buffer for the current dsp call.
      One more thing ;)
      If you're using Peter's excellent suggestion to pre-compute the wrap
      value, and the element size of your circular buffer is the same as the
      element size of the dsp vector, you may as well use memcpy. You may
      see up to a 2x improvement with that method, as os x auto-vectorizes
      memcpy when possible.
      so the code would look like:
      while (n > 0) { // Decrement inside loop
      int sampsThisTime;
      size_t newCurBufPos;
      if( (n + curBufPos) => kArraySize) {
      // we're wrapping
      sampsThisTime = (kArraySize - curBufPos);
      newCurBufPos = 0;
      }
      else {
      // it all fits
      sampsThisTime = n;
      newCurBufPos += n;
      }
      memcpy(buf + curBufPos,
      inputVector + curInputPos,
      sampsThisTime * sizeof(*inputVector));
      n -= sampsThisTime;
      curInputPos += sampsThisTime;
      curBufPos = newCurBufPos;
      }
      This way we don't even have an internal loop (though memcpy does);
      I think that the code will only loop twice at most, if it loops more
      than that then you're losing data.
      I'm pretty sure this will work, but I ran no tests :)
      _Mark