lores~ code: understanding unroll and related functions

nbrc

Hello everyone.

I am very new to programming Max externals.
To practice, I am making a Moog VCF-like filter. I have based my external on the SDK lores~ example.
My external works well, but I really want to understand all of the functionality that I've implemented.

I don't fully understand the relationship between SMOOTHING_VERSION, lores_perform_unroll_smooth64, and maxvectorsize. Can someone explain this to me?

void lores_dsp64(t_lores *x, t_object *dsp64, short *count, double samplerate, long maxvectorsize, long flags){
    x->l_2pidsr = (2.0 * PI) / samplerate;
    lores_calc(x);
    x->l_a1p = x->l_a1;  // store prev coefs
    x->l_a2p = x->l_a2;
    x->l_fcon = count[1];    // signal connected to the frequency inlet?
    x->l_rcon = count[2];    // signal connected to the resonance inlet?
    lores_clear(x);

    if (maxvectorsize >= 4) {
#if SMOOTHING_VERSION
        dsp_add64(dsp64, (t_object *)x, (t_perfroutine64)lores_perform_unroll_smooth64, 0, NULL);
#else
        dsp_add64(dsp64, (t_object *)x, (t_perfroutine64)lores_perform_unroll64, 0, NULL);
#endif
    }
    else
        dsp_add64(dsp64, (t_object *)x, (t_perfroutine64)lores_perform64, 0, NULL);
}

Basically, why do we need an unroll function?

Isabel Kaspriskie

Hi!

Regarding smoothing: I don't see any definition of lores_perform_unroll_smooth64, and that branch looks compiled out (via the #define SMOOTHING_VERSION 0), so the #else branch is the one that actually gets built. Perhaps it is a historical artifact?
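To make the compile-time part concrete, here is a minimal sketch of the same selection logic outside of Max. The enum and the function `choose_perform` are hypothetical names made up for this illustration (they are not in the SDK); only `SMOOTHING_VERSION` and the `maxvectorsize >= 4` test come from the posted code.

```c
#include <assert.h>

/* Hypothetical labels for the three perform routines (illustration only). */
typedef enum {
    PERFORM_PLAIN,          /* stands in for lores_perform64 */
    PERFORM_UNROLL,         /* stands in for lores_perform_unroll64 */
    PERFORM_UNROLL_SMOOTH   /* stands in for lores_perform_unroll_smooth64 */
} t_perform_choice;

/* With the macro set to 0, the preprocessor discards the #if branch
   before the compiler ever sees it. */
#define SMOOTHING_VERSION 0

/* Mirrors the branching in lores_dsp64: the macro is resolved at compile
   time, the vector-size test is resolved at run time. */
t_perform_choice choose_perform(long maxvectorsize)
{
    assert(maxvectorsize > 0);
    if (maxvectorsize >= 4) {
#if SMOOTHING_VERSION
        return PERFORM_UNROLL_SMOOTH;
#else
        return PERFORM_UNROLL;
#endif
    }
    return PERFORM_PLAIN;
}
```

With the macro at 0, `choose_perform(64)` gives `PERFORM_UNROLL` and `choose_perform(2)` gives `PERFORM_PLAIN`. The smooth variant can never be selected, which is why a missing definition of that function does not break the build.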

Regarding unrolling: "loop unrolling" can sometimes(*) yield performance improvements.

What is different in the unroll method? It calculates a value `n` which is 1/4 of the vector size. Then the main loop runs `n` times instead of once per sample, and each pass computes four samples in a row instead of one. The two variables `yna` and `ynb` take turns holding the newest output, so the `temp` swap from the plain version is not needed.

while (n--) {
    *out++ = yna = scale * (val = *in++) - a1 * ynb - a2 * yna;
    *out++ = ynb = scale * (val = *in++) - a1 * yna - a2 * ynb;
    *out++ = yna = scale * (val = *in++) - a1 * ynb - a2 * yna;
    *out++ = ynb = scale * (val = *in++) - a1 * yna - a2 * ynb;
}

vs. the non-unroll method:

while (sampleframes--) {
    val = *in++;
    temp = ym1;
    ym1 = scale * val - a1 * ym1 - a2 * ym2;
    ym2 = temp;
    *out++ = ym1;
}

(*) Whether unrolling improves things, and by how much, depends on a lot of factors. You can read more here: https://en.wikipedia.org/wiki/Loop_unrolling . There is also the caveat of premature optimization: be careful not to add loop unrolling everywhere before you know you need it (check with benchmarks and profiling tools first). :)
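If it helps, here is a small standalone sketch that puts both loops side by side so you can check that they compute the same thing. The struct `t_biquad_state` and the function names are hypothetical, made up for this example; in lores~ the state lives in `t_lores`. The sketch also shows why the dsp method tests `maxvectorsize >= 4`: the unrolled loop consumes four samples per pass, so it only works when the block length is a multiple of 4 (I am assuming here that vector sizes are powers of two, so anything from 4 up qualifies).

```c
#include <assert.h>

/* Hypothetical filter state for illustration. */
typedef struct {
    double ym1; /* previous output, y[n-1] */
    double ym2; /* output before that, y[n-2] */
} t_biquad_state;

/* One sample per iteration, like the non-unrolled loop. */
void filter_plain(const double *in, double *out, long nframes,
                  double scale, double a1, double a2, t_biquad_state *s)
{
    double ym1 = s->ym1, ym2 = s->ym2;
    while (nframes--) {
        double val = *in++;
        double temp = ym1;
        ym1 = scale * val - a1 * ym1 - a2 * ym2;
        ym2 = temp;
        *out++ = ym1;
    }
    s->ym1 = ym1;
    s->ym2 = ym2;
}

/* Four samples per iteration. yna starts as the older output and ynb as
   the newer one; each line overwrites whichever of the two is older. */
void filter_unrolled(const double *in, double *out, long nframes,
                     double scale, double a1, double a2, t_biquad_state *s)
{
    assert(nframes % 4 == 0); /* the unrolled body cannot handle leftovers */
    double yna = s->ym2, ynb = s->ym1;
    long n = nframes / 4;
    while (n--) {
        *out++ = yna = scale * (*in++) - a1 * ynb - a2 * yna;
        *out++ = ynb = scale * (*in++) - a1 * yna - a2 * ynb;
        *out++ = yna = scale * (*in++) - a1 * ynb - a2 * yna;
        *out++ = ynb = scale * (*in++) - a1 * yna - a2 * ynb;
    }
    /* After a whole number of passes, ynb is the newest output again. */
    s->ym1 = ynb;
    s->ym2 = yna;
}

/* Largest absolute difference between two buffers. */
double max_abs_diff(const double *a, const double *b, long n)
{
    double worst = 0.0;
    for (long i = 0; i < n; i++) {
        double d = a[i] - b[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Feeding the same input and coefficients to both functions and comparing with `max_abs_diff` should give a difference at or very near zero. Timing the two is a separate question, and that is where the benchmarking advice above applies.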

nbrc

Hi Isabel,

Thank you so much for your explanation.
I now have a clear idea of what unrolling is.

For smoothing, yes, it seems to be a historical artifact.

Thank you very much for the references as well. I will use loop unrolling carefully. :)