Denormals

Jan 30, 2009 at 8:49pm

Hi all,

I was wondering if anyone would like to share their solutions for dealing with denormals. It’s recently come up as a problem in CNMAT’s smooth-biquad~:

http://www.cycling74.com/forums/index.php?t=msg&th=33661&start=0&rid=4586&S=2551c966f7c03e3c1250963adb02068f

and it seems like the same problem was in cascade~:

http://www.cycling74.com/forums/index.php?t=msg&th=34447&start=0&rid=4586&S=2551c966f7c03e3c1250963adb02068f

We fixed the problem in smooth-biquad~ by doing this:

#ifdef WINDOWS
#include <xmmintrin.h>
#else
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
#endif

t_int *biquad2_perform(t_int *w){

#ifdef WINDOWS
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
#else
fenv_t oldEnv;
//Read the old environment and set the new environment using default flags and denormals off
fegetenv( &oldEnv );
fesetenv( FE_DFL_DISABLE_SSE_DENORMS_ENV );
#endif

// do inner loop calculations here

#ifdef WINDOWS
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
#else
fesetenv( &oldEnv );
#endif


}

This solution seems to work well, but the object is a little more expensive: about 1.2x the number of CPU cycles. I suspect the performance hit is due to the compiler translating the code to SSE instructions and that we will need to tune our code for SSE.
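Before paying for a mode switch, it can help to confirm that subnormal values really are showing up in the signal. The following is a minimal diagnostic sketch, assuming single-precision sample buffers; `count_subnormals` is a hypothetical helper name and is not part of the Max SDK or of smooth-biquad~:

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical debugging helper: count the samples in a buffer that are
   subnormal (nonzero, but smaller in magnitude than FLT_MIN). */
static size_t count_subnormals(const float *buf, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (fpclassify(buf[i]) == FP_SUBNORMAL)
            count++;
    }
    return count;
}
```

Calling it on the filter state or output at the end of a perform routine (in a debug build only) should show whether the CPU spikes line up with subnormal values.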

If anyone else would be willing to share the ways in which they’re dealing with this problem, I’d love to hear them and would be happy to benchmark them against the code above.

Thanks in advance,
JM

#42019
Jan 30, 2009 at 10:53pm

John –

This might have some useful information, possibly a bit dated though:

http://www.musicdsp.org/files/other001.txt

brad

http://music.columbia.edu/~brad

#150087
Jan 31, 2009 at 12:47am

Thanks a lot Brad–that’s a good resource. Anyone using anything other than some variant of one of these techniques?

JM

#150088
Jan 31, 2009 at 1:10am

On 31 janv. 09, at 01:47, John MacCallum wrote:

> Thanks a lot Brad–that’s a good resource. Anyone using anything
> other than some variant of one of these techniques?

Also have a look at this thread. Graham posted a link to an
interesting article, as well as showing the standard macros.

http://www.cycling74.com/forums/index.php?t=tree&th=36065&mid=154847&rid=0&S=a4b487a0345094eeccd60bcf0479a709&rev=&reveal=

Cheers,
ej

#150089
Jan 31, 2009 at 4:01am

I’ve been using the “flipping number solution” from the link that Brad provided (alternatively known as square injection) in my [gverb~] object and others for a while. Works like a charm. Here it is in pseudo-code.

–Nathan

>>>

// fix for denormal through square injection of dc offset
// define value during preprocessing for easy updates
#define TINY_DC 0.0000000000000000000000001f

// maintain dc_offset for square injection in object struct
double sqinject_val;

// initialize square injection value in object “new” method
x->sqinject_val = TINY_DC;

// in “perform” method before while loop
// flip sign on square injection for each block
// and store it back so the next block sees the flip
x->sqinject_val = x->sqinject_val * -1.0;
sqinject_val = x->sqinject_val;

// in “perform” method in while loop

val_dry = *in_dry; // grab input values
val_dry += sqinject_val; // add very small dc offset

>>>
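The pseudo-code above can be turned into a small self-contained sketch. This is not the [gverb~] source: the one-pole filter, the `onepole_*` names, the block size of 64 and the `1e-25f` offset are all assumptions chosen only to make the effect visible.

```c
#include <math.h>
#include <stddef.h>

#define TINY_DC 1e-25f  /* assumed tiny offset, far below audibility */

/* Hypothetical one-pole filter state standing in for an object struct. */
typedef struct {
    float y;             /* feedback state: y[n] = x[n] + coeff * y[n-1] */
    float sqinject_val;  /* injected offset; its sign flips once per block */
} onepole_t;

static void onepole_init(onepole_t *x)
{
    x->y = 0.0f;
    x->sqinject_val = TINY_DC;
}

/* Process one block in place; inject != 0 enables the square injection. */
static void onepole_perform(onepole_t *x, float *buf, size_t n,
                            float coeff, int inject)
{
    /* Flip the sign and store it back so the next block sees the flip. */
    x->sqinject_val = -x->sqinject_val;
    const float offset = inject ? x->sqinject_val : 0.0f;

    for (size_t i = 0; i < n; i++) {
        x->y = (buf[i] + offset) + coeff * x->y;
        buf[i] = x->y;
    }
}

/* Feed a unit impulse followed by silence and return the final state. */
static float onepole_ring_out(size_t blocks, float coeff, int inject)
{
    onepole_t x;
    float buf[64];
    onepole_init(&x);
    for (size_t b = 0; b < blocks; b++) {
        for (size_t i = 0; i < 64; i++)
            buf[i] = 0.0f;
        if (b == 0)
            buf[0] = 1.0f;
        onepole_perform(&x, buf, 64, coeff, inject);
    }
    return x.y;
}
```

With injection disabled, the state of a long ring-out typically ends up stuck in the subnormal range (or at zero if flushing is already enabled); with injection enabled it hovers around the scale of the injected offset and stays a normal number.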

#150090
Jan 31, 2009 at 7:49am

For info: SuperCollider uses an inline function something like:

inline float zap(float x) throw()
{
float absx = std::abs(x);
return (absx > (float)1e-15 && absx < (float)1e15) ? x : (float)0.;
}

.. which gets rid of denorms and other nasties.
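For a plain C external, the same range check might be sketched as follows. `zapf` is a hypothetical name, and this is an adaptation of the snippet above rather than SuperCollider's actual source:

```c
#include <math.h>

/* C sketch of the range check above: pass x through only when its magnitude
   lies strictly between 1e-15 and 1e15; everything else becomes 0. */
static inline float zapf(float x)
{
    float absx = fabsf(x);
    return (absx > 1e-15f && absx < 1e15f) ? x : 0.0f;
}
```

Because comparisons with NaN are always false, NaN and infinities are zeroed along with denormals. The lower threshold is also far above the subnormal range, so very small normal values are discarded too.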

#150091
Feb 1, 2009 at 11:48am

> This solution seems to work well, but the object is a little more expensive–about 1.2x the number of cpu cycles. I suspect the performance hit is due to the compiler translating the code to SSE instructions and that we will need to tune our code for SSE.

From my experience this seems an unlikely reason for the code to run slower, on Mac OS at least, as most floating point code will most likely generate (non-vectorised) SSE code anyway on Mac OS X. You can see this by examining the generated assembly in Shark or a similar tool. Here’s a relevant quote from the Apple SSE/AltiVec document:

“The scalar-on-vector feature is used by MacOS X on Intel to do most scalar floating point arithmetic.
So, if you write a normal floating point expression, such as float a = 2.0f; that will be done on
XMM. (For compiler illuminati, the GCC compiler flag, -mfpmath=sse, is on by default.) Single and double precision scalar floating point arithmetic is done on the SSE unit both for speed and also so
as to deliver computational results much more like those obtained from PowerPC. The legacy x87
scalar floating point unit is still used for long double, because of its enhanced precision.”

So I’d imagine the cost you’re seeing is actually the cost of changing the floating point environment on such a frequent basis.

If you KNOW that your code is generating SSE instructions (or I suppose if you are using SSE intrinsics) there is what I believe might be a more lightweight way to turn denormal flushing on for the SSE unit. Here’s the code I’m using, in my case for SSE vector code (so I know it’s SSE instructions I’m generating):

// Set MXCSR bits

#if defined( __i386__ ) || defined( __x86_64__ )
int oldMXCSR = _mm_getcsr(); // read the old MXCSR setting
int newMXCSR = oldMXCSR | 0x8040; // set DAZ and FZ bits
_mm_setcsr( newMXCSR ); // write the new MXCSR setting to the MXCSR
#endif

/// Main loop processing here….

// Reset MXCSR bits

#if defined( __i386__ ) || defined( __x86_64__ )
_mm_setcsr(oldMXCSR);
#endif

I remember trying a few things (I think including setting the floating point environment), and this was the fastest for what I wanted to do. It would be faster still not to have to set the bits every signal vector, but this unfortunately is necessary.
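The set/restore pattern above can be wrapped in a pair of helpers so that every perform routine uses it the same way. This is only a sketch: `denormal_flush_begin`, `denormal_flush_end`, `scale_block_flushed` and `DENORM_HAVE_MXCSR` are hypothetical names, and the guard assumes a compiler that defines `__SSE__` (GCC/Clang) or `_M_X64`/`_M_IX86_FP` (MSVC). On other architectures the helpers do nothing.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define DENORM_HAVE_MXCSR 1
#else
#define DENORM_HAVE_MXCSR 0
#endif

#define MXCSR_DAZ 0x0040u  /* denormals-are-zero: treat denormal inputs as 0 */
#define MXCSR_FZ  0x8000u  /* flush-to-zero: turn denormal results into 0 */

/* Turn on FZ and DAZ and return the previous MXCSR value for later restore. */
static unsigned int denormal_flush_begin(void)
{
#if DENORM_HAVE_MXCSR
    unsigned int old = _mm_getcsr();
    _mm_setcsr(old | MXCSR_FZ | MXCSR_DAZ);
    return old;
#else
    return 0u;  /* no MXCSR here; nothing to change */
#endif
}

/* Restore the MXCSR value returned by denormal_flush_begin(). */
static void denormal_flush_end(unsigned int old)
{
#if DENORM_HAVE_MXCSR
    _mm_setcsr(old);
#else
    (void)old;
#endif
}

/* Example inner loop: scale a block with denormal flushing enabled. */
static void scale_block_flushed(float *buf, size_t n, float gain)
{
    unsigned int saved = denormal_flush_begin();
    for (size_t i = 0; i < n; i++)
        buf[i] *= gain;
    denormal_flush_end(saved);
}
```

As noted above, this only affects arithmetic that is actually compiled to SSE instructions; x87 code is not covered by MXCSR.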

As far as other methods are concerned:

Branching in loops is always slow, so selectively flushing will be slower than adding noise of any kind. Adding noise/DC/“flipped numbers”/square wave may well be negligible in terms of CPU (you often get very small ops “for free”, because the bottleneck in your code is reading from and writing to memory, rather than the actual operations). It’s up to you whether you mind adding noise to the filter or not.

There is a slightly different noise algorithm used in 2up.svf~ (code here http://2uptech.com/objects.html) that I used to generate noise to feed to the standard svf~ object to fix a denormal issue I was having. With a filter you may be able to add noise at only the input/one stage to avoid denormals, depending on the filter and the magnitude of the noise. In the svf~ case there is a shaping calculation that takes the 4th power of the signal, which causes most of the denormals.
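To illustrate adding noise at the input only, here is a generic sketch. It is not the 2up.svf~ algorithm: the linear congruential generator, the `tiny_noise_*` names and the `1e-18f` level are assumptions picked for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TINY_NOISE_AMP 1e-18f  /* assumed peak level, far above FLT_MIN */

/* Hypothetical generator state: a single 32-bit LCG word. */
typedef struct { uint32_t state; } tiny_noise_t;

/* Next noise sample, roughly uniform in [-TINY_NOISE_AMP, TINY_NOISE_AMP). */
static float tiny_noise_next(tiny_noise_t *g)
{
    g->state = g->state * 1664525u + 1013904223u;      /* 32-bit LCG step */
    int32_t s = (int32_t)(g->state >> 8) - (1 << 23);  /* [-2^23, 2^23) */
    return (float)s * (1.0f / 8388608.0f) * TINY_NOISE_AMP;
}

/* Add the noise to an input block before it reaches the filter. */
static void add_tiny_noise(tiny_noise_t *g, float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] += tiny_noise_next(g);
}
```

Whether a single injection point is enough depends on the filter; a stage that raises the signal to a power, as described above, can push even this level back towards the subnormal range.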

Regards,

Alex

#150092
