why do i need 'trunc~' ?

pid's icon

hi.

working on some pfft~ stuff.

why do i need 'trunc~' ? so many examples i've seen use it, but is not:

'round~ 1. @nearest 0'

...exactly the same and far less cpu?

Max Patch
Copy patch and select New From Clipboard in Max.

i am presuming i am wrong and someone can enlighten me, hence this post...

Tim Lloyd's icon

I just did a quick test with 2000 [trunc~ ] then 2000 [round~ 1. @nearest 0 ]:

2000 trunc~ : approx 14% cpu
2000 round~ : approx 17% cpu

May not be a reliable test, but that's what I got. You can also do the same with [%~ 1] and [-~ ], but that obviously uses more cpu than a trunc~ object.
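For anyone who wants to see the equivalence outside Max, here is a minimal Python sketch (not MSP code). It assumes that [%~ 1] behaves like C's fmod, i.e. the remainder keeps the sign of the input, and that 'round~ 1. @nearest 0' rounds toward zero as this thread takes it to. The function names are made up for illustration.

```python
import math


def trunc_direct(x: float) -> float:
    """Drop the fractional part, rounding toward zero (the trunc~ idea)."""
    return float(math.trunc(x))


def trunc_by_modulo(x: float) -> float:
    """Subtract the fractional part: the [%~ 1] into [-~ ] idea.

    Assumption: %~ follows C fmod semantics, so the remainder keeps the
    sign of x and the subtraction moves the value toward zero.
    """
    return x - math.fmod(x, 1.0)


def round_toward_zero(x: float) -> float:
    """Stand-in for 'round~ 1. @nearest 0' (assumed to round toward zero)."""
    return float(math.floor(x)) if x >= 0.0 else float(math.ceil(x))
```

Under those assumptions all three functions return the same value for ordinary inputs, positive or negative, which is why the choice between the objects comes down to cpu cost rather than results.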

(sorry - not the most helpful reply I know....)

AlexHarker's icon

It's not that unusual for there to be more than one way to do the same thing in MaxMSP - often objects have their own histories and new features may have been added at any time.

CPU usage is quite likely to be platform dependent - my guess (purely speculative) would be that if you are on windows trunc~ may be more expensive - I'd be surprised if trunc~ is slower on the mac platform, but as always YMMV.

A.

pid's icon

thanks raja, and all, excellent. i guess my presumptions about the objects in question were ill founded. i will use trunc~. no doubt too many times because of my inane patching style. isn't it interesting how faster computers have made inbetweeners such as myself a lot more lazy and a lot less savvy with regards cpu and patching efficiencies...

AlexHarker's icon

@raja - yes that is almost certainly how trunc~ works (there are some other clues in the way the object functions) - but due to machine architectures / instructions it may not be as simple as you think - you only have to view the very recent thread on an optimised windows version of trunc~ to see that cast float to int under windows is not necessarily that fast, due to switching rounding modes on the FPU (which is quite slow).

It is also clear from testing round~ that it does not rely on casting at all (or at least not to a 32 bit int - you can confirm this by sending a large float into either object that is beyond the range of a 32 bit int - round~ behaves correctly - trunc~ doesn't). This means that it may well be faster because it probably avoids switching fpu rounding modes.
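The range limitation can be modelled with a small Python sketch. This is not how either object is known to be implemented; it only illustrates the difference between truncating through a 32 bit int and truncating in floating point. It assumes that an out-of-range conversion yields the most negative 32 bit int, which is what the x86 conversion instruction does (in portable C the result is undefined). It also uses Python's 64 bit floats rather than MSP's 32 bit signal values, and the function names are hypothetical.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def trunc_via_int32_cast(x: float) -> float:
    """Model a float -> 32 bit int -> float round trip.

    Assumption: input outside the int32 range (or NaN/inf) comes back as
    INT32_MIN, mimicking the x86 behaviour; C itself leaves this undefined.
    """
    if not math.isfinite(x):
        return float(INT32_MIN)
    as_int = math.trunc(x)
    if as_int < INT32_MIN or as_int > INT32_MAX:
        return float(INT32_MIN)
    return float(as_int)


def trunc_without_cast(x: float) -> float:
    """Truncate purely in floating point, never passing through an int32."""
    if not math.isfinite(x):
        return x
    return x - math.fmod(x, 1.0)
```

Inside the int32 range the two agree; feed in something like 3e9 and only the second one still gives a sensible answer, which matches the test described above.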

Under mac on intel casting compiles as an SSE instruction, as SSE2 is a requirement of the OS and so the compiler by default uses SSE instructions for many floating point ops.

The upshot of all this is:

trunc~ - probably faster on a mac (intel anyway), but only works when the input is within the range of a 32 bit int.

round~ - gives correct results for a wider range of inputs (although arguably that doesn't matter because the spacing between adjacent 32 bit floats beyond the range of a 24 bit unsigned integer is already greater than one, and thus most algorithms requiring accurate results to within 1 integer will probably fail under this condition anyway) - possibly faster under windows due to fpu switching for a cast operation.
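The point about 32 bit float precision can be checked with a short Python sketch. It uses the standard struct module to round values to IEEE 754 single precision; the helper names are made up for illustration, and the spacing helper assumes a finite, positive input.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python (64 bit) float to the nearest 32 bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between x (as a float32) and the next larger float32.

    Assumes x is finite and positive, so adding 1 to the bit pattern
    steps to the adjacent representable value.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_float32(x)
```

float32_spacing(2.0**24) comes out as 2.0, and to_float32(16777217.0) gives 16777216.0, so from 2^24 upwards a 32 bit float can no longer represent every integer.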

However - in the end the only way to know which is faster on your machine for sure is to benchmark.

pid's icon

interesting. so, presumably, just using a few of these objects does not really cause concern with regards speed + efficiency cross-platform. however, in some hypothetical situation where you needed lots of trunc~s on a mactel patch, and wanted true cross platform compatibility, you would re-write with round~s in place for win? i am wondering if this is all nitpicking or really a concern? and, if the latter, whether a good piece of documentation might document all msp objects in this way?

Timothy Place's icon

FWIW, looking at the disassembly of the trunc~ external on the Mac, there are no SSE instructions present.

AlexHarker's icon

Hmmm- I'm not clear on the exact correct terminology here, but according to wikipedia (you can see the link here - http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions ) the instructions I believe are in question are SSE instructions...

It seems to be a little known fact that most floating point maths under macintel is not done on the FPU, but rather on the vector unit. SSE instructions do not only cover packed data, but scalar data as well. I'm not implying that trunc~ is vectorised (I don't believe it to be) - but I do think it makes use of the SSE instruction set. This is my understanding of the situation - if you know differently I'd happily be corrected if I've got this wrong. To make things totally explicit these are the instructions I believe are in question

cvttss2si
cvtsi2ss

A.

Timothy Place's icon

You are right! What I meant to say was that except for those two instructions...

;-)
Tim

jvkr's icon

Anyhow, the statement that trunc~ is computationally very expensive seems to stem from the PPC era. Back then I would use [bitor~ 0 1] to do truncation, which gave a huge gain in terms of cpu. Now, this appears not to be the case anymore.

Doing some measurement with patch below, I get these differences (1024 polyphony):

+~: 34%
bitor: 36%
trunc: 34%
round: 37%

Truncating is as expensive as adding. I believe this is a non issue.

Max Patch
Copy patch and select New From Clipboard in Max.

_
johan

pid's icon

johan, your post is the most helpful to my original question yet (and ironically similar to timlloyd's post!). and thanks, this is what i wanted to hear. (maybe it is from my ppc days that the unfair 'trunc~' expense statement arose?). it has been a fun thread to instantiate and read though (!), so thanks all...