why do i need 'trunc~'?

    Jan 24 2010 | 6:30 pm
    working on some pfft~ stuff.
    why do i need 'trunc~'? so many examples i've seen use it, but isn't 'round~ 1. @nearest 0' exactly the same, and far less cpu?
    i presume i am wrong and someone can enlighten me, hence this post...

    • Jan 24 2010 | 7:03 pm
      I just did a quick test with 2000 [trunc~] and then 2000 [round~ 1. @nearest 0]:
      2000 trunc~: approx. 14% CPU
      2000 round~: approx. 17% CPU
      That may not be a reliable test, but it's what I got. You can also do the same with [%~ 1] and [-~], but since that takes two objects it obviously uses more CPU than a single trunc~.
      (Sorry, not the most helpful reply, I know...)
    • Jan 24 2010 | 7:11 pm
      It's not that unusual for there to be more than one way to do the same thing in MaxMSP. Objects often have their own histories, and new features may have been added at any time.
      CPU usage is quite likely to be platform dependent. My guess (purely speculative) is that trunc~ may be more expensive on Windows. I'd be surprised if trunc~ were slower on the Mac, but as always YMMV.
    • Jan 25 2010 | 12:59 am
      thanks raja, and all, excellent. i guess my presumptions about the objects in question were ill-founded. i will use trunc~, no doubt too many times because of my inane patching style. isn't it interesting how faster computers have made inbetweeners such as myself a lot lazier and a lot less savvy with regard to cpu and patching efficiency...
    • Jan 25 2010 | 1:13 am
      @raja: yes, that is almost certainly how trunc~ works (there are some other clues in the way the object behaves). However, because of machine architectures and instruction sets, it may not be as simple as you think. You only have to look at the very recent thread on an optimised Windows version of trunc~ to see that casting float to int under Windows is not necessarily fast. The cast involves switching rounding modes on the FPU, which is quite slow.
      Testing round~ also makes it clear that it does not rely on casting at all, or at least not to a 32-bit int. You can confirm this by sending either object a float beyond the range of a 32-bit int: round~ behaves correctly, trunc~ doesn't. This means round~ may well be faster on Windows, because it probably avoids switching FPU rounding modes.
      On Intel Macs a cast compiles to an SSE instruction. SSE2 is a requirement of the OS, so the compiler uses SSE instructions for many floating-point operations by default.
      The upshot of all this is:
      trunc~: probably faster on a Mac (Intel, anyway), but only works when the input is within the range of a 32-bit int.
      round~: gives correct results for a wider range of inputs. It is possibly faster under Windows, since it probably avoids the FPU rounding-mode switch a cast requires there. (Arguably the wider range doesn't matter. Beyond the range of a 24-bit unsigned integer, i.e. above 2^24, the gap between adjacent 32-bit floats is already greater than one. Most algorithms that need results accurate to within one integer will probably fail there anyway.)
      However, in the end the only way to know for sure which is faster on your machine is to benchmark.
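The range point can be made concrete. A 32-bit float at or above 2^23 in magnitude has no fractional part left, so truncation can skip the cast there and convert only values that are guaranteed to fit. Below is a minimal C sketch. It assumes, as the thread does, that trunc~ is a float-to-int32 cast. The function name is hypothetical, and this is not a claim about how round~ is implemented.

```c
#include <math.h>
#include <stdint.h>

/* Truncation that stays correct outside the int32 range.
   Assumption (from the thread): a trunc~-style cast, (int32_t)x, only
   works while x fits in an int32; beyond that the C cast is undefined. */
float trunc_range_safe(float x) {
    /* A float with magnitude >= 2^23 has no fractional bits left, so it is
       already an integer. The negated test also sends NaN and infinity
       down this branch instead of into the cast. */
    if (!(fabsf(x) < 8388608.0f))   /* 8388608 = 2^23 */
        return x;
    /* Inside +/-2^23 the value certainly fits in an int32, so the
       float -> int -> float round trip is safe and drops the fraction. */
    return (float)(int32_t)x;
}
```

For example, trunc_range_safe(3.0e9f) returns its input unchanged. Casting 3.0e9f directly to int32_t would be undefined behaviour in C.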
    • Jan 25 2010 | 10:21 am
      interesting. so presumably just using a few of these objects is not really a cause for concern regarding speed and efficiency cross-platform. however, in some hypothetical situation where you needed lots of trunc~s in a mactel patch and wanted true cross-platform compatibility, would you rewrite it with round~s for windows? i am wondering whether this is all nitpicking or a real concern, and, if the latter, whether a good piece of documentation would document all msp objects in this way?
    • Jan 25 2010 | 3:59 pm
      FWIW, looking at the disassembly of the trunc~ external on the Mac, there are no SSE instructions present.
    • Jan 26 2010 | 12:02 am
      Hmmm, I'm not clear on the exact correct terminology here, but according to Wikipedia (http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions) the instructions I believe are in question are SSE instructions...
      It seems to be a little-known fact that most floating-point maths on Intel Macs is done on the vector unit rather than the x87 FPU. SSE instructions cover scalar data as well as packed data. I'm not implying that trunc~ is vectorised (I don't believe it is), but I do think it makes use of the SSE instruction set. This is my understanding of the situation; if you know differently, I'd happily be corrected. To make things totally explicit, these are the instructions I believe are in question:
      cvttss2si, cvtsi2ss
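For reference, cvttss2si converts a scalar float to a 32-bit int with truncation, and cvtsi2ss converts a 32-bit int back to a float. Together they form the cast-and-cast-back truncation discussed above. The sketch below writes that pair out with the standard SSE intrinsics from <xmmintrin.h> and falls back to a plain cast on other targets. The function and macro names are hypothetical. Whether trunc~ itself compiles to exactly this pair is the thread's reading of the disassembly, not something confirmed here.

```c
#include <stdint.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_INTRINSICS 1
#endif

/* float -> int32 -> float round trip: truncation toward zero for inputs
   inside the int32 range. */
float trunc_round_trip(float x) {
#ifdef HAVE_SSE_INTRINSICS
    /* cvttss2si: scalar float to int32, always truncating. The extra "t"
       means the MXCSR rounding mode is ignored, so no mode switch is
       needed (unlike a cast done on the x87 FPU). */
    int32_t i = _mm_cvttss_si32(_mm_set_ss(x));
    /* cvtsi2ss: int32 back to a scalar float in the low lane. */
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), i));
#else
    /* Portable fallback: the same semantics via an ordinary C cast. */
    return (float)(int32_t)x;
#endif
}
```

The "t" form is what matters for the Windows/Mac discussion. A compiler targeting SSE can implement C's truncating cast with one instruction, while x87 code has to change the FPU rounding mode around the conversion.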
    • Jan 26 2010 | 2:51 am
      You are right! What I meant to say was that except for those two instructions...
      ;-) Tim
    • Jan 26 2010 | 8:54 am
      Anyhow, the idea that trunc~ is computationally very expensive seems to stem from the PPC era. Back then I would use [bitor~ 0 1] for truncation, which gave a huge gain in CPU terms. That no longer appears to be the case.
      Measuring with the patch below, I get these CPU figures (polyphony of 1024):
      +~: 34%
      bitor: 36%
      trunc: 34%
      round: 37%
      Truncating costs about as much as adding, so I believe this is a non-issue.
      _ johan
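The same comparison can also be sketched outside Max. The C below times a vector loop doing an add against one doing a float-to-int32-to-float truncation. It measures raw loops only, not MSP objects, so it does not capture per-object overhead inside Max, which may well dominate at these costs. The vector size and all names are hypothetical choices.

```c
#include <stdint.h>
#include <time.h>

#define VEC_SIZE 64   /* hypothetical signal vector size */

/* Global (external linkage) so the compiler must assume clock() could read
   it, and therefore cannot skip or reorder the timed work. */
float bench_buf[VEC_SIZE];

/* Per-sample kernels standing in for [+~ 1.] and a cast-based [trunc~]. */
void kernel_add(float *buf, int n) {
    for (int i = 0; i < n; i++) buf[i] = buf[i] + 1.0f;
}

void kernel_trunc(float *buf, int n) {
    for (int i = 0; i < n; i++) buf[i] = (float)(int32_t)buf[i];
}

/* Applies `kernel` to the buffer `reps` times and returns the processor
   time used, in seconds. */
double time_kernel(void (*kernel)(float *, int), long reps) {
    for (int i = 0; i < VEC_SIZE; i++)
        bench_buf[i] = (float)i * 0.37f - 10.0f;   /* small values, inside int32 */

    clock_t start = clock();
    for (long r = 0; r < reps; r++)
        kernel(bench_buf, VEC_SIZE);
    clock_t stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_kernel(kernel_add, 10000000) and time_kernel(kernel_trunc, 10000000) from a main with optimisation enabled gives two timings to compare. As the thread found, expect the ratio to depend on the CPU and compiler.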
    • Jan 26 2010 | 11:19 am
      johan, your post is the most helpful answer to my original question yet (and ironically similar to timlloyd's post!). thanks, this is what i wanted to hear. (maybe the unfair 'trunc~ is expensive' idea dates from my ppc days?) it has been a fun thread to instantiate and read though (!), so thanks all...