Efficient Constant Q Transform


    Nov 21 2011 | 5:45 pm
    Is there any way to implement a Constant Q transform in MaxMSP? A CQT is like a Fourier transform, except its bins are logarithmically spaced instead of linearly spaced, so that all octaves have equal resolution, which makes way more sense for music analysis. http://en.wikipedia.org/wiki/Constant_Q_transform#Comparison_with_the_Fourier_Transform
    Here is a paper by Judith Brown and Miller Puckette on an efficient algorithm that makes use of the FFT: "An Efficient Algorithm for the Calculation of a Constant Q Transform" http://www.wellesley.edu/Physics/brown/pubs/effalgV92P2698-P2701.pdf
    Here's another paper on an efficient algorithm: "Constant-Q transform toolbox for music processing" http://www.elec.qmul.ac.uk/people/anssik/cqt/
    If there is already a Max object for this, or if this is a problem which has already been solved, please let me know. Otherwise, any help in pointing me in the right direction as to how this ought to be best implemented would be awesome.
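    For reference, the relation I mean: with B bins per octave the center frequencies are f_k = fmin * 2^(k/B), and every bin shares the same quality factor Q = 1/(2^(1/B) - 1), so bin k has bandwidth f_k/Q. A rough standalone C sketch, just to show the numbers (fmin, B and K are example values, nothing more):

      #include <math.h>
      #include <stdio.h>

      /* Constant-Q bin layout: with B bins per octave the center frequencies
         are geometrically spaced, f_k = fmin * 2^(k/B), and every bin shares
         the same quality factor Q = f_k / bandwidth_k. */
      int main(void)
      {
          const double fmin = 55.0;  /* lowest analysis frequency, Hz (example) */
          const int    B    = 12;    /* bins per octave (chromatic spacing) */
          const int    K    = 48;    /* total number of bins (4 octaves) */

          double Q = 1.0 / (pow(2.0, 1.0 / B) - 1.0);  /* about 17 for B = 12 */

          for (int k = 0; k < K; k++) {
              double fk = fmin * pow(2.0, (double)k / B);
              printf("bin %2d  center %8.2f Hz  bandwidth %6.2f Hz\n", k, fk, fk / Q);
          }
          return 0;
      }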

    • Nov 21 2011 | 8:50 pm
      I don't know of an implementation of this transform in Max off the top of my head. There are some good clues in the toolbox paper linked above, though.
      I haven't read it that closely, but I think it wouldn't be rocket surgery to port that MATLAB implementation into a Max object written in C. Or gen~, perhaps.
    • Feb 14 2012 | 9:17 am
      I'm also very interested and will follow this thread. I'm well aware that this is beyond my capabilities, both in maths and in Max, but as I understand it, the problem (for what I'd like to do) with Puckette's paper is that the process described is not invertible... but you can also have a look at this one: "Constructing an invertible constant-Q transform with nonstationary Gabor frames" http://www.univie.ac.at/nonstatgab/pdf_files/dohogrve11_amsart.pdf
    • Mar 17 2014 | 1:19 am
      I know this is quite old, but before I try to wrangle this myself (and it will be a dodgy-arse wrangle), did anyone find an answer?
      I've been digging around trying to get a log-spread FFT frequency output, and much of the maths is quite over my head. After finding the "constant Q transform" method, it seems like a great potential solution (as opposed to simply reallocating linear bins into log bins as a post-FFT process).
      C
    • Jan 02 2015 | 7:09 pm
      The constant-Q transform (as proposed by M.S.P. in the paper linked by cortexelus) has been implemented in Max as an external object. I have not yet verified myself that it works, and this thread is so old that the object may have been developed by one of the early posters themselves, but here it is: http://grrrr.org/research/software/constantq/. I figured I should post here for any future viewers of this thread.
    • Jan 02 2015 | 10:55 pm
      Link?
    • Jan 11 2015 | 4:26 pm
      Okay, so the compiled project depends on the FFTW libraries, available here: http://www.fftw.org/index.html#documentation However, after installing these libraries (./configure, make, make install), [constantq~] is still referencing a non-existent file path, "usr/local/lib/libfftw3f.3.dylib". Note: I have zero knowledge of C and program development.
      I also think I'll try to look at his source code, though it depends on a development layer he built (flext) and another external library (blitz++).
    • Jan 26 2015 | 1:04 am
      After lots of trial and error, I've identified (and resolved) the problems with installation, and my FFTW installation is now compatible with [constantq~]. In the ./configure stage, some special flags are required. The library must be compiled for 32-bit architecture, which takes a special compiler flag, specified in the "CC..." portion of the command. Secondly, the default build is a static library where a dynamic one is required; this is fixed with "--enable-shared". Lastly, the build needs the "--enable-float" flag (so that the single-precision libfftw3f is built). The command that worked for me is as follows:
      ./configure CC="gcc -arch i386" CXX="g++ -arch i386" --enable-shared --enable-float
    • Sep 24 2015 | 6:22 pm
      You could also make your own implementation of a constant Q transform by using the native MSP object fffb~:
      - Make chromatically spaced filters by using a couple of fffb~'s, depending on how many bands you need (each object has a maximum of 50 filters - read the help file). Or use quarter-tone spaced filters, like the transform described by Judith C. Brown in her paper (Calculation of a constant Q spectral transform, 1991).
      - Then get the RMS/peak amplitude for each band and collect it in a list. Now you've got your own chroma vector of size n (see the sketch at the end of this post)...
      You could potentially vary the q setting for each filter, to get variable bandwidth/time resolution for each band.
      You may also adjust the gain of each filter in fffb~; this lets you implement things like Fletcher-Munson equal-loudness curves, to describe the signal's harmonic content more accurately, in a way that relates to human hearing.
      It might not be the least CPU-intensive solution, but it's flexible and easy to work with. I've managed decent polyphonic analysis on old Windows XP computers with Max/MSP 4.5 :-)
      It might be worth a shot.
      - gus
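      PS - not a patch, but here's the chroma step from above as a rough C sketch, under the assumption that "chroma vector" means folding the per-band amplitudes down to 12 pitch classes. The band_rms values are assumed to come from the fffb~/RMS stage upstream; nothing here is an actual Max API.

        /* Fold per-band RMS amplitudes from a chromatically spaced filterbank
           (12 bands per octave, band 0 = the pitch class of the lowest filter)
           into a 12-element chroma vector. */
        void fold_to_chroma(const float *band_rms, int num_bands, float chroma[12])
        {
            for (int c = 0; c < 12; c++)
                chroma[c] = 0.0f;

            for (int k = 0; k < num_bands; k++)
                chroma[k % 12] += band_rms[k];  /* same pitch class, every octave */
        }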
    • Jun 30 2017 | 4:50 pm
      I assume no one on here has since implemented an efficient constant-Q transform? I'm trying to wrap my head around the Puckette & Brown paper referenced above, in an attempt to get pfft~ to produce a constant Q spectrum. But I want to make sure no one has done this already, as this is only a very small part of a bigger project.
      The [constantq~] object referenced above doesn't seem like it will work for my purposes, because it outputs a list, as opposed to signal-rate info. And I don't yet have the C or C++ skills to modify it to my liking.
      And Gus, I suppose that's an option, but probably too inefficient compared to converting an FFT into a constant Q, as described in the Puckette & Brown paper.
    • Jul 02 2017 | 3:13 am
      If all you really want is to dimension-reduce the FT spectrum into something closer to audible perception, maybe a Bark or mel scale analysis can work. There's a sketchy start toward this here: https://cycling74.com/forums/gen-sharing-is-barking Essentially it chunks geometrically increasing numbers of input FFT bins into output CQ bins (which is related to what the Morlet wavelet transform does too). But unlike what the P&B paper is talking about, this sketchy approach doesn't allow you to choose the f0 or the Q ratio: the f0 is going to be the center frequency of the lowest FFT bin, and the ratio between CQ bins is going to be powers of 2.
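      Just to make that chunking explicit, here's roughly what it boils down to, sketched in plain C (names are made up, mag[] is one frame of FFT magnitudes): each output band collects an octave's worth of bins, so the band ratio is locked to 2 and f0 to the first bin.

        /* Collapse an FFT magnitude frame into octave-wide bands: band b sums
           bins 2^b .. 2^(b+1)-1 (DC skipped).  A rough stand-in for a CQ
           analysis, not the P&B kernel method. */
        int chunk_octaves(const float *mag, int nbins, float *bands, int maxbands)
        {
            int b = 0;
            for (int lo = 1; lo < nbins && b < maxbands; lo <<= 1, b++) {
                int hi = lo << 1;
                if (hi > nbins) hi = nbins;
                float sum = 0.0f;
                for (int i = lo; i < hi; i++)
                    sum += mag[i];      /* magnitude collected per octave band */
                bands[b] = sum;
            }
            return b;                   /* number of octave bands produced */
        }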
      I'm not fully grasping the P&B paper, but it seems that given an f0 and Q ratio, one needs to precompute a matrix (a kernel) mapping a subset of the input FFT bins into each output CQ bin. I'd guess that this could also be done as a gen~-in-pfft~ thing, but would likely need some codeboxing.
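      Something like this is how I'd picture the kernel application step once the kernels exist (just a sketch: all names are made up, and the precomputation of each kernel from a windowed complex exponential, per the paper, is not shown):

        #include <complex.h>

        /* Each CQ bin k owns a precomputed (mostly sparse, pre-conjugated)
           spectral kernel kern[k]; the CQ coefficient is then just a dot
           product of the FFT frame with that kernel. */
        void apply_cq_kernels(const float complex *fft_frame, int nbins,
                              float complex **kern, int ncq,
                              float complex *cq_out)
        {
            for (int k = 0; k < ncq; k++) {
                float complex acc = 0.0f;
                /* in practice only the few non-zero kernel entries need visiting */
                for (int n = 0; n < nbins; n++)
                    acc += fft_frame[n] * kern[k][n];
                cq_out[k] = acc;
            }
        }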
      My other question is -- when you say you prefer the output to not be a list, but a signal, I'm not sure what this would actually look like. The number of CQ bins should be less than the FFT frame size, so I guess the CQ bins *could* be packed into an audio signal -- but how would you use them? What do you want to do with the analysis?
    • Jul 02 2017 | 8:37 am
      FWIW, analyzer~ will output bark bands. Also, zsa.mel~ and zsa.bark~
    • Jul 08 2017 | 6:50 pm
      Graham: Actually, I don't mind if f0 is automatically the lowest FFT bin. However, I believe that this won't work for what I need either, unfortunately.
      The problem is that, after getting logarithmically spaced bins, I need to do some processing, and then convert back to linearly spaced bins. This needs to be as close to lossless as possible... and I believe that chunking several bins to get that spacing will end up losing data that cannot be retrieved after further processing.
      And maybe the above will answer your question about list vs. signal. This is to be further processed, as opposed to merely viewed as analysis. Ideally, the signal would still look akin to the output of fftin~, carrying a real part per bin, an imaginary part per bin, and the current bin index.
      mzed: unfortunately, those won't work for me, as they still output a list.
    • Jul 11 2017 | 2:38 pm
      About getting a close-to-lossless reversible transformation: there's a time/space tradeoff that corresponds to the spectral layout -- we have more high-frequency bins than low-frequency bins because there are more cycles per unit time at high frequencies, i.e. more information. Working in a log-spaced transform, if it is executed in fixed-size chunks (which is true if it is derived from a Fourier transform), means discarding a lot of information, which means non-lossless reconstruction. A way around this is to analyze different bands at different window sizes, where the high-frequency bin analyses run more frequently than the low-frequency bin analyses. A wavelet-based analysis can do this.
      Years ago I got interested in this problem and, after reading the Computer Music Tutorial, wanted to learn about wavelets. The easiest thing to do was a time-domain Haar wavelet analyzer & synthesizer, which was able to losslessly reconstruct the analyzed signal. The intermediate format has typically 10-13 bins (depending on how low in frequency you want to go), log spaced. But the information is preserved: a higher-frequency bin has more samples per window than a lower bin, so overall the same amount of information is present. Actually it could be fun to knock this up as a gen~ example, as the process is interesting. But frankly, most of the processes you apply in the intermediate domain result in some noisy modulation artifacts, which isn't surprising considering the simplicity of the Haar wavelet.
      Fancier wavelet and dictionary-based analyses can do better compression, but are much harder to write, and might not be easier to operate on in their transformed domains.
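      For what it's worth, the building block I mean is tiny. Roughly, in plain C (cascade it 10-13 times, feeding each level's averages into the next level, to get the log-spaced layout; this is a sketch of the idea, not the gen~ patch itself):

        /* One level of a Haar analysis/synthesis pair: split a frame into
           pairwise averages (the low band, fed to the next level) and
           differences (this level's "bin").  The inverse reconstructs the
           input exactly, which is where the lossless property comes from. */
        void haar_analyze(const float *x, int n, float *avg, float *diff)
        {
            for (int i = 0; i < n / 2; i++) {
                avg[i]  = 0.5f * (x[2 * i] + x[2 * i + 1]);
                diff[i] = 0.5f * (x[2 * i] - x[2 * i + 1]);
            }
        }

        void haar_synthesize(const float *avg, const float *diff, int n, float *x)
        {
            for (int i = 0; i < n / 2; i++) {
                x[2 * i]     = avg[i] + diff[i];  /* exact inverse of the pair above */
                x[2 * i + 1] = avg[i] - diff[i];
            }
        }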
    • Jul 11 2017 | 5:44 pm
      Here's that Haar wavelet thing in gen~:
    • Jul 11 2017 | 8:18 pm
      @Graham: thanks for that. Could any arbitrary wavelet potentially be used for this? I wonder if those distortions could be mitigated by different wavelet types. I messed around with wavelet~ from CNMAT too, but it's great that your implementation here provides frame and bin indexes.
      FWIW, I actually got my logarithmic resampling to work! The problem was that I was trying to resample the real and imaginary parts. Now I'm only resampling the magnitude with a logarithmic read function and, to get back to linear spacing, reading back from the logarithmically spaced values exponentially. The phase is left unprocessed the whole time, and is simply delayed so it stays matched with the re-read, re-linearized magnitudes.
      Going back to the time domain, it sounds MUCH better than doing this with separate real and imaginary parts, but there is still some noise, especially in the high frequencies. This is greatly helped by placing the pfft~ patch inside a poly~ and upsampling by a factor of 4 or 8; that way, the bins that the interpolation makes most inaccurate become irrelevant to the final output.
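      In case it helps anyone, here is roughly the idea in plain C rather than pfft~ (function and variable names are made up, and the real patch does the interpolation inside pfft~): warp the magnitude frame onto a log-spaced axis, do the processing there, then warp back with the inverse read; phase is just delayed alongside.

        #include <math.h>

        /* Linear interpolation into a magnitude frame of nbins values. */
        static float read_interp(const float *buf, int nbins, float pos)
        {
            int   i = (int)pos;
            float f = pos - i;
            if (i >= nbins - 1) return buf[nbins - 1];
            return buf[i] + f * (buf[i + 1] - buf[i]);
        }

        /* Warp linearly spaced magnitudes onto a log-spaced axis: log bin k
           reads from an exponentially growing source position. */
        void warp_to_log(const float *mag, float *logmag, int nbins)
        {
            logmag[0] = mag[0];
            for (int k = 1; k < nbins; k++) {
                float pos = powf((float)nbins, (float)k / (nbins - 1)) - 1.0f;
                logmag[k] = read_interp(mag, nbins, pos);
            }
        }

        /* Inverse warp: linear bin k reads back from a logarithmically
           spaced position in the (processed) log-domain frame. */
        void warp_to_lin(const float *logmag, float *mag, int nbins)
        {
            mag[0] = logmag[0];
            for (int k = 1; k < nbins; k++) {
                float pos = (nbins - 1) * logf(k + 1.0f) / logf((float)nbins);
                mag[k] = read_interp(logmag, nbins, pos);
            }
        }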
    • Nov 28 2018 | 3:12 pm
      Waking this up again: I tried finding a compiled, up-to-date version of the constant-Q transform external. Does anyone know of an updated version? The version from grrrr yields: "constantq~: unable to load object bundle executable"
    • Feb 10 2019 | 9:22 pm
      Bump. I'm also interested in this... it got my attention after watching this video, especially the spectrum shown at 15:20: https://www.youtube.com/watch?v=ONJVJBmFiuE