If there is already a Max object for this, or if this is a problem which has already been solved, please let me know. Otherwise, any help in pointing me in the right direction as to how this ought to be best implemented would be awesome.
I'm also very interested and will follow this thread. I'm well aware this is beyond my abilities in both the maths and Max, but as I understand it, the problem (for what I'd like to do) with Puckette's paper is that the process described is not invertible... but you can also have a look at this one: "Constructing an invertible constant-Q transform with nonstationary Gabor frames" http://www.univie.ac.at/nonstatgab/pdf_files/dohogrve11_amsart.pdf
I know this is quite old, but before I try to wrangle this myself (and it will be a dodgy-arse wrangle), did anyone ever find an answer?
I've been digging around attempting a log-spaced FFT frequency output, and much of the maths is quite over my head. After finding the constant-Q transform method, it seems like a great potential solution (as opposed to simply reallocating linear bins into log bins as a post-FFT process).
The constant-Q transform (as proposed by M.S.P. in the paper linked by cortexelus) has been implemented in Max as an external object. I have not yet verified myself that it works, and this thread is so old that the object may have been developed by one of the early posters themselves, but here it is:
http://grrrr.org/research/software/constantq/. I figured I should post here for any future viewers of this thread.
Okay, so the compiled project is dependent on the FFTW libraries, available here: http://www.fftw.org/index.html#documentation
However, after installing these libraries (./configure, make, make install), [constantq~] is still referencing a non-existent file path, "usr/local/lib/libfftw3f.3.dylib"
Note: I have zero knowledge of C and program development
Also, I think I'll try to look at his source code, though it depends on a development layer he built (flext) and another external library (Blitz++).
After lots of trial and error, I've identified (and resolved) the installation problems, and my FFTW installation is now compatible with [constantq~]. Some special flags are required at the ./configure stage:
- The installation must be compiled for 32-bit architecture, which is a special flag for the compiler, specified by the "CC..." portion of the command.
- The default installation is a static library, where a dynamic one is required; this is fixed with "--enable-shared."
- Lastly, the installation needs the "--enable-float" flag.
The command which worked for me is as follows:
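(Reconstructing from the flags described above -- the exact "CC..." value is my assumption for a 32-bit macOS build, so treat this as a sketch rather than the exact command:)

```shell
# Build FFTW as a 32-bit, single-precision, shared library for [constantq~].
# The -arch i386 CC setting is an assumed way to force a 32-bit build on macOS.
./configure CC="gcc -arch i386" --enable-shared --enable-float
make
sudo make install
```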
You could also make your own implementation of a constant Q transform, by using the native msp object:
- Make chromatically spaced filters using a couple of fffb~ objects, depending on how many bands you need (each object supports a maximum of 50 filters; see the help file).
Or use quarter-tone spaced filters, like the transform described by Judith C. Brown in her paper
"Calculation of a constant Q spectral transform" (1991).
- Then get the RMS/peak amplitude for each band, and collect it in a list.
now you've got your own chroma-vector of size n...
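For anyone who wants to prototype the recipe above outside Max, here's a minimal numpy sketch of the same idea: chromatically spaced constant-Q bands, one magnitude per band. The f0, band count, and Hanning window are my own illustrative choices, not from the thread.

```python
import numpy as np

def chroma_vector(x, sr=44100, f0=110.0, n_bands=24):
    """Direct constant-Q band magnitudes, one band per semitone
    (in the spirit of Brown's direct method). Illustrative sketch."""
    Q = 1.0 / (2 ** (1 / 12) - 1)       # ~16.8 for semitone spacing
    mags = []
    for k in range(n_bands):
        fk = f0 * 2 ** (k / 12)          # chromatic center frequencies
        N = int(np.ceil(Q * sr / fk))    # window shrinks as fk grows
        n = np.arange(N)
        w = np.hanning(N)                # analysis window
        seg = x[:N]
        # Windowed DFT evaluated at fk (the kernel completes Q cycles in N samples)
        mags.append(np.abs(np.sum(w * seg * np.exp(-2j * np.pi * Q * n / N))) / N)
    return np.array(mags)

# A 220 Hz sine should peak in the band one octave above f0 = 110 Hz.
sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220.0 * t)
cq = chroma_vector(x, sr)
print(int(np.argmax(cq)))  # band 12, one octave above f0
```

Each band's window length is inversely proportional to its center frequency, which is exactly the constant-Q tradeoff the fffb~ version gives you.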
You could potentially vary the q setting for each filter, to get variable bandwidth/time resolution for each band.
You may also adjust the gain of each filter in fffb~; this allows you to implement things like Fletcher-Munson equal-loudness curves,
to more accurately describe the signal's harmonic content in a way that relates to human hearing.
It might not be the least CPU-intensive solution, but it's flexible and easy to work with.
I've managed to achieve decent polyphonic analysis on old Win XP computers with Max/MSP 4.5 :-)
I assume no one on here has since implemented an efficient constant-q transform? I'm trying to wrap my head around the Puckette & Brown paper referenced above, in an attempt to get pfft~ to transform to a constant q. But I want to make sure no one has done this already, as this is only a very small part of a bigger project.
The [constantq~] object referenced above doesn't seem like it will work for my purposes, because it outputs a list, as opposed to signal rate info. And I don't yet have C or C++ skills to modify it to my liking.
And Gus, I suppose that's an option, but probably too inefficient compared to converting an fft into a constant q, as described in the Puckette & Brown paper.
If all you really want is to dimension-reduce the FFT spectrum into something closer to auditory perception, maybe a Bark or Mel scale analysis would work. There's a sketchy start toward this here: https://cycling74.com/forums/gen-sharing-is-barking Essentially this chunks geometrically increasing numbers of input FFT bins into output CQ bins (which is related to what the Morlet wavelet transform does too). But unlike what the P&B paper is talking about, this sketchy approach doesn't allow you to choose the f0 or the Q ratio: the f0 is going to be the center frequency of the lowest FFT bin, and the ratio between CQ bins is going to be powers of 2.
I'm not fully grasping the P&B paper, but it seems that given an f0 and Q ratio, one needs to precompute a matrix (a kernel) mapping a subset of the input FFT bins into each output CQ bin. I'd guess that this could also be done as a gen~-in-pfft~ thing, but would likely need some codeboxing.
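For what it's worth, here's a rough numpy sketch of that precomputed-kernel idea as I understand it: each CQ bin gets a row of the kernel, built by FFT-ing a windowed complex exponential and zeroing the (many) near-zero entries, so the per-frame work is one FFT plus a sparse matrix multiply. Function names, the threshold value, and all parameters are my own assumptions, not from the paper.

```python
import numpy as np

def cq_kernel(sr, f0, n_bins, bins_per_octave, fft_len, thresh=0.0054):
    """Precompute a spectral kernel: one row per CQ bin, mapping a subset
    of FFT bins into that CQ bin. Sketch of the P&B idea; thresh is illustrative."""
    Q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    kernel = np.zeros((n_bins, fft_len), dtype=complex)
    for k in range(n_bins):
        fk = f0 * 2 ** (k / bins_per_octave)
        N = int(np.ceil(Q * sr / fk))
        assert N <= fft_len, "increase fft_len or raise f0"
        temp = np.zeros(fft_len, dtype=complex)
        n = np.arange(N)
        # Windowed complex exponential at fk, zero-padded to fft_len
        temp[:N] = (np.hanning(N) / N) * np.exp(2j * np.pi * Q * n / N)
        spec = np.fft.fft(temp)
        spec[np.abs(spec) < thresh] = 0   # sparsify: most entries are ~0
        kernel[k] = spec
    # Conjugate + 1/fft_len so that kernel @ fft(x) equals the direct CQ sum
    return np.conj(kernel) / fft_len

def cqt(x, kernel):
    """One CQ frame: FFT the input once, then one (sparse) matrix multiply."""
    return kernel @ np.fft.fft(x, kernel.shape[1])
```

A gen~-in-pfft~ version would presumably bake the nonzero kernel entries into buffers and do the multiply-accumulate per output bin in a codebox.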
My other question is -- when you say you prefer the output to not be a list, but a signal, I'm not sure what this would actually look like. The number of CQ bins should be less than the FFT frame size, so I guess the CQ bins *could* be packed into an audio signal -- but how would you use them? What do you want to do with the analysis?
Graham: Actually, I don't mind if f0 is automatically the lowest FFT bin. However, I believe that this won't work for what I need either, unfortunately.
The problem is that, after getting logarithmically spaced bins, I need to do some processing, and then convert back to linearly spaced bins. This needs to be as close to lossless as possible... and I believe that chunking several bins to get that spacing will end up losing data that cannot be retrieved after further processing.
And maybe the above will answer your question about list vs. signal. This is to be further processed, as opposed to merely viewed as analysis. Ideally, the signal would still look akin to the output of fftin~, carrying a real part per bin, an imaginary part per bin, and the current bin index.
mzed: unfortunately, those won't work for me, as they still output a list.
About getting a close-to-lossless, reversible transformation: there's a time/space tradeoff that corresponds to the spectral layout -- we have more high-frequency bins than low-frequency bins because there are more cycles per unit time at high frequencies, i.e. more information. Working in a log-spaced transform, if it is executed in fixed-size chunks (which is true if it is derived from a Fourier transform), means discarding a lot of information, which means non-lossless reconstruction. A way around this is to analyze different bands at different window sizes, where the high-frequency bin analyses run more often than the low-frequency ones. A wavelet-based analysis can do this.
Years ago I got interested in this problem and after reading the Computer Music Tutorial wanted to learn about wavelets. The easiest thing to do was a time-domain Haar wavelet analyzer & synthesizer, which was able to losslessly reconstruct the analyzed signal. The intermediate format has typically 10-13 bins (depends on how low frequency you want to go), log spaced. But the information is preserved: a higher frequency bin has more samples per window than a lower bin, so overall the same amount of information is present. Actually it could be fun to knock this up as a gen~ example, the process is interesting. But frankly most of the processes you apply in the intermediate domain result in some noisy modulation artifacts. Which isn't surprising considering the simplicity of the Haar wavelet.
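Since the Haar case is so simple, here's roughly what that lossless analyzer/synthesizer looks like as a numpy sketch (array layout and function names are my own; a real-time gen~ version would work sample-by-sample rather than on whole arrays):

```python
import numpy as np

def haar_analyze(x, levels):
    """Multi-level Haar analysis: returns [detail_1, ..., detail_n, approx].
    Higher-frequency bands keep more samples per window, so the total sample
    count is preserved (len(x) must be divisible by 2**levels)."""
    bands = []
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        d = (a[0::2] - a[1::2]) / np.sqrt(2)   # detail (high-pass)
        a = (a[0::2] + a[1::2]) / np.sqrt(2)   # approximation (low-pass)
        bands.append(d)
    bands.append(a)
    return bands

def haar_synthesize(bands):
    """Invert haar_analyze exactly (perfect reconstruction)."""
    a = bands[-1]
    for d in reversed(bands[:-1]):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a
```

Note that the band lengths sum to the input length: the log-spaced layout loses nothing, it just redistributes time resolution across frequency.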
Fancier wavelet and dictionary-based analyses can do better compression, but are much harder to write, and might not be easier to operate on in their transformed domains.
@Graham: thanks for that. Could any arbitrary wavelet potentially be used for this? I wonder if those distortions could be mitigated by different wavelet types.
I messed around with wavelet~ from CNMAT too, but it's great that your implementation here provides frame and bin indexes.
FWIW, I actually got my logarithmic resampling to work! The problem was that I was trying to resample the real and imaginary parts. Now I'm only processing the magnitude, with a logarithmic read function; then, to get back to linear spacing, I read back from the logarithmically processed spacings exponentially. The phase is left unprocessed the whole time, and is simply delayed to line up with the re-read, re-linearized magnitudes.
Going back to the time domain, it sounds MUCH better than doing this with separate real and imaginary parts, but there is still some noise, especially in the high frequencies. This is greatly helped by placing the pfft~ patch within a poly~ and upsampling by a factor of 4 or 8; that way, the bins made most inaccurate by the interpolation are irrelevant to the final output.
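In case it helps anyone, here's roughly what that magnitude-only log round trip looks like as a numpy sketch. The interpolation method and the f_lo lower bound are my own guesses at the approach, not exactly what the patch does:

```python
import numpy as np

def log_roundtrip(mag, f_lo=1.0):
    """Resample magnitudes onto a log-spaced bin axis and back to linear.
    Phase is untouched elsewhere; only |X| goes through the log domain.
    f_lo is an illustrative lower bound (in bins) for the log axis."""
    n = len(mag)
    lin = np.arange(n, dtype=float)
    log_axis = np.geomspace(f_lo, n - 1, n)   # log-spaced bin positions
    # Linear -> log: read magnitudes at log-spaced positions
    mag_log = np.interp(log_axis, lin, mag)
    # ... any log-domain processing would happen here ...
    # Log -> linear: read back at the original linear positions
    mag_lin = np.interp(lin, log_axis, mag_log)
    return mag_lin

mag = np.hanning(512)          # a smooth dummy spectrum
out = log_roundtrip(mag)
print(np.max(np.abs(out - mag)))  # small for smooth spectra; error grows at high bins
```

The residual error comes from the sparse sampling of the upper bins, which matches the observation above that upsampling (pushing the worst bins out of the audible range) helps.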
Waking this up again: I tried finding a compiled, up-to-date version of the constant-Q transform external. Does anyone know of an updated version?
The version from grrrr yields: "constantq~: unable to load object bundle executable"