analysis to additive resynthesis~

jbm

I can't seem to find quite the info I'm looking for... What is the best way to analyze musical (i.e., pitched) audio, save it as a file, and then use it for additive resynthesis? I want to build a resynthesis engine for my sampler patch which will do two things: 1) ease the RAM load by thinning the sample data down to control data
2) make it possible to mess around with the spectrum of the sound.

I'm thinking of recording fft analysis data into a downsampled poly~/buffer~, and using that to drive some resynthesis patch -- maybe using oscbank~, or reson~, or perhaps fffb~...? But what I can't seem to find, though it's probably simple, is how to get freq, mag(amp), and phase out of an fft analysis, which seems to just be amp/phase. I'm certainly missing something, but I can't find any info on additive resynthesis from analysis, other than ifft~ or fft~ out. Any clarification greatly appreciated.

Anthony Palomba

I have been thinking of doing something like this as well.
One possible way is to use some analysis program like
SPEAR http://www.klingbeil.com/spear/
It will save all the partial information in a text
file that you can then load in MAX. You can create
a bank of oscillators and initialize them to the
loaded data. The SPEAR file output may need to be
massaged a bit to get it in a useful format.

Anthony

jbm

hehe... we're obviously in the same "zone"... I've been messing around with Spear for the last hour, or so!
I could be on crack, but I seem to remember hearing there was a Max implementation of Spear at some point... no? Or was it a command-line version? Dunno.

I've just taken a closer look at a text export from Spear, and it does look workable... If anybody knows the actual format of the data Spear exports in a text file, please let us know! It does seem pretty simply organized, it's just a matter of figuring out how it's laid out.

I'll be in touch.

jbm

Okay... Looks like each line is:

float: frame number
int: partial count in frame

followed by lists of 3 numbers for each partial:

int: partial number(?), float: frequency, float: amplitude

maybe?

Anyway, if this is right it should be fairly easy to build a parser in Java. I'll give it a go. But if anybody has any info on the question at the top of this post, please let me know. I'd rather not have to convert umpteen thousand interleaved files to L/R sdif files... you know, that's kind of a bummer. I guess it could be automated, but...
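
A minimal sketch of such a parser, written in Python rather than Java, assuming the layout guessed above: each data line holds a frame time, a partial count, and then one (index, frequency, amplitude) triple per partial. The function name `parse_frames`, the sample text, and the rule that any line not starting with a number is a header to skip are all assumptions made for illustration, not a description of the actual SPEAR export.

```python
from typing import Dict, List, Tuple

# One frame: (time, {partial index: (frequency in Hz, amplitude)})
Frame = Tuple[float, Dict[int, Tuple[float, float]]]


def parse_frames(text: str) -> List[Frame]:
    """Parse lines of the guessed form: time count (index freq amp) * count."""
    frames: List[Frame] = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue  # blank line or one-word header line
        try:
            time = float(tokens[0])
            count = int(tokens[1])
        except ValueError:
            continue  # assumed to be a header line, skipped
        values = tokens[2:]
        if len(values) != 3 * count:
            raise ValueError(f"expected {3 * count} values, got {len(values)}")
        partials: Dict[int, Tuple[float, float]] = {}
        for i in range(count):
            index = int(values[3 * i])
            freq = float(values[3 * i + 1])
            amp = float(values[3 * i + 2])
            partials[index] = (freq, amp)
        frames.append((time, partials))
    return frames


# Made-up sample data in the guessed layout (not real SPEAR output).
sample = """\
hypothetical-header-line
0.000 2 0 440.0 0.50 1 880.0 0.25
0.010 1 0 441.0 0.48
"""
frames = parse_frames(sample)
```

Keying each frame by partial index keeps track of which partial is which when partials start and stop between frames.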

What I'd rather do is find a genuine control-rate solution (small files), and use some combination of additive/subtractive to do the resynthesis. This could also give some funky options for tweaking, since you could have sine parts and also noise parts to mess with (or rather, "with which to mess").

Stefan Tiedje

jbmaxwell wrote:
> but I can't find any info on additive resynthesis from analysis,
> other than ifft~ or fft~ out. Any clarification greatly appreciated.

definitely look into pfft~ and the pvoc examples in the examples folder...

Then there is SDIF: look at the CNMAT site and read all the papers... pretty
advanced stuff. You can then mangle SDIF data with the FTM library from Ircam
(advanced as well.... ;-)

Do not expect to get less data though... if you want to keep the
quality, it's likely you need even more, which is no problem nowadays...
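
A rough way to check this is to compare bytes per second for partial-track control data against plain PCM audio. The sketch below is only arithmetic. The partial counts, frame rates, and 4-byte values are invented for illustration; they are not measurements of any real analysis file.

```python
def control_bytes_per_second(partials: int, frames_per_second: int,
                             values_per_partial: int = 2,
                             bytes_per_value: int = 4) -> int:
    """Data rate of control data storing (freq, amp) per partial per frame."""
    return partials * frames_per_second * values_per_partial * bytes_per_value


def pcm_bytes_per_second(sample_rate: int = 44100, channels: int = 1,
                         bytes_per_sample: int = 2) -> int:
    """Data rate of uncompressed PCM audio (defaults: 16-bit mono, 44.1 kHz)."""
    return sample_rate * channels * bytes_per_sample


# Hypothetical settings: a thin analysis versus a dense one.
sparse = control_bytes_per_second(partials=20, frames_per_second=100)
dense = control_bytes_per_second(partials=200, frames_per_second=200)
pcm = pcm_bytes_per_second()
```

With these made-up settings the thin analysis is much smaller than the audio and the dense one is several times larger, so whether the data shrinks depends on how many partials and frames the quality requires.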

Stefan

--
Stefan Tiedje------------x-------
--_____-----------|--------------
--(_|_ ----|-----|-----()-------
-- _|_)----|-----()--------------
----------()--------www.ccmix.com

roger.carruthers

I think sdif may be your friend; I've loaded Spear files into Max using sdif
objects, but I can't remember or find the details right now.
I'll try to be a bit more specific when I find my brain again,
Cheers
Roger

jln

Someone made a spear~ external for some course, if I remember correctly. I
can't remember who, though. I think you'll find it thru the forum
archives. Not sure it was stable at that time.

hth,

Julien.

Brad Garton

Michael and I coded it up for a class last year:

http://music.columbia.edu/cmc/courses/g6610/week13/index.html

WARNING! Although it works, it's really not ready for prime-time. I did
all sorts of bad stuff in the interests of keeping the code as streamlined
as possible (pedagogical clarity etc.). It's great fun to play with,
though. Somebody could probably make it into a halfway decent external
without too much trouble.

brad
http://music.columbia.edu/~brad

Anthony Palomba

I was hoping someone would post these
again. Good stuff! Thanks again.

Anthony

jbm

Aha!

So either:

a) I wasn't on crack,

b) crack has no negative effects on my memory!

Very cool, I'll check it out.

As far as what I'm after goes, it must be possible to reduce the data footprint, since there are programs that do it (I'm thinking of Melodyne... yeah, I know, it's extremely advanced, ultra-top-secret code. But still, it does make astonishingly small files which sound extremely convincing, so the data in the file must only be control data for the synthesis engine). If I stick with control parameters to a synthesis engine (as opposed to fftout~ or ifft~) then the synthesis process will hopefully take care of some of that extra sonic info which we normally need all those pesky little samples to realize (a lot of which is noise). This is why I'm thinking of a combination of additive + noise. And actually, just taking 20 bands from analyzer~, and sending the partials info to a herd of little cycle~ objects was really not that bad -- popping and clicking, and kind of crappy, but it was at least promising.
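
One likely source of popping in a setup like this is that frequency and amplitude jump at each new analysis frame. Below is a minimal offline sketch of an oscillator bank that ramps both values linearly between frames and keeps a running phase per oscillator. The function name `resynthesize`, the fixed hop size, and the requirement that every frame lists the same partials in the same order are simplifying assumptions, not how analyzer~ or cycle~ behave internally.

```python
import math
from typing import List, Sequence, Tuple


def resynthesize(frames: Sequence[Sequence[Tuple[float, float]]],
                 hop: int, sample_rate: float) -> List[float]:
    """frames[k][p] = (freq_hz, amp) of partial p at frame k.

    Every frame must list the same partials in the same order.
    """
    n_partials = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1))
    for p in range(n_partials):
        phase = 0.0  # running phase, so frequency changes do not break the wave
        for k in range(len(frames) - 1):
            f0, a0 = frames[k][p]
            f1, a1 = frames[k + 1][p]
            for n in range(hop):
                t = n / hop  # position between frame k and frame k + 1
                freq = f0 + (f1 - f0) * t
                amp = a0 + (a1 - a0) * t
                out[k * hop + n] += amp * math.sin(phase)
                phase += 2.0 * math.pi * freq / sample_rate
    return out


# Made-up control data: two partials fading in and out over three frames.
demo_frames = [
    [(440.0, 0.0), (880.0, 0.0)],
    [(440.0, 0.5), (880.0, 0.25)],
    [(440.0, 0.0), (880.0, 0.0)],
]
demo = resynthesize(demo_frames, hop=64, sample_rate=44100.0)
```

Because amplitude is ramped rather than stepped, the output has no discontinuities at frame boundaries.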

I'll look into the spear~ stuff. BTW, Brad, what is the formatting of the Spear text files? Was my guess above vaguely correct?

jbm

After more research, and some pestering, it turns out that Melodyne doesn't actually capture all of the necessary info in its nice little files... In fact, from what I've been able to find out, the analysis files are in addition to the original audio file. So, it's probably some form of fft/ifft after all. (Yes Stefan, I've eaten my words).
Now I'm going to bed. But I am still curious about the spearsynth~ approach, since the text files output by Spear are, in most cases, about half the size of a normal .wav file. Brad, have you ever thought about cleaning up the makeshift coding you refer to on your download page?

Brad Garton

On Tue, 16 May 2006, jbmaxwell wrote:

> I'll look into the spear~ stuff. BTW, Brad, what is the formatting of
> the Spear text files? Was my guess above vaguely correct?

Yes -- you pretty much figured it out. I forgot to send this link also:

http://music.columbia.edu/cmc/courses/g6610/week11/index.html

which contains some really basic standalone C programs for doing
operations on the SPEAR data. Michael set it (SPEAR) up to write data in
several formats, including SDIF, but the two we used in class were
text-formats, one containing data for each tracked partial as a separate
line, and one that was time-sliced into 'resampled' frames. The text
files have a header at the top telling the format along with some other
info (# of partials, etc.). The page above has some examples of the data
files along with the simple programs.

brad
http://music.columbia.edu/~brad

Brad Garton

On Wed, 17 May 2006, jbmaxwell wrote:

> Now I'm going to bed. But I am still curious about the spearsynth~
> approach, since the text files output by Spear are, in most cases, about
> half the size of a normal .wav file.

Plus, the advantage SPEAR data gives is the partial tracking, which is
rather tricky to do.

> Brad, have you ever thought about
> cleaning up the makeshift coding you refer to on your download page?

I probably won't, but one of our students has expressed interest in
working on an extended set of SPEAR data max/msp objects over the summer,
ultimate goal being kind of an auto-spectral-composition thing. We'll
see...

brad
http://music.columbia.edu/~brad

Roman Thilenius

> What is the best way to analyze musical (i.e., pitched) audio, save it as a file, and then use it for additive resynthesis? I want to build a resynthesis engine for my sampler patch which will do two things: 1) ease the RAM load by thinning the sample data down to control data

http://celemony.de/ ...

> 2) make it possible to mess around with the spectrum of the sound.
>
> I'm thinking of recording fft analysis data into a downsampled poly~/buffer~, and using that to drive some resynthesis patch -- maybe using oscbank~, or reson~, or perhaps fffb~...?

IMO for a good quality result that needs far
too much CPU.
replaying partial tone fragments with fft-made
data is a bad idea anyway.
the metallic sound of metasynth comes to mind. :)

> But what I can't seem to find, though it's probably simple, is how to get freq, mag(amp), and phase out of an fft analysis, which seems to just be amp/phase.

frequency = time ...

-110

jbm

> > how to get freq, mag(amp), and phase out of an fft analysis, which seems to just be amp/phase.
>
> frequency = time ...
>
>
> -110

Well, from what I understand, it's the frequency slices (0 to sampling rate) that encode the frequency in the fft... Anyway, I'm going to mess around with analyzer~ again. I didn't really try it with fffb~, which I think might be worth a go. Otherwise I may mess around a bit with the spearsynth~ and Spear output stuff, but I'm not going to spend much more time on it. Plenty of brains much more tightly packed than mine have been at this problem. If it were easy, it would be wrapped-up in its own msp~ object by now!

[ edit ] ...but I suppose there's always the granular approach. gulp! Goodbye life.

Roman Thilenius

> After more research, and some pestering, it turns out that Melodyne doesn't actually capture all of the necessary info in its nice little files...

maybe i was a bit unclear (by just posting an url ;P )
about "melodyne".
it could be an alternative: use it _instead_ of
custom code in max to create what you had in your head.

> In fact, from what I've been able to find out, the analysis files are in addition to the original audio file.

well yes and no ... they are what is played after analysis.
but this is a proprietary format only of course!

> So, it's probably some form of fft/ifft after all.

it is, however, not exciting comb filters when it
is replayed. :)
you can play 24 tracks of melodyne files on a G4-400
while modulating the data in realtime ...

Roman Thilenius

> > frequency = time ...
> >
> >
> > -110
>
> Well, from what I understand, it's the frequency slices (0 to sampling rate) that encode the frequency in the fft...

110 does not speak the fft language well enough
to explain it in this language.

but you must think like that:

the audio is not necessarily a sine, it can also be
polyphonic, or a complex timbre, or a madonna recording.
so why do you expect that you can get "frequency" from
fast fourier?

"frequency" or "timbre" or "spectrum" in any kind of
analysis for resynthesis is first of all a matter of
gain
(where is the loudest partial now? where is the second
loudest partial now? does it remain at the same frequency
during the next analysis vector or does it change?)

i forgot about the spear object because i am
on 4.1 max.
that will be the best solution since there is no
10,000- voices version of fiddle~

f.e

dear Brad,

i remember i tried to compile it for win, without success, and we
didn't know why. Any clues today ?

f.e

f.e chanfrault | aka | personal computer music
> >>>>>> http://www.personal-computer-music.com
> >>>>>> |sublime music for a desperate people|

jbm

>
> maybe i was a bit unclear (by just posting an url ;P )
> about "melodyne".
> it could be an alternative: use it _instead_ of
> custom code in max to create what you had in your head.
>

Yes, I've messed about with Melodyne in the past, and was very impressed. But the format is not what I'm after, in terms of it being an audio editor. I know their technology is behind the "liquid" instruments as well. That's closer, but still not what I'm after. Another company that's obviously done something similar is the company that makes Synful Orchestra. It's pretty synthy sounding dry, but with a little reverb is not bad. So, what I'm thinking of is somewhere between Melodyne and Synful. Like Synful in the compactness of the final "instruments" (though obviously nowhere near as compact as those), but like Melodyne in the capacity to load whatever audio material I want.

Anyway, I played around with analyzer~ -> fffb~, which was useless. Using just a bunch of cycle~s is sort of close-ish, as is oscbank~ (both from analyzer as well). The problem is that both of these are pretty "garbled" sounding. I guess this is due to the wandering of the pitch between "frames" of the analyzer~ output. I tried using smoother, which helps during the "steady state", but gives a funky glide at the start and end of the sample (kind of cool, actually!). I'm becoming convinced that spearsynth~ will be the best bet.

jbm

> so why do you expect that you can get "frequency" from
> fast fourier?
>

Well, I don't. Which is why I've generally been steering
clear of fft.

> "frequency" or "timbre" or "spectrum" in any kind of
> analysis for resynthesis is first of all a matter of
> gain
> (where is the loudest partial now? where is the second
> loudest partial now? does it remain at the same frequency
> during the next analysis vector or does it change?)
>

Yeah, and these issues seem to be the big problem with resynthesizing directly from analyzer~ (which is how I'm experimenting with it). However, it may be that formatting the output from analyzer~ into some sort of file, and using that file for the resynthesis might be easier. Don't know. But it might make it easier to determine which oscillator would handle which partial. ? Just thinking out loud.
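
Deciding which oscillator handles which partial can be sketched as a matching step between the tracks from the previous frame and the peaks found in the current one. The sketch below takes peaks loudest first and gives each to the nearest free track within a frequency tolerance. The function name `match_peaks`, the tolerance `max_ratio` (about a quarter tone), and the greedy strategy are assumptions chosen for illustration; SPEAR and analyzer~ may work quite differently.

```python
from typing import Dict, List, Optional, Tuple

Peak = Tuple[float, float]  # (frequency in Hz, amplitude), frequency > 0


def match_peaks(track_freqs: Dict[int, float], peaks: List[Peak],
                max_ratio: float = 1.03) -> Dict[int, Optional[Peak]]:
    """Assign peaks to tracks given as {track id: last frequency}.

    Returns {track id: peak or None}. A track that gets None found no
    peak this frame; unmatched peaks open new track ids.
    """
    result: Dict[int, Optional[Peak]] = {tid: None for tid in track_freqs}
    free = set(track_freqs)
    next_id = max(track_freqs, default=-1) + 1
    # Loudest peaks choose first, so strong partials keep stable tracks.
    for freq, amp in sorted(peaks, key=lambda peak: -peak[1]):
        best = None
        best_ratio = max_ratio
        for tid in sorted(free):
            last = track_freqs[tid]
            ratio = max(freq, last) / min(freq, last)
            if ratio <= best_ratio:
                best, best_ratio = tid, ratio
        if best is None:
            best = next_id  # no track close enough: start a new one
            next_id += 1
        else:
            free.discard(best)
        result[best] = (freq, amp)
    return result


# Made-up example: three existing tracks, three new peaks.
tracks = {0: 440.0, 1: 880.0, 2: 1320.0}
new_peaks = [(884.0, 0.3), (438.0, 0.5), (2000.0, 0.1)]
assignment = match_peaks(tracks, new_peaks)
```

In this example tracks 0 and 1 continue, track 2 finds no peak and could be faded out, and the 2000 Hz peak opens a new track 3.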

[ edit ] It seems like this is a fairly hot topic every time it comes up (long threads in the past, as well). I wonder if c74 are considering a native implementation of sdif or some form of general spear-ish-ness in the future??? That would be great! SDIF is tied up with IRCAM, though, isn't it? Well, nice to dream...

Roman Thilenius

> Don't know. But it might make it easier to determine which oscillator would handle which partial. ? Just thinking out loud.

hm, AFAIremember the corresponding kyma sound
uses free indices rather than "from low to high",
prolly because that makes it possible that the envelopes
(duration, so to say) of the partials can overlap
each other even when (all-1) are in use.

jbm

does anyone happen to have a compiled spearsynth object?

(I've never compiled a maxmsp external, and don't have CodeWarrior)

If so, maybe just email me directly:

jbmaxwell@btinternet.com (yeah, it's my username...)

thanks in advance,

Stefan Tiedje

jbmaxwell wrote:
> Anyway, I'm going to mess around with analyzer~ again. I didn't
> really try it with fffb~, which I think might be worth a go.

I am still wondering if someone would put the analyzer~ code into a
pfft~ form. I am dealing with a similar problem and would like to have
the analyzer~ information, but want to avoid calculating the fft
several times (once in analyzer~ and once inside a pfft~).
I haven't found any sources for the fiddle~/bonk~ stuff, though they
should exist, as Miller's licenses are usually open source.
I am not into C coding either...

The fffb~ route could be one to go as well. It's claimed that an fft is
much more effective, and it can be described as a bunch of filters. But
for real-world ears its resolution in the high range is way
too high and in the low range way too low, whereas filters can be set up
with a useful resolution for all ranges and it will need far fewer
filters than a comparable fft has bins.
The advantage of fft is its clear mathematics and reproducible effects
(though not always intuitively understandable...).
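
The resolution mismatch can be put in numbers: FFT bins are spaced evenly in Hz, while a semitone gets wider in Hz as pitch rises. A small sketch, assuming a 44.1 kHz sample rate and a 1024-point FFT (both values, and the function names, are picked only for illustration):

```python
def bin_spacing(sample_rate: float, fft_size: int) -> float:
    """Distance in Hz between neighbouring FFT bins."""
    return sample_rate / fft_size


def semitone_width(freq: float) -> float:
    """Distance in Hz from freq up to the next equal-tempered semitone."""
    return freq * (2.0 ** (1.0 / 12.0) - 1.0)


def bins_per_semitone(freq: float, sample_rate: float, fft_size: int) -> float:
    """How many FFT bins fall inside the semitone starting at freq."""
    return semitone_width(freq) / bin_spacing(sample_rate, fft_size)


spacing = bin_spacing(44100.0, 1024)              # about 43 Hz per bin
low = bins_per_semitone(55.0, 44100.0, 1024)      # well below one bin
high = bins_per_semitone(8000.0, 44100.0, 1024)   # many bins
```

At 55 Hz a whole semitone is far narrower than a single bin, while at 8 kHz one semitone spans many bins, which is the imbalance described above.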

Stefan


jbm

> whereas filters can be set up
> with a useful resolution for all ranges and it will need far fewer
> filters than a comparable fft has bins.

This is what I was thinking with fffb~. I also hoped that this approach (to belabour my original point) would mean working at control rate, and could thus be fed by a much smaller data set than ifft~ (not to mention making it very straightforward to manipulate the control data).
My very hasty attempt at making an fffb~ version was pretty grim, though. But I think this approach could be fine-tuned if the analyzer~ output were saved to a file, then formatted in a way more useful to fffb~. I had thought to record the analyzer~ output to a buffer~ in a downsampled poly~, then use the buffer~ to playback the fffb~ data (and also to save the analysis file!). The thing I'm not sure about is the rate at which analyzer~ outputs its analysis... but I'm sure that's in the help file/args. I just didn't really check it out yet.
I'm on to another project for now, but if you have any luck please let me know.

sfogar

Hi all,

did you try the Gabor library by Ircam ?

It's a (free) part of the FTM distribution.

http://recherche.ircam.fr/equipes/temps-reel/ftm/gabor.html

All the best

Alessandro Fogar

jbm

Well, I've looked at it... I've played around with FTM on a couple of occasions, but both times it's led to numerous crashes of Max. That makes me pretty nervous. I'd imagine it's something I did, but it's hard to commit to an object/external when you're unsure of its stability (says someone who's entirely willing to play with spearsynth~!!!). I'll look at it again; maybe if I'm only using it for this one function it will be more friendly.

thanks,

sfogar

Hi,

> Well, I've looked at it... I've played around with FTM on a couple of occasions, but both times it's led to numerous crashes of Max. That makes me pretty nervous. I'd imagine it's something I did, but it's hard to commit to an object/external when you're unsure of its stability (says someone who's entirely willing to play with spearsynth~!!!). I'll look at it again; maybe if I'm only using it for this one function it will be more friendly.

I think, assuming that the crashes are still there, as you said, that
what we can do is to help them to debug their work so that we can get
a better library.

The guys at Ircam are working hard on FTM + etc.

All the best

Alessandro Fogar

Stefan Tiedje

jbmaxwell wrote:
> My very hasty attempt at making an fffb~ version was
> pretty grim, though.

So was mine. I just put up a semitone analyzer, which eats up almost
all available fffb~ bands. It would be necessary to combine several of
those to get enough resolution. Increasing the Q would also make them
pretty slow (as expected). Some kind of intelligent overlapping would be
necessary.... Maybe in the end I'll end up with ordered ffts (combining
several bins, or downsampling and using shorter frame sizes)... lots of
ideas but no diggin' yet....

Stefan


jbm

> I think, assuming that the crashes are still there, as you said, that
> what we can do is to help them to debug their work so that we can get
> a better library.
>
> The guys at Ircam are working hard on FTM + etc.
>

Good point... I'll take another look at it. The last time I poked around I was hoping there would be a way to write sdif files from input. Maybe there is now, not sure, but there wasn't at that time. I'd like to be able to work from text files, like Spear exports, but maybe the FTM stuff can do that. I'll check it out.

cheers,

Roman Thilenius

> This is what I was thinking with fffb~.

good morning,

do not underestimate the phase smear fffb causes.
to get somewhat good frequency analysis results from
filters of say 40,48,60 Hz is not much easier than
using fft frames output.

imagine all the signal connections a 200 band ffb
array needs ... fft more or less only doubles the
signal data amount.

volker böhm

>>> how to get freq, mag(amp), and phase out of an fft analysis,
>>> which seems to just be amp/phase.
>>
>> frequency = time ...
>>
>
> Well, from what I understand, it's the frequency slices (0 to
> sampling rate) that encode the frequency in the fft...

yes, and the "true frequency" is encoded in the phase differences.

from real and imag output of the fft you calculate magnitude and phase.
next you would have to do some kind of peak search on the magnitude
data in your current fft-frame to find the local maxima, i.e. the
bin numbers that are closest to the partials of the input signal. (a
partial which is not lined up perfectly with the frequency of an fft
bin will spill its energy in the neighboring bins - also referred to as
"leakage").
from these "potentially-correct-but-not-quite-right" magnitude bins
you can calculate the true frequencies by examining the phase-deltas
of consecutive fft-frames (framedelta~ will help you with that).
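
The phase-delta step can be sketched outside Max as follows. For a peak bin k, the phase is expected to advance by 2*pi*k*hop/frame_size between two frames that are hop samples apart; the wrapped difference between the measured and the expected advance gives the offset from the bin's centre frequency. This is a minimal single-bin illustration in plain Python, assuming a Hann window and a slow direct DFT for brevity. The function names and test signal are made up, and the phase subtraction only roughly plays the role framedelta~ would have in a pfft~ patch.

```python
import cmath
import math
from typing import List, Sequence


def hann(size: int) -> List[float]:
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / size) for n in range(size)]


def dft_bin(frame: Sequence[float], k: int) -> complex:
    """Value of DFT bin k, computed directly (slow but dependency-free)."""
    size = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * n / size)
               for n, x in enumerate(frame))


def wrap_phase(phi: float) -> float:
    """Wrap an angle into the range [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def true_frequency(signal: Sequence[float], k: int, frame_size: int,
                   hop: int, sample_rate: float) -> float:
    """Refine the frequency of bin k from the phase delta of two frames."""
    window = hann(frame_size)
    first = dft_bin([s * w for s, w in zip(signal[:frame_size], window)], k)
    second = dft_bin([s * w for s, w in
                      zip(signal[hop:hop + frame_size], window)], k)
    delta = cmath.phase(second) - cmath.phase(first)
    expected = 2.0 * math.pi * k * hop / frame_size
    deviation = wrap_phase(delta - expected)
    # Convert the leftover phase into a fractional bin offset.
    bin_offset = deviation * frame_size / (2.0 * math.pi * hop)
    return (k + bin_offset) * sample_rate / frame_size


# Made-up test signal: a 1010 Hz sine, which falls between bins 32 and 33.
sample_rate = 8000.0
frame_size, hop = 256, 64
signal = [math.sin(2.0 * math.pi * 1010.0 * n / sample_rate)
          for n in range(frame_size + hop)]
estimate = true_frequency(signal, 32, frame_size, hop, sample_rate)
```

Bin 32 is centred on 1000 Hz here, and the phase delta pulls the estimate toward the sine's actual frequency. The hop has to be short enough that the real deviation stays within half a turn, otherwise the wrapping becomes ambiguous.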

there are also algorithms to calculate the true frequency by
examining the neighboring magnitude bins.
http://www.dspguru.com/howto/tech/peakfft.htm
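
One well-known estimator of this magnitude-only kind is quadratic (parabolic) interpolation: fit a parabola through the peak bin and its two neighbours and take the vertex as the refined peak. The sketch below shows only the formula and is not necessarily the method described at the link above. The function name is made up, and in practice the three values are often log magnitudes.

```python
from typing import Tuple


def parabolic_peak(left: float, centre: float, right: float) -> Tuple[float, float]:
    """Return (offset, height) of the parabola through three points.

    left, centre, right are magnitudes at bins k-1, k, k+1, where k is a
    local maximum. offset is in bins relative to k (between -0.5 and 0.5
    for a true local maximum); height is the interpolated peak magnitude.
    """
    denom = left - 2.0 * centre + right
    if denom == 0.0:
        return 0.0, centre  # flat: no curvature to interpolate
    offset = 0.5 * (left - right) / denom
    height = centre - 0.25 * (left - right) * offset
    return offset, height


# Samples of the parabola y = 5 - (x - 0.3)**2 at x = -1, 0, 1.
offset, height = parabolic_peak(3.31, 4.91, 4.51)
```

The refined frequency is then (k + offset) * sample_rate / fft_size, with k the peak bin number.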

hope that helps. i'm not saying it's easy to do, and you might
well be better off using some other software for this analysis
(spear, audiosculpt or whatever). if you still want to try it and
need some more help, contact me off list. i might be able to hack up
a little example illustrating the above.
bye,
volker.

mzed

These are in beta form at the moment. Have a look and see if there's anything helpful in there.

http://homepage.mac.com/mikezed/CNMAT_Spectral_Tutorials.zip

mzed

jbm

Just dl'ed them, and looking forward to digging in -- tomorrow. Thanks for posting!

jbm

Okay, this tutorial and _these _objects _are _bloody _brilliant!!!

thank you, thank you... (I just got to the list-interpolate one, and my head nearly exploded! This is just the sort of thing I've been wishing I could find.)

and... thanks again (did I say that already?)

f.e

Dear michael,

what is usable for win users in this package ?

best

f.e


jbm

This may be a silly question, as it relates to my general anxiety about RAM usage (which I'm sure many do not understand, but which is _very real, in my case -- and not simply because I don't have enough RAM), but does an sdif file load into an sdif buffer at full size? Or is it (read: "can it be") compressed in memory? It seems to me that it must basically contain text, and could perhaps be compressed in memory, without a great deal of overhead...
Could a "compact" option be added to the sdif buffer?

Anyway, just curious about that.

mzed

Quote: f.e wrote on Fri, 19 May 2006 01:36
----------------------------------------------------
> Dear michael,
>
> what is usable for win users in this package ?
>
er, um, yeah.

The objects in the archive are all Mac. But you can find some windows ones here:

http://www.cnmat.berkeley.edu/MAX/

It's on my list to compile them all for windows, but I'm behind at the moment.

f.e

I already know this page, for sure.
And i can give you a hand, if you want, to compile the stuff (if it
doesn't mix too much C & C++).

cheers

f.e
