ensemble resynthesis?

jbm

I've had this thought a couple of times before, so I thought I'd post it, to see what others think.

Since it's possible to get a half-decent resynthesis of a violin, say, using additive techniques. And it's possible to get a half-decent resynthesis of a clarinet, using additivie techniques. Shouldn't it be possible to get a half-decent resynthesis of music for violin and clarinet using additive techniques? After all, the synthesis is ultimately creating something that we hear -- it doesn't care whether there are one, two, or twenty instruments required to play it. So my thought was to combine the violin and clarinet (as an example) *before* they reach the additive synth, so to speak, in order to utilize a *single* synth with a large number of sines, with as little duplication of effort as possible. This seems reasonable to me, as there is always a great deal of masking going on anytime we hear two instruments playing together... The control data from different instruments would be interpolated, on the fly, and sent to the synth as a hybrid "instrument", expressing the spectral content of the two (or more) sources together in preformance.

Is this just silly?

LoneMonad aka don malone

the violin waveform is "close" to a sawtooth
the clarinet is "close" to a triangle
but if you add the two together
it is not as close to a violin and clarinet

you could with additive synthesis
come up with a single steady state waveform
that is similar (but less so than individually)
but i am not sure how much more economic it would be
maybe a little
but i don't think it would be significant

there are problems
1)the onset behaviors are different
but this behavior changes with range
and in dissimilar ways
2) the tuning of overtones are different
which also change dissimilarly with range
3) the formants are different
4) the playing characteristics are different
vibrato, transition?

jbm

Sorry, I realize I wasn't too clear... I would be resynthesizing samples using an fft analysis/additive synthesis approach. My thinking is that, if one could record a violin and clarinet playing together, analyze it (fft), and resynthesize it additively, then why not take analyses of each, combine those, and resynthesize the combination. Does that make any more sense? All I'm thinking is that, instead of having two 200-sine synths, each playing one instrument (and remember, I'm talking about additively resynthesizing samples -- so there's no hugely complex envelope to construct, just changing parameters over time, based on the anaysis data), why not merge the resynthesis data *before* the 200 sines, and have the synth play the merged data?

Brad Garton

On Thu, 25 Jan 2007, jbmaxwell wrote:

> Since it's possible to get a half-decent resynthesis of a violin, say,
> using additive techniques. And it's possible to get a half-decent
> resynthesis of a clarinet, using additivie techniques. Shouldn't it be
> possible to get a half-decent resynthesis of music for violin and
> clarinet using additive techniques?

Not a silly idea, and in fact it has already been done, but I doubt it is
the kind of synthesis technique you had imagined: an oscillator-bank
resynthesis of a moving FFT analysis is essentially an 'additive
synthesis' recreation of an acoustic event, so a resynthesis based on
analyzed (or constructed?) FFT data should be able to create an "ensemble
sound".

However, the thing that would make this sound synthetic -- as is also the
case with resynthed or sampled single notes of particular instruments when
used in a musical context -- is how much our ears/brains attend to the
tiny, and not necessarily random, acoustic differences that occur in time.
Sometimes this isn't a big factor for what you want to do, though.

brad
http://music.columbia.edu/~brad

jbm

Quote: Bradford Garton wrote on Thu, 25 January 2007 14:02
----------------------------------------------------
>[...] so a resynthesis based on
> analyzed (or constructed?) FFT data should be able to create an "ensemble
> sound".
>

Yes, that's exactly what I'm after.

> However, the thing that would make this sound synthetic -- as is also the
> case with resynthed or sampled single notes of particular instruments when
> used in a musical context -- is how much our ears/brains attend to the
> tiny, and not necessarily random, acoustic differences that occur in time.
> Sometimes this isn't a big factor for what you want to do, though.

Well, it's just a thought, at this point, and may be too much sweat for what I'm intending. That said, it wouldn't be a final, "production" output I'd be after, but rather a relatively cheap "working copy", so the quality can be less-than-ideal. You mentioned that this has been done before (which, of course, doesn't surprise me): are there any examples you know of?

Anything in MaxMSP?

thanks,

Brad Garton

On Thu, 25 Jan 2007, jbmaxwell wrote:

> Quote: Bradford Garton wrote on Thu, 25 January 2007 14:02
> ----------------------------------------------------
>> [...] so a resynthesis based on
>> analyzed (or constructed?) FFT data should be able to create an "ensemble
>> sound".
>>
>
> Yes, that's exactly what I'm after.

Yeah -- saw your clarification come in right after I hit 'send'. Shoulda
known! :-)

> You mentioned that this has been done
> before (which, of course, doesn't surprise me): are there any examples
> you know of?

Not specifically. It was more a generic "FFT resynth" has been done
before.

brad
http://music.columbia.edu/~brad

jbm

> Not specifically. It was more a generic "FFT resynth" has been done
> before.
>
> brad
> http://music.columbia.edu/~brad
>
>

Right. I may poke around with it a bit. I suppose one big question is exactly how one deals with "collisions" -- that is, attempts by different "instruments" to control the same oscillator? Do you think simple addition would work? Or should it be an average of some sort... or maybe addition with some sort of logarithmic scaling?

Eric Lyon

If you are going for a realistic acoustic instrumental sound, you will probably get much better results using time domain samples from a high quality sample bank, since relatively small modifications of FFT data will generally give you artifacts that will be quite noticeable in this context. Googling for "orchestral simulation" and related topics covers the time domain approach pretty well.

If you wish to try the physical model approach (Paul Lansky has gotten great results with this IMO) then you will not be able to superpose different instruments, as there is considerable non-linearity in those synthesis models.

HtH,

Eric

Quote: jbm wrote on Thu, 25 January 2007 13:53
----------------------------------------------------
> Sorry, I realize I wasn't too clear... I would be resynthesizing samples using an fft analysis/additive synthesis approach. My thinking is that, if one could record a violin and clarinet playing together, analyze it (fft), and resynthesize it additively, then why not take analyses of each, combine those, and resynthesize the combination. Does that make any more sense? All I'm thinking is that, instead of having two 200-sine synths, each playing one instrument (and remember, I'm talking about additively resynthesizing samples -- so there's no hugely complex envelope to construct, just changing parameters over time, based on the anaysis data), why not merge the resynthesis data *before* the 200 sines, and have the synth play the merged data?
>
> J.
----------------------------------------------------

jbm

Quote: Eric Lyon wrote on Thu, 25 January 2007 15:18
----------------------------------------------------
> If you are going for a realistic acoustic instrumental sound, you will probably get much better results using time domain samples from a high quality sample bank, since relatively small modifications of FFT data will generally give you artifacts that will be quite noticeable in this context. Googling for "orchestral simulation" and related topics covers the time domain approach pretty well.
>

Forgive me for sounding goofy, but do you mean standard audio samples -- i.e., "sampling" -- or are your referring to some different form of analysis? I've been a VSL-user (and we are "users" in the most "street" sense of the word, when you consider the $$$ most of us have blown) for a few years. Obviously, they sound fantastic. But the overhead is mind-blowing, and it can really drag the composition process to a stand-still, at times. So, I'm looking to analyze the contents of my libraries, and resynthesize them in a way that's good enough for the composition process -- particularly for the interactive, realtime composition software I've been working on -- and that doesn't require 3+ machines, ruinning 8+ GB or RAM. After that, I can always render a version with the VSL stuff, for final output.

> If you wish to try the physical model approach (Paul Lansky has gotten great results with this IMO) then you will not be able to superpose different instruments, as there is considerable non-linearity in those synthesis models.
>

Yeah, I've looked into phymod before... my brain melted.

jbm

> [...] ruinning 8+ GB or RAM.

that was a nice typo! Can you say "slip"?

Eric Lyon

>
> Forgive me for sounding goofy,
> but do you mean standard audio > samples -- i.e., "sampling"

Yes.

>>
So, I'm looking to analyze the contents of my libraries, and resynthesize them in a way that's good enough for the composition process -- particularly for the interactive, realtime composition software I've been working on -- and that doesn't require 3+ machines, ruinning 8+ GB or RAM. After that, I can always render a version with the VSL stuff, for final output.
>>

That would depend on what "good enough" is for your needs. Most commercial music notation programs provide "good enough" orchestral synthesis, i.e. sounds lousy but gives you a rough idea of the score. You could import a MIDI file to one of these programs. Since you're thinking in terms of a sketch to later be rendered, perhaps you could even live with the hurtful sounds of the DLS synth, in which case you could stay within Max.

Please report back on what solution works best for you.

Eric

jbm

>
> That would depend on what "good enough" is for your needs. Most > commercial music notation programs provide "good enough"
> orchestral synthesis, i.e. sounds lousy but gives you a rough
> idea of the score. You could import a MIDI file to one of these > programs. Since you're thinking in terms of a sketch to later be rendered, perhaps you could even live with the hurtful sounds of the DLS synth, in which case you could stay within Max.
>

If I could settle on a reasonable version of "good enough" I'd be a whole lot more sane than I am today... Mind you, sometimes I just like trying stuff in MaxMSP -- it's like what Stravinsky referred to as "speculative volition" (he was talking about composing, of course). I'm more or less speculating... and maybe I'll even speculate a little in code! ;-)

>
> Please report back on what solution works best for you.
>
> Eric
>

Well, just over a year ago I actually made a few versions of apps that could analyze musical data, either on a 1-second delay (like Synful Orchestra), or from a midi file, and choose samples from the drive for each note. They even worked! To a degree... but there was always some problem popping up. Some bottleneck or integration problem (like Logic's funky time stamps) Eventually, I "upgraded" to some Vienna Instruments, and let the project go. However, as brilliant as the VIs are, they still require a *lot* of energy to use well, and a boat-load of hardware to run. So, I'm sort of pondering options again. Even back when I was making the above-mentioned tools I had wondered about resynthesis approaches. Synful is a company who has approached the problem (range of expression: sound quality: computer resources) in an interesting way, but I don't think it's quite got it, for me. Also Melodyne is doing pretty convincing resynthesis. I mean, I realize this is all heavy-duty, patented stuff. But the idea's there...

As I said, I'm really just thinking out loud, at this point.

mzed

Quote: jbm wrote on Thu, 25 January 2007 08:35
----------------------------------------------------
>...Even back when I was making the above-mentioned tools I had wondered about resynthesis approaches. Synful is a company who has approached the problem (range of expression: sound quality: computer resources) in an interesting way, but I don't think it's quite got it, for me. Also Melodyne is doing pretty convincing resynthesis. I mean, I realize this is all heavy-duty, patented stuff. But the idea's there...
----------------------------------------------------

I don't think you're really talking about *re*synthesis, which presupposes an analysis phase that is being synthesized. I mean, if you send me an SDIF file within an analysis of the ensemble phrase you want to hear (probably not a problem with SPEAR), I wouldn't have much trouble making a reasonable resynthesis using CNMAT Objects.

But, I think you want synthesis. Synful drives an additive synthesizer using realtime concatinative methods on a phrase database. Diemo Schwarz PhD thesis is a good read on the subject:

http://recherche.ircam.fr/equipes/analyse-synthese/schwarz/

Given enough partials, additive synthesis can create whatever sound you're looking for. After raw computing power, the problem has always been with control structures. This is where the brainpower comes in.

mzed

jbm

>
> I don't think you're really talking about *re*synthesis, which presupposes an analysis phase that is being synthesized. I mean, if you send me an SDIF file within an analysis of the ensemble phrase you want to hear (probably not a problem with SPEAR), I wouldn't have much trouble making a reasonable resynthesis using CNMAT Objects.
>

I think you missed me at some point. I'm literally talking about resynthesizing the samples from a commercial library. You mention analyzing an ensemble and resynthesizing using the CNMAT objects. Now, imagine you somehow could extract each instrumental part out of the SDIF file. Then you would have x number of SDIF files, each containing the data necessary to resynthesize the instrument. Of course, I'm talking about working the other way... Or another way of looking at it: could four (or whatever number) SDIF files be merged or interpolated in some manner that would produce the sound of those four files playing together. I'm just curious as to whether this might incur a smaller performance hit, since a single "synth" would be playing the whole ensemble. Additive is additive, as you point out above. Ultimately, it's just sines producing the illusion of a coherent sonic image.

> But, I think you want synthesis. Synful drives an additive synthesizer using realtime concatinative methods on a phrase database. Diemo Schwarz PhD thesis is a good read on the subject:
>
Yeah. It is a synthesis engine. However, Synful is still sample-based, in that the original phrase database comes from audio recordings (in fact, last I heard, they're in a big process of re-recording many of the original samples to try to pull up the audio quality). But my point was simply that, since the original samples can provide all the complexities of how the sound develops over time, it's a lot less complex than trying to synthesize acoustic instruments from scratch. The performance element is already there.

> http://recherche.ircam.fr/equipes/analyse-synthese/schwarz/
>

thanks. I'll check this out.

> Given enough partials, additive synthesis can create whatever sound you're looking for. After raw computing power, the problem has always been with control structures. This is where the brainpower comes in.
>

Well, as I say above, the sounds would be resynthesized from a sample library. The only difference with what I'm thinking of is that I'm talking about doing the synthesis step *after* the composition step, in a sense. So you could say I'm *synthesizing* the ensemble, as a combination of *re*synthesized instruments. Make sense? Or perhaps more accurately, the sample library is the control structure for each instrument, and the score is another control structure, regulating the influence of each instrument on the final synthesized ensemble...

But my real problem is how to bring down the data required for the synthesis/resynthesis. SDIF files are basically the same size as .wav files. What I'd like is to reduce the data size altogether. I think, maybe, with a combination of additive and subtractive synthesis this might be possible... I've also tossed around ideas with granular, or windowed, approaches -- maybe finding some way of generalizing the spectral content in a manner that allows data to be reused in ways not possible with samples, which are basically "frozen" in time. So a big part of my original idea was to reduce the data flow from audio rate to control rate. I want a "cheap" version of SDIF, if such a thing exists.

Roman Thilenius

Quote: jbm wrote on Thu, 25 January 2007 00:05
----------------------------------------------------
> I've had this thought a couple of times before, so I thought I'd post it, to see what others think.
>
> Since it's possible to get a half-decent resynthesis of a violin, say, using additive techniques. And it's possible to get a half-decent resynthesis of a clarinet, using additivie techniques. Shouldn't it be possible to get a half-decent resynthesis of music for violin and clarinet using additive techniques? After all, the synthesis is ultimately creating something that we hear -- it doesn't care whether there are one, two, or twenty instruments required to play it. So my thought was to combine the violin and clarinet (as an example) *before* they reach the additive synth, so to speak, in order to utilize a *single* synth with a large number of sines, with as little duplication of effort as possible. This seems reasonable to me, as there is always a great deal of masking going on anytime we hear two instruments playing together... The control data from different instruments would be interpolated, on the fly, and sent to the synth as a hybrid "instrument", expressing the spectral content of the two (or more) sources together in preformance.
>
> Is this just silly?
>
> J.

no its not silly - but maybe it is also not the ultimative
solution for future music production.

it is theoretically possible to combine (i.e. multiply or add)
the data which is used for synthesis or resynthesis (be it
max, metasynth, kyma, melodyne, or an analog modelled oscillator
C++ code) BEOFRE making them a digital signal.

right now i just dont see the advantage over summing
different audio channels?

in an offline enviroment clever optimisation of a 100 voice
synth could save some time, but otoh in such a system you
could not prelisten to the individual tracks ...

jbm

> no its not silly - but maybe it is also not the ultimative
> solution for future music production.
>

lol! If I thought I'd found the solution for future music production, I wouldn't be blabbing about it on a public forum. I'd be whispering about it to a lawyer, under a highly restrictive NDA. This is a highly personal journey I've been on for the past few years, trying to tweak my composition environment. That's all.

>
> it is theoretically possible to combine (i.e. multiply or add)
> the data which is used for synthesis or resynthesis (be it
> max, metasynth, kyma, melodyne, or an analog modelled oscillator
> C++ code) BEOFRE making them a digital signal.
>
> right now i just dont see the advantage over summing
> different audio channels?
>
>
> in an offline enviroment clever optimisation of a 100 voice
> synth could save some time, but otoh in such a system you
> could not prelisten to the individual tracks ...
>

I'm not sure why you couldn't prelisten to an individual track. The "mix" is just a mix of control data, rather than audio data. That's all. "Soloing" a single part would just be filtering out all the control data from the other instruments. Same deal, different sequence of events.

Stefan Tiedje

jbmaxwell wrote:
> The control data from different instruments would be interpolated, on
> the fly, and sent to the synth as a hybrid "instrument", expressing
> the spec! tral content of the two (or more) sources together in
> preformance.
>
> Is this just silly?

It would sound as if they play in PERFECT unison.

The common morphing methods are doing exactly that by the way...

You can also just analyze a mix of the two, which would preserve the
unperfectness - sounds probably richer...

Stefan

--
Stefan Tiedje------------x-------
--_____-----------|--------------
--(_|_ ----|-----|-----()-------
-- _|_)----|-----()--------------
----------()--------www.ccmix.com

jbm

>
> It would sound as if they play in PERFECT unison.
>

huh? No, they would be different musical parts.

> The common morphing methods are doing exactly that by the way...
>

It's still a different idea. The control data is morphed, but the musical parts are different. It would be like analysing a recording of two instrumental parts, and resynthsizing that, only the two parts are each expressed as control/analysis data, then "morphed".

A) is a line for clarinet, B) is a different line for violin. A) is analyzed, and B) is analyzed. Instead of resynthesizing A) and B) with two synthesis engines, combine the data of the two (as though you'd analyzed a performance of them playing together) and resynthesize that. Simple, and quite possibly a total waste of time.

> You can also just analyze a mix of the two, which would preserve the
> unperfectness - sounds probably richer...
>

That's sort of the idea, but each sample that makes up either A) or B) is an independently analyzed sample. I don't know... it's a bit of a nightmare, really, since you'd also have to try to manage things like crossfades between samples, which would probably be a nightmare to do with analysis data, and a no-brainer with actual audio.

Brad Garton

On Fri, 26 Jan 2007, jbmaxwell wrote:

> It's still a different idea. The control data is morphed, but the
> musical parts are different. It would be like analysing a recording of
> two instrumental parts, and resynthsizing that, only the two parts are
> each expressed as control/analysis data, then "morphed".

One thing, I think that addition in the frequency domain is the same as
addition in the time domain, unless you mess things up a bit.

brad
http://music.columbia.edu/~brad

mzed

Quote: jbm wrote on Thu, 25 January 2007 16:12
----------------------------------------------------
>
> >
> > I don't think you're really talking about *re*synthesis, which presupposes an analysis phase that is being synthesized. I mean, if you send me an SDIF file within an analysis of the ensemble phrase you want to hear (probably not a problem with SPEAR), I wouldn't have much trouble making a reasonable resynthesis using CNMAT Objects.
> >
>
> I think you missed me at some point.

I think I missed this:
>

> But my real problem is how to bring down the data required for the synthesis/resynthesis. SDIF files are basically the same size as .wav files. What I'd like is to reduce the data size altogether.

I was not thinking in terms of a data reduction problem. It is true that SDIF files are often even *bigger* than the .wav file -- because they contain more information. You can certainly merge SDIF files using some of the CNMAT command line utilities (you'll have to trust me on this, because our website is down). But your question about removing unnecessary partials is the interesting one.

This seems like Doctoral thesis level work to me. One trick would be to apply some psychoacoustic thinking, and remove partials that would be masked by other partials. But assuming two partials that are nearly the same frequency are redundant would be unwise. The analysis of monophonic instruments I've seen often have this characteristic -- this little beating and roughness really adds life to a timbre. This would be even more so with an ensemble. (This effect is important enough that we've worked on adding clusters of partials to timbral models).

I'll poke around the literature, but my feeling is that nobody is really tackling intelligent data reduction in this field at the moment. (It's still hard to get a good analysis.) I have used SPEAR to remove partials that are below a certian amplitude, and that helps, but looking a model (or models) and removing partials based on their contribution to the overall sound is way beyond that.