Is SDIF the best approach for resynthesis and time-stretching of partials?

Baek Santarek:

Hi,

I want to resynthesize realistic samples at the highest possible fidelity, with the ability to manipulate individual synthesized partials in real time, including time-stretching them individually.

I have been looking into SDIF for a couple of days, and the more I experiment with it, the more I wonder whether there is a more elegant and effective way to achieve my goal. I have very little experience and want to avoid unnecessary dead-end work as much as possible.

My issue with SDIF is that manipulating an individual partial's timing seems clunky, complicated and computationally expensive because of SDIF's frame-based structure, especially when a partial spans multiple frames.

On the other hand, exporting from SPEAR in the text breakpoint format gives much more direct access to each partial's time values, but you end up handling far more data than with the standard SDIF format. I worry that if I go the breakpoint route I will run into performance issues with larger samples, simply because of the sheer amount of data.
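For reference, here is a rough offline Python sketch of the kind of direct access I mean, assuming the two-lines-per-partial layout I see in SPEAR's par-text-partials-format export (the file name is made up; check your own file's header before relying on this):

def load_spear_text(path):
    # Load SPEAR's "Text - Partials" export into {partial index: [(time, freq, amp), ...]}.
    # Assumed layout: after the "partials-data" line, each partial is two lines --
    # "index npoints tstart tend", then npoints (time, freq, amp) triples on one line.
    partials = {}
    with open(path) as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    i = lines.index("partials-data") + 1
    while i + 1 < len(lines):
        index, npoints = lines[i].split()[:2]
        vals = [float(v) for v in lines[i + 1].split()]
        assert len(vals) == 3 * int(npoints)
        partials[int(index)] = [(vals[j], vals[j + 1], vals[j + 2])
                                for j in range(0, len(vals), 3)]
        i += 2
    return partials

def stretch_partial(breakpoints, factor):
    # Time-stretch one partial by scaling its breakpoint times.
    return [(t * factor, f, a) for (t, f, a) in breakpoints]

partials = load_spear_text("analysis.txt")       # hypothetical file name
partials[0] = stretch_partial(partials[0], 2.0)  # stretch partial 0 to twice its length

Since each partial keeps its own breakpoint list, stretching one partial is just arithmetic on its own times, with no frame grid to fight; the sheer data volume is the only thing that worries me.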

In the end, I would ideally love to have multisliders addressing the time, frequency and amplitude values of 1024 partials, with all parameters updated within a single 64-sample signal vector.

All advice and tips would be greatly appreciated.

Exit Only:

I'm not an expert on this by any means, but both Kyma and the new Madrona Labs Sumu synth do resynthesis using Loris-based oscillators, which incorporate noisiness in order to mimic sounds using fewer partials. Sumu uses 64 partials. Here is a link with some info:

SPEAR can export the SDIF RBEP format, which includes the noisiness information. The sinusoids~ object in the CNMAT externals package has a "bandwidth enhanced" (BWE) mode that can use that information for resynthesis. If you go that route you'll need to configure the SDIF-tuples and threefates objects (both in the CNMAT package) to use a four-column matrix instead of three.

That might be a good starting point to hear how good it can sound, though sinusoids~ is a bit limited since it takes in the data at control rate. You can take it further by dumping that information into a Jitter matrix and then driving a custom additive synth in gen~, etc. There is some info about RBEP here, and more on the web:
https://github.com/CNMAT/CNMAT-Externs
In particular, looking at the source for the BWE oscillator in sinusoids~ is a good place to start for your own implementation in gen~, if that's what you want to do. A rough sketch of the idea is below.
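To illustrate the idea only (this is not CNMAT's or Loris's actual code; the one-pole noise filter and the parameter names are my own simplification), a bandwidth-enhanced partial basically crossfades between a sine carrier and narrowband noise according to a 0-1 noisiness value:

import math, random

def bwe_partial(freq, amp, noisiness, dur, sr=44100.0):
    # One bandwidth-enhanced partial: a sine carrier plus lowpass-filtered
    # noise modulation, mixed according to noisiness in [0, 1].
    out = []
    phase = 0.0
    lp = 0.0                                        # one-pole lowpass state for the noise
    carrier = math.sqrt(max(0.0, 1.0 - noisiness))
    mod_index = math.sqrt(2.0 * noisiness)
    for _ in range(int(dur * sr)):
        lp += 0.1 * (random.gauss(0.0, 1.0) - lp)   # crude noise lowpass
        out.append(amp * (carrier + mod_index * lp) * math.sin(phase))
        phase += 2.0 * math.pi * freq / sr
    return out

samples = bwe_partial(freq=440.0, amp=0.2, noisiness=0.3, dur=0.5)

A noisiness of 0 gives a pure sine and 1 gives narrowband noise around the partial frequency; in a real patch you'd interpolate freq/amp/noisiness per sample from the analysis data instead of holding them fixed.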

My personal opinion is that trying to control 1024 partials sounds like a lot; you often hit diminishing returns after 60-100 partials because the amplitude of the higher partials is so low in acoustic sounds. A Loris-style oscillator compensates for this by adding some of that noisiness and varying it over time, so you get the effect of more partials than you actually have.
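A quick back-of-the-envelope illustration of that, assuming a sawtooth-like 1/n rolloff (real instruments vary, of course):

import math

for n in (1, 10, 64, 100, 1024):
    db = 20.0 * math.log10(1.0 / n)
    print(f"partial {n:>4}: {db:6.1f} dB relative to the fundamental")

Partial 100 is already around -40 dB and partial 1024 around -60 dB, which is why a bit of shaped noise usually does more for realism than another few hundred sines.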

If you experiment, you can also have fun applying FM to the partials to create wilder sounds with far more partials than you'll probably ever want.
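Something along these lines, for example (a toy offline sketch; the ratio and index values are made up), where each analyzed partial becomes an FM carrier sharing one modulator ratio:

import math

def fm_partials(partials, mod_ratio, index, dur, sr=44100.0):
    # partials: list of (freq, amp) pairs; returns the summed output samples.
    n = int(dur * sr)
    out = [0.0] * n
    for freq, amp in partials:
        car_phase = mod_phase = 0.0
        for i in range(n):
            mod = index * math.sin(mod_phase)   # frequency deviation as a fraction of freq
            out[i] += amp * math.sin(car_phase)
            car_phase += 2.0 * math.pi * freq * (1.0 + mod) / sr
            mod_phase += 2.0 * math.pi * freq * mod_ratio / sr
    return out

samples = fm_partials([(220.0, 0.3), (440.0, 0.2), (660.0, 0.1)],
                      mod_ratio=2.0, index=0.05, dur=0.25)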

Baek Santarek:

Thank you very much for your insight, Exit! I will check all of it. Very good point about many of the partials essentially being just a noise component.

Roman Thilenius:

Somebody said about the method used in the Kyma snd back in 1992 that its maximum of 1024 partials is pretty much enough to record and replay things so that a listener can't distinguish the result from a good 16-bit recording.

Baek Santarek:

Interesting. I always feel like the results are very dependent on the particular sound. Some sounds are resynthesized very well, while others are quite far from their originals no matter the settings (based on my experiments in SPEAR).

Is there more info about Kyma's method, or is it secret sauce?

Roman Thilenius:

That is probably true for all or most methods. But those where the frequencies are continuously variable easily kill FFT or resonator banks when the "number" is low. :)
Kyma can use the spectral files from Lemur for analysis, so guess what it (roughly) uses.

Baek Santarek:

"but those where the frequencies are continously variable are killing fft or resonatorbanks easily when the "number" is low. :)"

Interesting, but I am not sure I understand, Roman. What "number" are you referring to? Thanks!

Roman Thilenius:

Sorry, I mean like... FFT bins... or sines. Maybe "resolution" is the better word. :)

The best solution is probably one engineered specifically for your aim, not a particular technique.


E.g. in Kyma you open the spectral analysis tool (FFT-based), where you can choose between real-time and non-real-time analysis, or import files from other apps; it has different modes of operation, and you can then use the result to play the material back from a sine bank. Between these steps you get a graphical editor where you can change things or cut errors or silence out of the analysis file.


It all works very well for pitch and time manipulation, and for their decade-long infomercial about morphing two samples. But maybe your focus is more on envelope modulation or creating multilayers, and there is something better for that?


I have wanted a Melodyne external for 20 years, but it won't happen.

Baek Santarek:

"the best solution is probably one engineered specifically for your aim, not a certain technique."

This is a great point I really have to remember. I have a tendency to want "an ultimate solution" when a workaround might suffice for my goals.

Thanks, Roman!