Image to Additive Synth
Hello,
I'm really interested in the relationship between image and sound, and more specifically in interpreting images as sound. I've been trying to put together some patches that will take a BMP image and interpret it as spectral data for a sound. I've had success sculpting the picture out of noise using pfft~, but what I'm really interested in doing is using additive synthesis to build up the spectrum (i.e. the kind of thing MetaSynth does). The advantage is that the waveform and other parameters of the root sound can be altered to make the final sound more musical.

I've attached a patch that more or less does what I want it to (the spectrogram even occasionally resembles the starting image). The issue is that the final product contains all sorts of spurious noise and has time resolution issues. For example, if I import an image of a sonogram it sounds nothing like it should.

My question is twofold. First, is the resynthesis method I've implemented a dead end? I can't really think of any further ways to optimize it. If it is a dead end, it seems like I will want to turn to some sort of image > SDIF > sound approach. The CNMAT tutorials are very helpful regarding the SDIF > sound portion, but I'm afraid I have no clue how to convert a BMP file to an SDIF file in Max. Are there other ways I'm missing to optimize the approach I'm trying now? If not, how would I begin to turn a .BMP into an SDIF? Thank you all for your time and for your help.
Cheers,
David
PS I've attached two BMP images in case they prove useful in poking around the patch. One is of some flame fractal lines and the other is a sonogram of Marvin Gaye. Thanks again.
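For anyone who wants to prototype the idea outside Max, here is a minimal Python/NumPy sketch of the additive approach described above: each image row drives the amplitude envelope of one sine partial, and each column is one time frame. The function name, frequency range, frame duration, and envelope smoothing are all my own assumptions, not anything taken from the attached patch.

```python
import numpy as np

def image_to_additive(img, sr=44100, frame_dur=0.05, fmin=40.0, fmax=8000.0):
    """Resynthesize a grayscale image as a bank of sine partials.
    Rows are partials (top row = highest frequency), columns are time
    frames; `img` is a 2D float array in [0, 1]. All names and defaults
    here are hypothetical."""
    n_rows, n_cols = img.shape
    # Log-spaced partial frequencies, highest at the top like a spectrogram.
    freqs = np.geomspace(fmin, fmax, n_rows)[::-1]
    frame_len = int(sr * frame_dur)
    t = np.arange(frame_len * n_cols) / sr
    out = np.zeros(t.shape[0])
    # One fixed random phase per partial, so partials don't all align at t=0.
    rng = np.random.default_rng(0)
    phases = rng.uniform(0, 2 * np.pi, n_rows)
    for row in range(n_rows):
        # Hold each pixel's brightness for one frame, then smooth the
        # envelope slightly to soften clicks at frame boundaries.
        env = np.repeat(img[row].astype(float), frame_len)
        env = np.convolve(env, np.ones(64) / 64, mode="same")
        out += env * np.sin(2 * np.pi * freqs[row] * t + phases[row])
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out
```

A useful sanity check, mirroring the horizontal-line test mentioned later in the thread: an image with a single bright row should come out as essentially one sine tone.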
I did something similar with jit.peek and oscbank~, might be worth looking at.
Have you checked out this PAD:
Yes I have. I love Baz's PADs! A lot of the ideas for my patch came from that one. In the end, though, I used something more similar to the "jit.peek~ additive synth" patch in the examples folder.
In other news, I've found at least one of my errors: I forgot to set the jit.qtmovie object to "@adapt 1". That addresses at least some of the time resolution issues I was having. It's not a surprise that 320 pixels weren't sounding quite right when there were supposed to be 4000! This fix, however, doesn't really help the overall sound (it's still quite a buzzy mess), so there seems to be some deeper problem with my design.
Thank you both for your reply. I hope this is at least somewhat interesting.
In case anyone is still interested in having a look, here is the patch again. All I've changed is the "@adapt 1" bit, as well as adding a button to make it play at the image's natural rate (only helpful if you're trying to import a sonogram).
Thanks for posting your patch. I'll have a look soon when I've got time. I've been meaning to try something similar for a while too.
Thanks so much Mark, Luke. I'll keep looking it over myself and posting if I feel I've made a major fix.
So do you want it to work over the standard chromatic scale? Feels like there should be an mtof somewhere to constrain your pitches to this.
Check this patch; you can modify the sound as a spectrogram by clicking/dragging on the spectrogram. I have about 40 similar patches and will post them in a few days' time; I just need to finish the accompanying text.
PS: this patch does not allow you to record the sound as a spectrogram, so I am sending you a pre-recorded matrix as well.
Mark, I think I like the idea of not necessarily constraining the pitches to semitones or to a particular tonality, as many harmonics and interesting sonorities do not fit within standard temperament. My plan for restricting an image in that way would be to edit it in Photoshop with a mask made of just semitone lines. That would leave the original sound more or less intact, certainly more so than changing the harmonic step parameter, while still imposing a tonality.
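For the mtof-style constraint Mark suggested, the quantization itself is simple to sketch. Here is a tiny Python version of snapping a partial's frequency to the nearest 12-TET semitone, equivalent to running ftom → round → mtof in Max (the function name is hypothetical):

```python
import math

def snap_to_semitone(freq, a4=440.0):
    """Quantize a frequency in Hz to the nearest equal-tempered
    semitone, like ftom -> round -> mtof in Max."""
    if freq <= 0:
        return freq
    midi = 69 + 12 * math.log2(freq / a4)  # convert Hz to (fractional) MIDI
    return a4 * 2 ** ((round(midi) - 69) / 12)  # round, then back to Hz
```

Running every row's frequency through something like this (or not, per taste) is the programmatic equivalent of the semitone-line Photoshop mask.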
t, absolutely wicked patch! I love the concept, and it's making some really cool sounds. The multiple levels of blur are a nice feature. However, and correct me if I'm wrong, the patch mostly functions via a reverse FFT, where noise is filtered by the spectral image. The question I'm posing concerns doing this sort of thing, image to sound, via additive synthesis, so that the oscillator used to reconstitute the sound would be a flexible parameter. That being said, really wicked patch man, thanks for sharing it.
I've made some significant changes to my patch: the sonogram of Marvin is now recognizably Marvin Gaye, so that's a big step. Most of what I have changed is fixing the parameters so they adapt to the right settings on import.

However, two big problems remain, as the sound produced is messy and full of sidebands. I think they are being generated as a result of me driving this thing with a signal? I'm not sure, but there is one low-frequency sideband and one high, and if I just import an image of a horizontal line (i.e. something that should translate to just a sine tone) I still get these wacky sidebands. Additionally, this patch is hogging around 50-60% CPU, if not more. That may just be the way it is with the number of oscillators required, but it's still a bit of a drag. Anyway, here is the patch. Thank you all again for your replies; it's been a big help to me.
Sure, I was distracted by you mentioning the word musical. I've not used Metasynth for a while now, but as I remember it never sounded very harmonic with photographic type images. There was a nice way of filtering to different scales though, with visual feedback (like you describe). Maybe for additive synthesis it would be interesting to map the pitch spacing to an adjustable curve or constrain it to specific ranges - thinking about the interesting effects of frequencies close together.
It certainly would be. The effect that dense tone clusters can have is so interesting. As to mapping the pitch spacing, I've certainly found that logarithmic spacing sounds more natural. What other ways would you consider? It could be an interesting way to abuse sounds that are somewhat recognizable (i.e. sonograms rather than abstract images).
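One way to make the pitch spacing adjustable, as Mark suggests: warp the normalized row index through a curve before mapping it onto a log frequency range. A small NumPy sketch (the function name, defaults, and the `curve` parameter are all hypothetical):

```python
import numpy as np

def row_freqs(n_rows, fmin=40.0, fmax=8000.0, curve=1.0):
    """Map image rows to frequencies with an adjustable spacing curve.
    curve=1.0 gives logarithmic (equal-ratio) spacing; curve > 1 crowds
    partials toward the low end, curve < 1 toward the top."""
    x = np.linspace(0.0, 1.0, n_rows) ** curve  # warped position, 0..1
    return fmin * (fmax / fmin) ** x            # exponential frequency map
```

With `curve` near 1 but not exactly 1, neighbouring partials drift slightly in and out of clusters, which might get at the close-frequency effects you both mention.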
As for the patch, I just can't get rid of the damn extra frequencies and phasing. It's something getting messed up with the phase, but I have no phase information to work from, and most attempts I've made to randomize the phase end up in popping sounds (albeit with less random frequencies). I think I'm going to try to implement something closer to the Baz PAD that you posted initially; perhaps I don't need this to be signal driven or sample accurate for it to work. I may also try to implement some frame interpolation à la the patch that t posted. I'll post the result when I finish it.
Thanks again everyone who posted,
David
PS. Mark, really diggin your soundcloud man.
quote: "and correct me if I'm wrong, but the patch mostly functions via a reverse fft, where noise is filtered by the spectral image."
The patch is, technically speaking, a phase vocoder, where spectral data (the result of an FFT) is stored in a matrix (instead of in a buffer). The amplitude part of the FFT represents the spectrogram, which is also the user interface for manipulating the FFT data. So yes, it is all about FFT and inverse FFT.
But what the FFT does has nothing to do with filtered noise. If you do an FFT and IFFT on a signal, without any processing in the frequency domain, you get exactly the same output signal as the input (if we forget about calculation rounding errors, which are far from audible). And the IFFT in a phase vocoder works very much like a bank of oscillators in additive synthesis: you can control the amplitude of each "oscillator" and also its frequency. With frequency you are restricted to a single FFT bin; in other words, the highest frequency of one "oscillator" is the lowest frequency of its neighbour, so between them all the "oscillators" can reach every frequency in the spectrum.
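The round-trip claim above is easy to verify for yourself in Python with NumPy (just an illustration, nothing Max-specific):

```python
import numpy as np

# FFT followed by IFFT with no processing in the frequency domain
# returns the input signal up to floating-point rounding error.
rng = np.random.default_rng(1)
x = rng.standard_normal(1024)           # arbitrary test signal
y = np.fft.irfft(np.fft.rfft(x), n=1024)
max_err = np.max(np.abs(x - y))         # tiny: far below anything audible
```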
But if you really want to do the task with real oscillators, you could leave the IFFT out and use the phase vocoder's FFT data to drive your oscillators. You would have to calculate the frequencies from the FFT data: each one is basically (bin frequency) ± (a deviation derived from the frame-to-frame phase offset). The implementation is a bit tricky, but if you are interested, just wait till the end of the week; I will post my dissertation on this subject to the forum, with many patches.
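For reference, the (bin frequency) ± (phase offset) calculation t describes is the standard phase-vocoder analysis step. A NumPy sketch of it (function and argument names are my own; this is not t's patch):

```python
import numpy as np

def bin_frequencies(frame_a, frame_b, sr, hop, n_fft):
    """Estimate each FFT bin's instantaneous frequency from the phase
    difference between two successive analysis frames taken `hop`
    samples apart (complex rfft frames, same window for both)."""
    bins = np.arange(frame_a.shape[0])
    bin_freq = bins * sr / n_fft                # bin center frequencies (Hz)
    # Phase advance each bin would show if its content sat exactly on-bin.
    expected = 2 * np.pi * bins * hop / n_fft
    dphi = np.angle(frame_b) - np.angle(frame_a) - expected
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi  # wrap deviation to (-pi, pi]
    # True frequency = bin frequency plus the deviation converted to Hz.
    return bin_freq + dphi * sr / (2 * np.pi * hop)
```

Feeding these frequencies (and the bin magnitudes) to an oscillator bank is the "leave the IFFT out" idea from the post above.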
Hi t,
I think I understand now. The issue still remains of actually arbitrarily altering the waveform of the oscillators that reconstitute the sound; I'm not sure how that would work with FFT/IFFT/phase vocoding processes. Additionally, as I wish this to work with abstract images, not only sonograms, I won't always have real phase information to work with. I'm starting to suspect that this is the source of all the weird sidebands I'm getting with my additive method. Regardless of the method, how should I address phase problems when I have no real phase data to work with? I've tried randomizing the phase of the oscbank~ with noise input, but it just gets clicky (although it does reduce the glissy sidebands). I have not tried randomizing it with an FFT method and will investigate that.
This, combined with the extraordinary CPU strain it is causing, is making me think that perhaps running this in non-real time would be better (at least if I decide to stick with an additive approach).
PS I look forward to reading your dissertation!
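On the phase/clicking question above: the difference between drawing one random phase per partial (held fixed) and re-drawing the phase every frame can be sketched like this in Python/NumPy (hypothetical names; oscbank~ itself doesn't work exactly this way):

```python
import numpy as np

def partial(freq, amps_per_frame, sr=44100, frame_len=512, rerandomize=False):
    """One sine partial whose amplitude changes once per frame.
    rerandomize=True re-draws the phase every frame, which breaks
    waveform continuity at frame boundaries (the clicks); False keeps
    a single fixed random phase, so frames join smoothly."""
    rng = np.random.default_rng(2)
    phase = rng.uniform(0, 2 * np.pi)
    out = np.zeros(len(amps_per_frame) * frame_len)
    for i, a in enumerate(amps_per_frame):
        if rerandomize:
            phase = rng.uniform(0, 2 * np.pi)  # new phase -> jump at boundary
        # Use a global sample index so a fixed phase stays continuous.
        n = np.arange(i * frame_len, (i + 1) * frame_len)
        out[i * frame_len:(i + 1) * frame_len] = (
            a * np.sin(2 * np.pi * freq * n / sr + phase)
        )
    return out
```

The upshot: a random phase per oscillator is harmless as long as it is chosen once and then left alone; it is the per-frame re-randomization that produces the popping.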
Well, no combination of those things worked. From looking at the spectroscope~ readout, it would appear that some sort of flanging is going on. Additionally, the oscbank~ is generating tones and noise that it should not be making (as the pixels in the original matrix are black). I'm going to post a minorly updated version of the patch. If anyone has time to play with it and can tell me what they think could be going wrong, it would be greatly appreciated. I just have no idea. Thanks again for all your help.
Cheers,
David
I posted my dissertation in the Jitter forum:
enjoy:)