Turning video to sound
I’m looking for a simple way to make sound out of moving images.
"Grayscale video -> data matrix (-> normalise) -> sound"
I looked up pfft~ tutorials but I find them confusing.
Could you guys give me some advice? I’d really appreciate it.
You really need to think more about *how* you want to map video to sound.
Do you want the brightness of the video to control a sound parameter?
Do you want to use slices of the video to control a vocoder-like filter?
Do you want the amount of motion in the video to control the volume of your sounds?
There are a billion ways to do this, so us telling you how wouldn’t make much sense. "Mapping is everything", and once you’ve come up with a way of processing the video that you’d like to generate sound, then you’re a lot closer to your goal.
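To make two of those questions concrete, here is a rough sketch in plain Python (not a Max patch) of how "brightness" and "amount of motion" could each be boiled down to a single 0.0-1.0 control value per frame. It assumes 8-bit grayscale frames stored as lists of rows; the function names and the tiny 2x2 frames are made up for illustration.

```python
def mean_brightness(frame):
    """Average of an 8-bit grayscale frame (list of rows), scaled to 0.0-1.0."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / (count * 255.0)


def motion_amount(prev_frame, frame):
    """Mean absolute pixel difference between two frames, scaled to 0.0-1.0."""
    diff = 0
    count = 0
    for prev_row, row in zip(prev_frame, frame):
        for a, b in zip(prev_row, row):
            diff += abs(b - a)
            count += 1
    return diff / (count * 255.0)


# Two tiny hypothetical 2x2 frames standing in for real video.
frame_a = [[0, 0], [0, 0]]
frame_b = [[255, 255], [0, 0]]

cutoff_control = mean_brightness(frame_b)         # could drive a filter cutoff
volume_control = motion_amount(frame_a, frame_b)  # could drive a gain
```

What each control value then drives (cutoff, gain, grain density and so on) is exactly the mapping decision discussed above.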
Yeah, I made a patch that did it. I scaled the RGB, gamma, contrast etc. values out of Jitter and then mapped each pixel to a panned area of a quadraphonic sound field (i.e. if the red pixel was in the top right corner of the picture, then the sound that I had mapped to red would be at the top right corner of the 4-speaker sound field). I mapped the human hearing range to the human visual range (more or less): darker was quieter, lighter was louder, etc. It gives some interesting results. I later used the patch as a subpatch within a much larger patch to affect granular aspects of sound and also to spatialise sounds in 3D. You can see the subpatch here (the "eye" in the middle of the patch is a vid). It gives more of a sense of pattern, flow and coherence than mere random number generators, and is used to affect biquads here. This is explained at 6 mins, though this differs significantly from the original patch:
I can send you the original patch if you want, though. It shows the vid larger and more clearly.
You should also check out programs like atmogen and metasynth. Although they don't deal with moving images, they may give you ideas for mapping and spatialisation. I think Eric Lyon has been working on this as well, so check out his website – he might have something on there.
I looked up atmogen and metasynth, which gave me a better understanding of sound. Thank you AUGUSTINE BANNATYNE – LEUDAR. (By the way, your project looks awesome!)
The things that WETTERBERG pointed out were the reason I felt confused. I didn’t fully understand the sound itself. A lot of spectral sound processing reads a still image from left to right. In a moving image, the frame rate is the time axis, so I wasn’t sure what the x and y axes and the brightness data could be. Let me write down how I imagine the process, and please correct me so that it makes sense:
the video frame size would be 1024x. So,
x-axis could be 1024 oscillators
y-axis is amplitude (normalised and mapped from -1. to 1.)
brightness data in each cell is volume (mapped from -48 to 48Hz) -> brightness 0 is mute. brightness 255 is 48Hz
Does it make sense?
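For what it's worth, one possible reading of this idea can be sketched in plain Python: each column of the frame drives one sine oscillator, and the column's mean brightness sets that oscillator's gain. This is only a sketch under assumptions, not a recommendation: averaging each column, the 100 Hz to 8 kHz logarithmic frequency spread, and all the function names are invented for the example.

```python
import math


def column_gains(frame):
    """Mean brightness of each column of an 8-bit grayscale frame, 0.0-1.0."""
    height = len(frame)
    width = len(frame[0])
    return [sum(frame[y][x] for y in range(height)) / (height * 255.0)
            for x in range(width)]


def oscillator_freqs(count, f_lo=100.0, f_hi=8000.0):
    """Spread `count` frequencies logarithmically from f_lo to f_hi (assumed range)."""
    if count == 1:
        return [f_lo]
    ratio = f_hi / f_lo
    return [f_lo * ratio ** (i / (count - 1)) for i in range(count)]


def render_block(frame, n_samples=256, sample_rate=44100, start_sample=0):
    """Sum one sine per column for n_samples; output stays within -1.0..1.0."""
    gains = column_gains(frame)
    freqs = oscillator_freqs(len(gains))
    out = []
    for n in range(start_sample, start_sample + n_samples):
        t = n / sample_rate
        s = sum(g * math.sin(2.0 * math.pi * f * t)
                for g, f in zip(gains, freqs))
        out.append(s / len(gains))  # divide by oscillator count to avoid clipping
    return out
```

Calling `render_block` once per video frame, with `start_sample` advanced by `n_samples` each time, keeps the oscillator phases continuous from one frame to the next because the frequencies themselves never change.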
Could the granular synthesiser in AUGUSTINE’s patch be a better solution than placing 1024 oscillators?
Thank you WETTERBERG and AUGUSTINE
Left-right (stereo) is basically 1D. A picture is a 2D object (as opposed to a hologram, which is 3D). That is why I decided to use a quadraphonic setup. To represent a moving image in any way, even vaguely realistically/logically, it would have to be 2D audio – in other words, surround sound. The reason I use 4 speakers is that each speaker represents one of the four corners of a square (or television screen), because moving video images don't happen in straight lines – they happen in squares (note the four volume controls in the corners of the screen in the image below). Ideally every pixel would be represented by a speaker, but that's not realistic. So four will have to do for now, and this is very easy to set up. Then you can map the colour spectrum to sound (you can do what I did to start with: using zmap/scale, map 256 colours to 10 Hz – 18 kHz). This can be changed.
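As a rough illustration of the four-corner idea (a sketch in plain Python, not the actual patch), a pixel's position in the frame can be turned into one gain per speaker. The equal-power weighting used here is an assumption chosen for the example, as are the function name and the speaker labels.

```python
import math


def quad_gains(x, y, width, height):
    """Equal-power gains for a pixel at column x, row y (row 0 = top of picture)."""
    u = x / (width - 1) if width > 1 else 0.5    # 0.0 = left, 1.0 = right
    v = y / (height - 1) if height > 1 else 0.5  # 0.0 = top, 1.0 = bottom
    return {
        "top_left": math.sqrt((1 - u) * (1 - v)),
        "top_right": math.sqrt(u * (1 - v)),
        "bottom_left": math.sqrt((1 - u) * v),
        "bottom_right": math.sqrt(u * v),
    }
```

A pixel in the top right corner comes out of the top right speaker only, while a pixel in the centre of the frame is shared equally between all four. The squared gains always add up to 1, so the overall loudness stays roughly constant as a sound moves around the square.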
The sliders in the screenshot below allow you to adjust the mapping range (i.e. how loud or quiet does dark/light go? What range of frequencies should 256 colours be mapped to?). You can adjust these in real time, so you can tune it to something listenable.
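Outside Max, the colour-to-frequency mapping with an adjustable range could be sketched roughly like this in plain Python. The first function is a clamped linear mapping, similar in spirit to what zmap does; the second is an exponential alternative, added here as an assumption because pitch is heard roughly logarithmically. Both function names are invented for the example.

```python
def clamped_linear_map(value, in_lo, in_hi, out_lo, out_hi):
    """Clamp value to the input range, then map it linearly to the output range."""
    value = max(in_lo, min(in_hi, value))
    return out_lo + (value - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)


def clamped_exp_map(value, in_lo, in_hi, out_lo, out_hi):
    """Clamp value, then map it so equal input steps give equal frequency ratios."""
    value = max(in_lo, min(in_hi, value))
    position = (value - in_lo) / (in_hi - in_lo)  # 0.0 to 1.0
    return out_lo * (out_hi / out_lo) ** position


# 256 colour values (0-255) mapped to 10 Hz - 18 kHz, as in the post above.
linear_hz = [clamped_linear_map(c, 0, 255, 10.0, 18000.0) for c in range(256)]
exp_hz = [clamped_exp_map(c, 0, 255, 10.0, 18000.0) for c in range(256)]
```

Changing the last two arguments plays the same role as moving the range sliders: the 256 input values stay the same, only the output range they are stretched over changes.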
Anyway, the patch and subpatches (called flatlandquad etc.) are included. Please let me know if you use or modify the attached. I made it a few years ago, so some of the sliders are a bit weird in the new versions of Max.
You can download the patch here :
There's a stereo version included, but to be honest it's better to use 4 speakers minimum. You will need a sound card with four outputs and 4 speakers – most PC desktops have 4 analogue outs, or you could buy any crappy old 4-channel card for 30 euros.
You should read this :