Creating sound from video - video reactive

Terry Sanders

Hello everyone! This is my first post here. I am using Version 8.6.5 and trying to create sound from fractal videos. There are some aspects I would like to improve, but I am not knowledgeable enough about Max/MSP to know how to do so or what direction to take.

I would like to add the ability to incorporate scales into the generated MIDI notes. Since I have four discrete paths, I would like to use the BEAP Quantizer module or something similar, but be able to change keys/scales on all four instances of the Quantizer at the same time. Is there a way to do this?

I would also like to add a better rhythm component for the overall patch.

Any ideas for improvement or direction are welcome. The current state of the patch is attached.

I am also using data from the videos to create controls for FM synthesis. This and the MIDI are combined later in a separate video editor. You can see results by doing a YouTube search for "Frax Audio."
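For anyone curious about the FM side, it is just basic two-operator FM with a video-derived value (average brightness, say) driving the modulation index. Here is a rough Python sketch of that mapping; the frequencies, ratio, and index range are arbitrary examples, not the values in my patch.

import math

SAMPLE_RATE = 44100

def fm_sample(t, carrier_hz, ratio, index):
    """One sample of a carrier modulated by a sine at carrier_hz * ratio."""
    mod = math.sin(2 * math.pi * carrier_hz * ratio * t)
    return math.sin(2 * math.pi * carrier_hz * t + index * mod)

def brightness_to_index(brightness, max_index=8.0):
    """Map a 0..1 brightness average to a modulation index."""
    return max(0.0, min(1.0, brightness)) * max_index

# Example: one second of audio with the brightness average fixed at 0.6
index = brightness_to_index(0.6)
samples = [fm_sample(n / SAMPLE_RATE, 220.0, 2.0, index)
           for n in range(SAMPLE_RATE)]
print(len(samples), round(min(samples), 3), round(max(samples), 3))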

Attached: Video2Sound.maxpat (Max patch)

Thank you!

Terry

Chema Salinas

Hi there. I'm also more or less new to playing with Max, but I've had a very similar question, so please take this all as coming from a newbie. It seems to me that there's a lot of work going one way, where audio inputs control the parameters of video or still images, but not much going in the other direction, where images control audio. At least, not much that I've found very compelling. I'm happy to be wrong about that and pointed toward cool stuff.

Anyhow, I'm chiming in because I'm also interested in what the good folk around here might have to say, and to add a couple of thoughts about how that kind of control might work. I'll stay at a more theoretical level because I don't know enough about the practicalities not to.

It looks to me, from the patch you shared and the YouTube videos I checked out, that you're using the average RGB outputs of the four video quadrants to generate control data for the audio (the patch doesn't seem to load the video for me, or I'm missing something, which is equally likely; I did drop in a couple of the stock example videos to check).

I believe there are some objects and techniques (gate, maybe? or some other kind of rounding or limiting tactic?) that would let you define how the continuous number feed is translated into consistent MIDI notes. That could let you define seven notes, repeating in octaves with a simple multiplier, so the randomness is redefined as being in a specific key.
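To make that concrete, here is a rough sketch in Python (not Max) of the idea; the scale, root note, and octave range are only placeholders I chose for the example.

MAJOR_DEGREES = [0, 2, 4, 5, 7, 9, 11]   # seven notes of a major scale
ROOT = 48                                # arbitrary MIDI root note
OCTAVES = 3                              # how many octaves to spread over

def control_to_note(value):
    """Map a continuous value in [0, 1] to a note locked to the scale."""
    value = max(0.0, min(1.0, value))
    steps = len(MAJOR_DEGREES) * OCTAVES
    step = min(int(value * steps), steps - 1)
    octave, degree = divmod(step, len(MAJOR_DEGREES))
    return ROOT + octave * 12 + MAJOR_DEGREES[degree]

# e.g. an average-red value wandering between 0 and 1
for v in (0.05, 0.33, 0.5, 0.77, 0.99):
    print(v, control_to_note(v))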

Also, maybe it would be useful to send the control video through something that limits its output, which could reduce the number of note events. Converting it to binary black and white, or greyscale, or something else might help with that. You could still display the regular video, but the audio control would be reduced to something more easily used.
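Something along these lines, sketched in Python with a made-up threshold, is the kind of thinning I am imagining: only let a value through when it has moved far enough from the last one that produced a note.

def changed_enough(stream, threshold=0.1):
    """Yield only values that differ from the last emitted one by at least threshold."""
    last = None
    for value in stream:
        if last is None or abs(value - last) >= threshold:
            last = value
            yield value

# A jittery control feed collapses to a handful of note events.
raw = [0.50, 0.51, 0.52, 0.61, 0.62, 0.40, 0.41, 0.39, 0.80]
print(list(changed_enough(raw)))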

For rhythm, maybe setting up some type of repeating counter, or more than one, would let you establish some control over the note events. You could use parameters from one or more of the video outputs to define the length(s) of the counter(s); if those are kept to multiples of each other, maybe divided into fours or something depending on the time signature you want, then things could stay in time and synced up.
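Again only a Python sketch with placeholder lengths, but it shows how two counters kept at multiples of each other, ticked by the same clock, keep their events lined up.

COUNTER_A = 4    # e.g. fires every 4 ticks (a quarter-note feel)
COUNTER_B = 16   # e.g. fires every 16 ticks (a bar-level feel)

def fires(length, step):
    """True when a counter of the given length wraps on this global step."""
    return step % length == 0

for step in range(32):
    events = []
    if fires(COUNTER_A, step):
        events.append("A: note")
    if fires(COUNTER_B, step):
        events.append("B: accent / chord change")
    if events:
        print(step, ", ".join(events))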

Also, I know there's a color tracking tutorial somewhere in this giant bank of information. It might not be helpful for a fractal, as the colors are scattered around the frame, but split into four like that and then with some manipulation of the control images (BRCOSA maybe?) it might help, or at least generate some better ideas as to how it could work. I'm sure I'm missing a lot, again, dumb newb, but I've had similar questions and those are some of the ideas I've had about solving them. Good luck!

Terry Sanders

Thank you for your input. I agree, there has been a lot of work toward audio-reactive but not so much toward video-reactive. The best I have seen is an iOS app called VOSIS. Just from the way it interacts, it was probably developed with Max/MSP. I found it by doing a search for 'image sonification.' That seems to be the closest set of terms describing what I am chasing.

I have worked on my patch enough that I am currently happy with it. I found a patch on the forum (I don't remember who posted it or where, or I would give a shoutout . . .) that was simple enough for me to understand, and I was able to modify it to add scales to my patch. The patch is quite bulky. I first tried converting the video to monochrome before I found the Vizzie Analyzr, which seemed simple enough to use for individual RGB averages. As you noticed, I am using the jit.scissors object to split the input video into four quadrants, mostly for a panning effect and more regional pitch generation.
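Very roughly, the quadrant mapping works along these lines. This Python sketch is only an illustration of the idea; the scale, root, and pan values are stand-ins, not what the patch actually uses.

SCALE = [0, 2, 3, 5, 7, 8, 10]   # natural minor degrees, as an example
ROOT = 48                        # arbitrary MIDI root note

def avg_to_note(avg_rgb):
    """Average (r, g, b) in 0..255 -> a note locked to the scale."""
    brightness = sum(avg_rgb) / (3 * 255)
    steps = len(SCALE) * 2                        # spread over two octaves
    step = min(int(brightness * steps), steps - 1)
    octave, degree = divmod(step, len(SCALE))
    return ROOT + octave * 12 + SCALE[degree]

# quadrant order: top-left, top-right, bottom-left, bottom-right
PAN = {"tl": -1.0, "tr": 1.0, "bl": -0.5, "br": 0.5}
quadrant_averages = {"tl": (200, 40, 40), "tr": (30, 180, 60),
                     "bl": (90, 90, 200), "br": (250, 250, 250)}

for name, avg in quadrant_averages.items():
    print(name, "note:", avg_to_note(avg), "pan:", PAN[name])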

I have learned a lot in the exploration. I will look into the color tracking, as it may be helpful. I am not yet completely satisfied with the rhythm aspect. I am doing as you suggest, using an alternate output from the Analyzr to influence the rhythm/speed of the incoming data.

Thanks for your ideas!

Attached: Video2SoundScaler.maxpat (Max patch)

Chema Salinas

Awesome! Thanks for the tip on search terms and that app; it looks really interesting. And thanks for sharing your patches. One thing I found that is really interesting and might be fun to play with is Zig Sim Pro (like 4 bucks, American), which sends out your iPhone camera data. I made a patch for all of its inputs, but it's heavy and tends to crash with everything running, at least on my last computer; I just upgraded and haven't tried it out on this one yet.

It connects pretty easily to Max via your wireless router and gives a ton of data that I've been converting into whatever other sort of control I want to play with, like using the depth of a smile to control the frequency of an audio output. Federico Feraro has a tutorial about it (below). I really like his other tutorials, too, even if they can be a little challenging. I've found that the facial tracking from the ARKit part works well, but the NDI part, which looks more at motion in a wider space, lasts a couple of seconds before crashing.
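The smile-to-frequency part is nothing fancy. Here is a rough Python sketch of the kind of scaling I use; the frequency range and the exponential curve are arbitrary choices for the example, not something Zig Sim prescribes.

import math

LOW_HZ, HIGH_HZ = 110.0, 880.0   # an arbitrary three-octave range

def smile_to_freq(smile):
    """Map a 0..1 smile amount onto Hz, exponentially so steps feel even."""
    smile = max(0.0, min(1.0, smile))
    return LOW_HZ * math.pow(HIGH_HZ / LOW_HZ, smile)

for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(s, round(smile_to_freq(s), 1))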

The Cycling '74 color tracking tutorial is "Tutorial 25: Tracking the Position of a Color in a Movie." Anyhow, thanks for not yelling at me. I hope you figure out how you want things to work without too much trouble.

Attached: ZIGSIM Face Input UDP.maxpat (Max patch)