Making Connections: Camera Data

Coming up with ways to get information about the physical world into Max is one of the most fun aspects of working with the software. Whether it is for video processing, sound creation, or any other type of output, physical interactions provide a space for much more interesting relationships to develop.

Unfortunately, many ways to get this information into Max require the user to get comfortable with connecting wires to circuit boards and understanding basic (and sometimes not-so-basic) electronics. For this reason, camera-based interactivity can be pretty enticing. There is also a reasonably low startup cost and plugging a camera in is usually a pretty user-friendly process. In this article, I will share a couple of basic techniques for using affordable webcams to gather data in MaxMSP/Jitter.

Download the patches used in this tutorial.

First, You Need a Camera

To get started, you will need a camera that you can access from Jitter using jit.qt.grab (or jit.dx.grab on Windows). A lot of laptops now have built-in cameras, which work fine for getting started, though you will eventually want to move on to a higher quality input. I'm often asked which camera I would recommend for use with Jitter, and this is always a difficult question to answer. Bear in mind that I haven't tested anything that costs more than $200, but here is a list of features I look for when purchasing a camera for live video and computer vision projects:

1. “Tossability” Factor – If I don’t feel comfortable tossing something into my messenger bag and jumping on a bus with it, it will probably never get used. For this reason, an ideal camera should be low cost, easy to protect from damage, and reasonably small.

2. Compatibility with Jitter – On the Mac, this means having QuickTime-compatible drivers. FireWire cameras are usually supported by the generic DV or IIDC-1394 QuickTime drivers. There is also the open-source Macam driver, which supports a wide range of USB webcams on Mac OS X. If you use Windows, you will want to make sure the camera has DirectX compatibility.

3. Control – Don’t let the camera make any decisions for you. When shopping for a camera, you want something that allows you to override the automatic image adjustments and gives you the ability to manually focus the lens.

I currently own two cameras that satisfy these needs fairly well. The first is the Unibrain Fire-i Board Camera. This little workhorse connects via FireWire 400, is supported by the IIDC QuickTime driver, and features a standard M12x0.5 screw-mount lens holder. The 1394store website has a good variety of inexpensive lenses you can purchase with the camera, and also sells a C-mount lens adapter for use with higher quality lenses. Since focusing is done by manually screwing the lens in and out of its mount, there is a great deal of control over focus. As a beginner's camera for Jitter, it does pretty well.

The other camera is a PlayStation 3 Eye. After reading the rave reviews on Create Digital Motion, I purchased one for a recent gallery installation. So far, I admit I have been really impressed by the quality and reliability of such an inexpensive camera. The PS3 Eye is a USB camera that requires downloading the most recent Macam component to get it working on Mac OS X, and there are supposedly some third-party drivers available for Windows as well. There is a lot of discussion about this camera on the NUI Group forums, and there are even detailed instructions for opening it up and modifying it. For the tweakers out there willing to pry open a plastic bubble and install third-party drivers, this is a pretty excellent solution.

How Much Action?

The simplest way to detect motion in video is a frame-differencing operation, which finds, for each pixel, the difference between successive frames. Combined with a threshold, this makes it easy to find which pixels have changed between frames and to use that as an indicator of how much movement is happening in the scene. The middle (mean) outlet of jit.3m can then be used to measure the proportion of pixels that are above the difference threshold. An example of this can be found in the "frame-differences" patch. Since the middle outlet of jit.3m gives the mean of all the pixel values, you can simply multiply that value by the total number of pixels in a frame (width x height) to get an absolute pixel count. For most purposes, though, the average value is perfectly usable.
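For readers who think better in code than in patch cords, here is a rough NumPy sketch of the same arithmetic the patch performs (the function name and threshold value are mine, not from the patch):

```python
import numpy as np

def motion_amount(prev_frame, curr_frame, threshold=30):
    """Fraction and count of pixels that changed between two frames.

    Rough analogue of frame differencing plus a threshold, followed by
    reading the mean outlet of jit.3m. Frames are 8-bit grayscale arrays.
    """
    # Absolute per-pixel difference (cast to avoid uint8 wraparound)
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    changed = diff > threshold                # binary "motion" mask
    mean_value = changed.mean()               # like jit.3m's mean outlet
    pixel_count = int(mean_value * changed.size)  # mean * (width * height)
    return mean_value, pixel_count
```

The `mean * size` step at the end is the same trick described above for turning jit.3m's average into an absolute pixel count.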

From here, you might want to know whether this motion is happening in a particular region of the scene. For that, we can use a simple masking operation: supply a matrix with white pixels designating the region of interest and multiply the output of the frame-differencing patch by this mask. This turns the pixels outside the mask black, so we only see white pixels when there is motion inside the intended region. Once again, jit.3m can be used to count the white pixels and give us a useful value.
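The masking step is just an elementwise multiply. A minimal NumPy sketch (function name is mine) of counting motion pixels inside a region of interest:

```python
import numpy as np

def motion_in_region(motion_mask, region_mask):
    """Count motion pixels inside a region of interest.

    motion_mask: binary array from frame differencing (1 = changed pixel).
    region_mask: binary array, 1 (white) inside the region of interest.
    Multiplying the two blacks out everything outside the region,
    just like multiplying the differenced matrix by a mask in Jitter.
    """
    masked = motion_mask * region_mask
    return int(masked.sum())
```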

Yet another approach is to track the location of motion using the jit.findbounds object, which will give you the top-left and bottom-right corners of a rectangle that contains all the pixels in a specified range. Taking the average of these two locations will give you the center of the rectangle (subpatch "motion-location").
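The same bounding-box-and-center idea looks like this in NumPy (the function name and return layout are mine; jit.findbounds reports the corners as separate lists):

```python
import numpy as np

def motion_bounds_center(mask):
    """Bounding box and center of all nonzero pixels.

    Returns (top_left, bottom_right, center) as (x, y) pairs,
    or None if the mask contains no nonzero pixels.
    Averaging the two corners gives the rectangle's center,
    as in the "motion-location" subpatch.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    top_left = (int(xs.min()), int(ys.min()))
    bottom_right = (int(xs.max()), int(ys.max()))
    center = ((top_left[0] + bottom_right[0]) / 2,
              (top_left[1] + bottom_right[1]) / 2)
    return top_left, bottom_right, center
```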


Another classic technique for video tracking is to backlight the foreground subject, or to position it in front of a bright white wall, so that there is a distinct contrast between foreground and background. By running the video image through a luminance threshold, we get a silhouette image-mask. This can be used for compositing, or for detecting whether a virtual object is inside or outside the silhouette. Note that in order for this to work properly, you will need a certain amount of control over the environment. The benefit, of course, is that the detection algorithm doesn't have to be very smart to get usable results. The "silhouettes" patch shows a basic version of this idea, with the ability to accumulate an average background image to account for slight variations in the backdrop.
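As a sketch of the thresholding step, here is a NumPy version that converts RGB to luminance with the standard Rec. 601 weights and keeps dark pixels (the backlit subject) as the silhouette. The function name and default threshold are my own assumptions, not values from the patch:

```python
import numpy as np

def silhouette_mask(frame_rgb, luma_threshold=128):
    """Binary silhouette from a luminance threshold.

    Converts an (H, W, 3) RGB frame to luminance using the standard
    Rec. 601 weights, then keeps pixels darker than the threshold as
    the (backlit) foreground: 1 = silhouette, 0 = background.
    """
    luma = (0.299 * frame_rgb[..., 0]
            + 0.587 * frame_rgb[..., 1]
            + 0.114 * frame_rgb[..., 2])
    return (luma < luma_threshold).astype(np.uint8)
```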

If you don’t have access to a situation that provides the necessary luminance contrast for the above method, a technique called “background subtraction” can sometimes be used to detect a moving shape. Frame-differencing is, in essence, a very simple form of background subtraction. A more advanced technique is to take the median of several sampled frames to act as your background image and compare the current pixels against it. Rather than collecting a bunch of frames to calculate a true median, we can build an “approximate median” using the “median-image” subpatch. We then take the difference from this median image, with a threshold, to correct for things like noise and background motion.
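The approximate-median trick is worth spelling out: instead of storing a buffer of frames, each incoming frame nudges every background pixel up or down by a small step, so over time the background converges toward the per-pixel median. A NumPy sketch of one update step (names and the step size are mine):

```python
import numpy as np

def update_approximate_median(background, frame, step=1):
    """One step of a running 'approximate median' background estimate.

    For each pixel, nudge the background up by `step` if the new frame
    is brighter, down by `step` if darker, and leave it alone if equal.
    Over many frames this converges toward the per-pixel median without
    buffering frames -- the inc/decrement idea of the "median-image"
    subpatch. Both arrays are 8-bit grayscale.
    """
    up = (frame > background).astype(np.int16)
    down = (frame < background).astype(np.int16)
    bg = background.astype(np.int16) + step * (up - down)
    return np.clip(bg, 0, 255).astype(np.uint8)
```

A transient object passing through the scene only pulls each pixel by `step` per frame, so it barely disturbs the background estimate, while a genuine scene change eventually wins out.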

Luminance keying can also be very useful for generating silhouette masks, although it often requires some very specific scene preparation as well.

Other Approaches

Once you’ve exhausted the possibilities of background subtraction, you might find yourself wanting more advanced computer-vision algorithms in Max. For this, I would highly recommend the free cv.jit externals written by Jean-Marc Pelletier. This useful set of objects provides a variety of tracking algorithms (blobs, features, optical flow, etc.) in the form of compiled objects and simple abstractions. In addition, there is Cyclops, originally written by Eric Singer and sold by Cycling ’74, as well as SoftVNS by David Rokeby. There are also many resources available online for further study.

November 19, 2009 | 6:04 pm


November 28, 2009 | 12:29 am

Very :)

November 30, 2009 | 5:13 am

Holy silhouettes, awesome. I would post my patcher based on this but it is HUGE. It’s basically a CV MIDI controller…

December 6, 2009 | 3:45 pm

Very nice patches. I have been using "frame-differences" to generate sound, but as soon as I get somewhat higher values from the float connected to jit.3m (the one that triggers sound) the sound gets completely distorted. Being a novice, could I get some feedback?

December 7, 2009 | 11:22 pm

@Alex, You might want to check that you aren’t multiplying a signal by values greater than 1. It sounds like you are getting clipping. Check out the MSP docs for more detail.

December 7, 2009 | 11:35 pm

Thanks for the tip Andrew.
By the way, great patch, really inspiring.

December 9, 2009 | 3:26 pm

@Andrew, In the end I realized that the values given by the jit.3m object influence the sound, whether this number box is connected to the sound source or not. As soon as they are both open, this distorting influence starts once the values get higher than 0.004 or so. Do you know how I can avoid this?

December 10, 2009 | 9:04 pm

@Andrew, Finally I fixed it. Thanks anyway.

December 15, 2009 | 2:24 pm

Magic Andrew! Once again, Thank you! I keep following your teachings! Also, my students here in Texas also follow! Thank you! Glad to have someone at c74 deeply oriented towards video and visuals!

December 22, 2009 | 6:11 pm

thank you!!!

February 11, 2010 | 4:25 pm

Thanks for sharing the interesting patches. But I have no idea what "jit.op @op >", "jit.op @op <", and "jit.op @op * @val 0.001" are for. It would help a lot if some comments were added to the patches to explain why those objects are there.

February 11, 2010 | 8:36 pm

I am using the PS3 camera, which is a great tip, thanks Andrew! Actually I bought a second one to get two cameras together into the system for a performance project. I basically copied the input and tried to choose the two cameras from the umenus, but when the two cameras are connected, Max crashes. Is there a reason for that, or is it possible that one cannot link two cameras of the same kind? Any help is appreciated!



February 11, 2010 | 9:51 pm

Hi Chienwen,
This is used to create an "approximate median" by incrementing or decrementing the accumulated value depending on whether new values are greater than or less than the previously calculated median value. The > and < operators test whether the new values are greater or less than the previous estimate. The binary (0. or 1.) results are then scaled using jit.op @op * so that the value is only inc/decremented by a small amount each frame.

February 11, 2010 | 9:56 pm

Hi Falk,
It might be a limitation of the camera driver, to have more than one of the same camera, however Max probably shouldn’t be crashing regardless. If this persists, please contact support with a bug report. Another thing to try would be to use a powered USB hub for the cameras, since the USB bus on your computer might not be able to drive both cameras.

February 12, 2010 | 1:06 am

Hi, Andrew, Thank you so much. I have never learned this algorithm in my high school math class though. By the way, the median-motion patch can create some trails. I would like to know if there is a way to make those trails stay longer ?

February 17, 2010 | 12:50 am

Total noob to the whole video tracking thing. Info here is gold! Thanks

February 3, 2011 | 1:16 am

Gosh that was awesome. Thanks so much for sharing!

April 11, 2011 | 4:17 am

hi bassfalk (and all),

Have you had any joy in trying to cure the issue with using multiple PlayStation 3 cams? I am creating a digital video installation and require 4 of the cams to run simultaneously, however every time I attempt to select another source camera Max crashes. I can have the built-in iSight and one of the cameras running fine, yet when I try to introduce a second PlayStation cam, Max always crashes! :(
I am using a powered USB hub with the cams connected, running through the Macam (0.9.2) driver software and then into Max, on an iMac 2.4 GHz Core 2 Duo with 3GB RAM. I have also tried running the cameras on a variety of frame rate and compression settings, but still no joy!!
Any ideas from anyone would be very much appreciated!


April 14, 2011 | 9:53 am

hey andrew,
i’m using the silhouettes as a base set of components but want to have 3 sets of triggers with 3 different sizes. my issue is that when i set the first matrix size to say, 30/30 and the second to 1000/1000, the trigger only shows up in a 29/29 range. how do i set the size of the trigger while maintaining a 100/100 locator grid?
heres a section:

– Pasted Max Patch, click to expand. –

April 14, 2011 | 10:48 am

nevermind. i figured it out. scaling and then pack/unpacking.

September 29, 2011 | 8:02 am

Dude thanks! This was an awesome set of examples!

October 7, 2011 | 6:16 pm

I really like the [t l l] with the split outputs to ensure that the frame comparisons occur in the correct order, really slick; i’ll have to remember that.

January 7, 2012 | 1:29 pm

Hi everyone,
this is great post!

For Cookster, you could try CamTwist to connect more than one cam to Max; it is recognised under Max 6.

November 9, 2012 | 4:10 am

Hi everyone,

I am planning an installation where I will need 6 cameras running simultaneously triggering different aspects of audio. I have been looking into buying a powerful Firewire or USB Hub then routing the cameras through these into Max. I intend to run this on my white macbook, will it be possible?

Amy McInosh
February 6, 2013 | 11:28 pm

Can a DSLR be used as a live feed to Jitter?

August 18, 2013 | 1:59 pm

Thank you for tutorial. Very helpful.
