basic motion tracking questions

jirko's icon

Can someone explain the difference between a depth map made with luma displacement and one made with IR light? Where is the advantage?

Is there any difference between the Kinect and the Sony Eye?
To use the Eye camera with Jitter, do I have to remove the IR blocking filter first?

Any help will be much appreciated!!

edit: sorry, this should be in the Jitter area...

yaniki's icon

It's a large topic...

With IR light you can work under uncontrolled lighting conditions (e.g. if you are working on a performance or installation and somebody takes a photo of you with a flash, thanks to IR you can be sure it will not interfere with your work). The depth map (in the Kinect) is based on an IR image too and is useful when you want to measure the distance between objects and the camera, or if you are using some more advanced mocap techniques (e.g. skeleton tracking).

The Kinect is a "ready-made" IR camera (with an IR light built in). The PS3 Eye you have to "hack" a bit for IR (remove the IR-blocking filter, add a visible-light-blocking filter).

jirko's icon

Ok, I get it. Thanks a lot.

One more thing: using IR light will give better depth/distance results for the objects than a normal luma-displacement technique, is that right?

yaniki's icon

The main difference is that a depth map lets you work with 3-dimensional data (a depth map is 3-dimensional by definition), whereas typical luma-differencing techniques work with "classic" 2D images taken from cameras.
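To make the contrast concrete, here is a rough Python/NumPy sketch of the luma-differencing idea (not Max, just an illustration; the frame size and threshold are arbitrary assumptions). It gives you a 2D mask of where something changed, with no distance information at all:

import numpy as np

# Two consecutive greyscale frames from an ordinary camera
# (assumed 480 x 640, 8-bit luminance values).
prev_frame = np.zeros((480, 640), dtype=np.uint8)
curr_frame = np.zeros((480, 640), dtype=np.uint8)

# Luma differencing: absolute difference of luminance, then a threshold.
diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
motion_mask = diff > 30          # arbitrary threshold

# The result only says *where* (x, y) pixels changed -
# there is no z / distance information in it.
ys, xs = np.nonzero(motion_mask)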

jirko's icon

I can't resist asking another question... :)

So a basic technique using IR light would be to unpack the incoming matrix and use the z plane to extrude a mesh, without converting to greyscale?

yaniki's icon

The image from an infrared camera is "flat" (2-dimensional), like from any other kind of camera except a depth-sensing one (e.g. the Kinect) - so there are no x, y, z planes there. The depth map from the Kinect is just a 2-dimensional, single-plane matrix: every cell of that matrix holds a value that depends on the distance to the camera (so it is in fact 3-dimensional data, because you can calculate the x and y coordinates from the cell positions).
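If it helps, the same idea in a rough Python/NumPy sketch (outside Max; the resolution is an assumption, not a Kinect specific): a single-plane matrix of distances, where x and y come simply from the cell positions.

import numpy as np

# A single-plane depth matrix, e.g. 480 x 640 (assumed resolution).
# Each cell holds a distance-to-camera value.
depth = np.random.rand(480, 640).astype(np.float32)

# x and y are just the cell positions; z is the stored value.
ys, xs = np.mgrid[0:480, 0:640]
points = np.stack([xs, ys, depth], axis=-1)   # shape (480, 640, 3)

# So one 2D matrix of depth values is really 3D data:
# every cell gives you an (x, y, z) triple.
print(points[240, 320])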

For typical motion capture in Max you can use the cv.jit library (jmpelletier.com/cvjit/) by Jean-Marc Pelletier - it's a great tool for 2D motion tracking. The example patches from cv.jit should be very instructive for you. You can use those objects for IR-based tracking or to work with the image from any other kind of camera. You can even use those objects to process the depth map from the Kinect, but it's not the best way to work with it.

I can also dig up some of my own patches for motion tracking, but not today... I think I can do it tomorrow. Maybe my patches will be a good "kickstart" ;-). But anyway, you just have to start with your own experiments. It's not really complicated.

jirko's icon

Ok, that makes sense. Thanks a lot for your time, Yaniki. I would love to see one of your examples, especially an example of how to set up the depth map with a Kinect or another device like the Sony Eye.

Have a nice weekend!

yaniki's icon


This is an entry-level motion tracking patch using only Max/MSP/Jitter built-in objects (no additional libraries, externals, etc.) - it should work with any type of camera, even with the Kinect (but in that case you have to replace "jit.grab" with e.g. "jit.freenect" or another external that receives data from that device [you may want to check other threads on the forum for more info on this topic]).

My patch demonstrates the typical structure of motion tracking processing: from image filtering and background subtraction to converting the image into "numerical data". Actually, every camera-based motion tracking system is a variation of this model. The "jit.bounds" object (the most important object in the patch) works very stably, but for more features (especially detecting multiple objects at the same time) you have to use (as I mentioned in the previous post) some additional tools, and I strongly recommend the cv.jit library (jmpelletier.com/cvjit/) by Jean-Marc Pelletier.
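Outside Max, the same chain can be sketched in a few lines of Python with OpenCV - this is only an illustrative analogue of the structure, not a translation of the patch, and the camera index, blur size and threshold are arbitrary assumptions:

import cv2
import numpy as np

cap = cv2.VideoCapture(0)                       # assumed camera index
backsub = cv2.createBackgroundSubtractorMOG2()  # background subtraction

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)    # simple image filtering
    fg = backsub.apply(gray)                    # foreground mask
    _, fg = cv2.threshold(fg, 127, 255, cv2.THRESH_BINARY)

    # Convert the image into "numerical data": a bounding box around
    # everything that moved (similar in spirit to what jit.bounds reports).
    ys, xs = np.nonzero(fg)
    if len(xs) > 0:
        print("bounds:", xs.min(), ys.min(), xs.max(), ys.max())

    if cv2.waitKey(1) == 27:                    # Esc to quit
        break

cap.release()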

4550.moCap1.maxpat
Max Patch
yaniki's icon

> Have a nice weekend!

Ach... not this time. Work, work, work... But thanks, anyway ;-)

yaniki's icon

Another variant of the basic motion tracking structure is in the attached patch - I think this one will be better. Have fun ;-)

4553.moCap2.maxpat
Max Patch
jirko's icon

Hey Yaniki, thanks so much for that, very cool stuff... but I'm a bit confused. My goal is to use the Eye camera and extrude a jit.gl.mesh with the depth map, in the hope of getting better depth results than with the luma-displacement technique. In your examples you are using ayuv2luma too... what for? And what would the depth map finally be, and how can I connect this map to jit.gl.mesh to extrude on the z plane?

Thanks for your time and patience!

yaniki's icon

For a depth map you need an OpenNI device (e.g. the Kinect) - and the PS3 Eye is not an OpenNI device. It's just a nice USB camera ;-).

If you need to convert the depth map from the Kinect into a mesh, you just need a Kinect and a few objects in Max. It's simple. I made some installations using this feature (e.g. https://vimeo.com/51404205), and I can post a Max patch next week, but it's not a big deal: just a matrix storing depth values (received from the Kinect via jit.freenect) and some simple calculations (scaling, etc.) before sending it to jit.gl.mesh.
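In the meantime, here is the idea in a rough Python/NumPy sketch (only an analogue - in Max you would keep everything in Jitter matrices; the resolution and scale factor are assumptions): scale the depth matrix and turn it into a grid of (x, y, z) vertices, like the 3-plane matrix you would send to jit.gl.mesh.

import numpy as np

# Depth matrix from the sensor, e.g. 480 x 640 (assumed), values in millimetres.
depth = np.zeros((480, 640), dtype=np.float32)
h, w = depth.shape

# Normalise x and y into roughly -1..1 and scale the depth into a usable z range.
xs = np.linspace(-1.0, 1.0, w)
ys = np.linspace(-1.0, 1.0, h)
gx, gy = np.meshgrid(xs, ys)
gz = depth * 0.001                 # arbitrary scaling: mm -> "GL units"

# A 3-plane grid of vertices - conceptually what you hand to jit.gl.mesh.
vertices = np.stack([gx, gy, gz], axis=-1)    # shape (480, 640, 3)
print(vertices.shape)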

yaniki's icon

hmmm... something is wrong with the link to the video, so again:

jirko's icon

Nice video! I thought that the Sony Eye would produce a depth map too. So, in terms of depth quality, is it worth getting a Kinect?

yaniki's icon

For "real" depth map you need a stereoscopic device (eg. kinect), not a typical camera. But if you just want to create a 3d mesh from 2d image you can use pixel's luminance for "z" coords (https://vimeo.com/49839095 - another my video ;-) ). It's simple, but I can attach the patch if needed.

stereo's icon

Hi All,

Premise:
I am a newbie at CV and I am taking my first steps in this field :)

Goal:
Tracking multiple people passing in front of a window.
For each person, assign a blob and get its X and Y movements.

Tech+Library used:
Kinect camera and "jit.freenect" for receiving depth data, Mac with macOS Sierra, Max 7, the cv.jit library (jmpelletier.com/cvjit/) by Jean-Marc Pelletier for computer vision.

Done so far:
Depth data from Kinect > mirror video > convert to grayscale > threshold image > define minimum blob size > label blobs (largest blob gets label 1, second gets label 2, and so on) > center of mass for each blob (centroids) > coordinates for each blob. (A rough sketch of this chain is below.)
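For reference, roughly the same chain in Python with OpenCV (just a sketch of the steps above, not the actual Max patch; the depth frame, threshold value and minimum blob size are assumptions):

import cv2
import numpy as np

# One depth frame as an 8-bit image (assumed 480 x 640, already acquired).
depth = np.zeros((480, 640), dtype=np.uint8)

mirrored = cv2.flip(depth, 1)                                   # mirror video
_, mask = cv2.threshold(mirrored, 100, 255, cv2.THRESH_BINARY)  # threshold image

# Label blobs and get their sizes and centroids in one call.
n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)

min_area = 500                                  # assumed minimum blob size
for i in range(1, n):                           # label 0 is the background
    if stats[i, cv2.CC_STAT_AREA] >= min_area:
        cx, cy = centroids[i]                   # centre of mass of blob i
        print("blob", i, "at", cx, cy)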

Problems:
This works OK-ish, but when people overlap I get merged/overlapping blobs (especially when people come from opposite directions and meet in the center).
Setting the right thresholds is a bit tricky.
I am sure the patch is not optimized :)

Question
Should I implement a background subtraction method? Any ideas on how to improve the patch?
How would you tackle tracking of multiple objects?

Thanks in advance !
/stereo

kinect_Tracking.maxpat
Max Patch