advice on shape recognition #2 - training cv.jit.learn and/or cv.jit.undergrad
me again,
i'm having trouble training cv.jit.learn; maybe somebody could give me some advice on it.
i have a still image that is compared to an incoming stream of video.
the still image is of an actor in a specific pose, the incoming stream is the same actor moving through different poses, and i need to detect when the live pose matches the one in my still.
i got pretty familiar with the cv.jit set, although it is still not entirely clear to me how one "trains" [cv.jit.learn] or [cv.jit.undergrad].
any help on this would be highly appreciated.
best,
You train an object, either cv.jit.undergrad or cv.jit.learn, by giving it data that corresponds to positive matches in its training inlet. If there is too much variation in the data used to train, you get a model that's too vague, which results in a lot of false positives. If there is little or no variation in the data, the model will be too strict, yielding false negatives.
The last part means that you shouldn't use still pictures. Have a dancer stand in front of the camera and hold the pose, moving slightly but not so much as to take a different pose. Pause. Start again with the dancer closer or further away from the camera...
Getting a good model isn't easy and takes a bit of practice to know just how much variation to give to the training data.
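To make that trade-off concrete, here is a rough Python sketch of a statistical matcher in the same spirit. This is only an illustration of the principle, not cv.jit.learn's actual algorithm, and the class, feature choice and threshold are made up:

    import numpy as np

    # Toy statistical pose matcher: it stores the mean and standard
    # deviation of its training feature vectors and accepts anything
    # within k standard deviations of the mean.
    class PoseModel:
        def __init__(self, k=2.0):
            self.k = k
            self.samples = []

        def train(self, features):
            # features: e.g. a vector of values from cv.jit.moments
            self.samples.append(np.asarray(features, dtype=float))

        def finalize(self):
            data = np.stack(self.samples)
            self.mean = data.mean(axis=0)
            # The spread of the training data sets the tolerance: lots of
            # variation -> large std -> vague model (false positives);
            # near-identical frames, as from a still image -> std near
            # zero -> a model so strict it rejects the live actor
            # (false negatives).
            self.std = data.std(axis=0) + 1e-9

        def match(self, features):
            z = np.abs((np.asarray(features, dtype=float) - self.mean) / self.std)
            return bool((z < self.k).all())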
In one of my pieces I had 7 postures with a recognition rate well above 90% for each of them. However, I had to add manual checks to the data because sometimes I was getting positives for two shapes. You need to figure those out by yourself. For instance: it's shape #2 only if model #2 returns positive AND model #5 is negative. (Because shapes #2 and 5 look alike.) Also look at the data to see if you can spot patterns that the training algorithm misses. Pay attention especially to the sign of m11, m21, m12, m30 and m03 from cv.jit.moments.
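In Max those manual checks are just a few comparison and logic objects. As a language-agnostic sketch of the idea (only the #2/#5 rule comes from the post; the sign rule on model #4 is a hypothetical example of the kind of pattern you might add by hand):

    # Disambiguation layer over the per-pose match results.
    # matches[i] is the boolean output of model #i; moments holds the
    # values reported by cv.jit.moments for the current frame.
    def identify(matches, moments):
        matches = list(matches)
        # Shapes #2 and #5 look alike: count a #2 hit only when #5 is negative.
        if matches[2] and matches[5]:
            matches[2] = False
        # A pattern the training may miss: veto model #4 unless the
        # higher-order moments have the expected sign (hypothetical rule).
        if matches[4] and not (moments["m30"] > 0 and moments["m03"] < 0):
            matches[4] = False
        hits = [i for i, hit in enumerate(matches) if hit]
        # Report a pose only when exactly one model survives the checks.
        return hits[0] if len(hits) == 1 else None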
Jean-Marc
Thanks a lot Jean-Marc,
this is quite some help ;) i was in the midst of building a patch that would use stills as my models for shape recognition.
i guess i can stop that.
do you use one cv.jit.learn or undergrad per pose?
i would guess so. having too many cv.jit.learn/undergrad objects certainly affects the performance of the machine. any advice on handling this?
i don't have much matrix processing going on apart from the analysis; when shapes are recognized, videos are triggered and played on the GPU via jit.gl.videoplane.
last question: what kind of lighting did you use on your dancer?
i am working with even, neutral lighting, meaning i avoid strong contrasts that create too many segments due to shadows.
thanks again for your precious help.
> do you use one cv.jit.learn or undergrad per pose?
>
> i would guess so. having too many cv.jit.learn/undergrad objects certainly affects the performance of the machine. any advice on handling this?
Yes. One per pose. Realistically, in my experience, the type of algorithms implemented in these two objects aren't good for large numbers of patterns. It was quite hard pushing it to 7. Three or four are relatively easy; after that you run into lots of problems with poses triggering more than one positive identification, and you have to sort these out.
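One way to sort out frames where more than one model fires, assuming you can reduce each comparison to a numeric distance (an assumption for this sketch, not something the objects are guaranteed to give you):

    # One model per pose; scores maps pose id -> distance (lower is better).
    def resolve(scores, threshold=1.0, margin=0.25):
        hits = {pose: d for pose, d in scores.items() if d < threshold}
        if not hits:
            return None                      # nothing matched
        ranked = sorted(hits, key=hits.get)  # best match first
        if len(ranked) == 1:
            return ranked[0]                 # unambiguous
        best, runner_up = ranked[0], ranked[1]
        # Accept the best hit only if it clearly beats the runner-up;
        # otherwise treat the frame as ambiguous and report nothing.
        return best if hits[runner_up] - hits[best] > margin else None

    # e.g. resolve({2: 0.3, 5: 0.9, 1: 1.4}) returns 2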
The CPU cost of these objects in comparison mode is quite low (trivial, in fact, for cv.jit.undergrad). That should not be something you need to worry about.
> last question: what kind of lighting did you use on your dancer?
It depends. The 7-posture piece was actually a "hand dance". Only my hands were moving. I had a spotlight on me and wore dark clothing, which made foreground extraction very easy. Another time, working with a dancer, we had even and neutral lighting and a simple background. This was a few years ago, though. Now, I'd probably try something with infrared to allow more artistic freedom in the lighting design. Knowing the lighting situation beforehand is _critical_. You might have models that work 100% of the time in practice, but get completely thrown off when the lighting is changed during performance.
> Yes. One per pose. Realistically, in my experience, the type of algorithms implemented in these two objects aren't good for large numbers of patterns. It was quite hard pushing it to 7. Three or four are relatively easy; after that you run into lots of problems with poses triggering more than one positive identification, and you have to sort these out.
i see, that makes sense. i was planning on working with more poses.
in my initial idea i was going to compare the number of blobs, their area and their orientation between my still images and the incoming video.
the actor stands still for a few seconds when making the gestures to be recognized (which means i analyze video only when there is no movement). of course i have no guarantee that the values from my stills and from the video of the filmed actor will match, since the orientation and mass of the blobs vary slightly.
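As a sketch of that stillness gate (the same thing you would patch in Jitter with frame differencing, e.g. jit.op @op absdiff into jit.3m; the threshold here is arbitrary):

    import numpy as np

    # Gate: only run the pose analysis when consecutive frames barely differ.
    def is_still(prev_frame, frame, threshold=2.0):
        # Mean absolute difference between two grayscale frames; below the
        # (arbitrary) threshold we assume the actor is holding a pose.
        diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))
        return float(diff.mean()) < threshold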
therefore i was planning on adding some noise to the results of the analyzed video, although i am not there yet and have no idea if it would/could work.
i have a feeling this could be a good situation for fuzzy logic. what do you think? since you advised me not to use stills, i am planning on abandoning this idea.
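For what it's worth, a minimal sketch of what that fuzzy comparison could have looked like, with made-up tolerances and simple triangular membership functions:

    # Fuzzy match of live blob features against a stored reference pose.
    def membership(value, target, tolerance):
        # 1.0 at the target, falling linearly to 0.0 at +/- tolerance.
        return max(0.0, 1.0 - abs(value - target) / tolerance)

    def pose_score(n_blobs, area, angle, ref):
        scores = (
            membership(n_blobs, ref["blobs"], 1.5),            # blob count
            membership(area, ref["area"], 0.2 * ref["area"]),  # +/- 20% area
            membership(angle, ref["angle"], 15.0),             # +/- 15 degrees
        )
        return min(scores)  # fuzzy AND: the weakest feature limits the match

    # e.g. pose_score(3, 5200.0, 42.0, {"blobs": 3, "area": 5000.0, "angle": 40.0})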
thanks for the advice on lighting.
cv.jit is a formidable toolbox of computer vision objects, but maybe you will have better luck with eyesweb (windows only), which has objects tailor-made for tracking human poses, based on a skeletal model. it has a (primitive) max-like interface and can send UDP triggers etc.
running both apps on the same pc gives you the benefit of spreading the processing across different cores, for a leaner patch.
one of the examples shipped with this free framework does what you need.
version 4 is a bit buggy, but 3.3.0 is still available to download and is stable and robust.
if you decide to experiment with it, please report back.
http://www.eyesweb.org/
Thanks Yair, i know Eyesweb and its potential, and we had it in mind in the first stages of the project.
i am more keen on finding a jitter solution.
i used to work with softVNS a long time ago and had it in mind as well.
i think cv.jit has the potential to do what i want; i just need to find the right solution. plus, the more of us use cv.jit, the quicker we'll have our "own" Eyesweb ;-)
thanks for the advice though.
yes, i ported all my projects to cv.jit and never looked back, but i still have a soft spot for eyesweb.
"The last part means that you shouldn't use still pictures. Have a dancer stand in front of the camera and hold the pose, moving slightly but not so much as to take a different pose. Pause."
sorry to dig this up, but how do I interrupt training for a while, until I've got the next relevant pose? since I'm not using film stills, how do I keep the body movement between the desired poses out of the training data, given that there's no "pause button"?
(this question intentionally sounds somewhat dumb; I really haven't found the answer in the help file and feel the need for someone to point out the obvious to me. grin)
cheers!
-jonas
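For what it's worth, since training is just data arriving at the object's training inlet, the usual "pause button" is a [gate] placed between the analysis (e.g. cv.jit.moments) and the training inlet, opened with a toggle only while the pose is held. The same logic as a sketch, where `model` is a hypothetical stand-in for the object being trained:

    # "Pause button" for training: features only reach the model while the
    # record toggle is on, so movement between poses never becomes
    # training data. `model` is any object with a train() method.
    class GatedTrainer:
        def __init__(self, model):
            self.model = model
            self.recording = False

        def toggle(self, on):
            self.recording = bool(on)  # the [toggle] in the patch

        def feed(self, features):
            if self.recording:
                self.model.train(features)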