of speedlim, gestures, ui and performance (for gigging musicians)

jamesson:

I'm building a Kinect input handler which features gesture recognition. The recognizer works by comparing an input to a number of prerecorded inputs. I use speedlim to smooth the input, and the speedlim interval can be adjusted by the user.

The trouble is that the gesture recognizer requires a minimum number of input values. Even if I could figure out how to link the speedlim value to the minimum gesture input, changing the speedlim may render all previously recorded gestures useless: a gesture of a given duration yields fewer points at a longer speedlim interval (a two-second gesture captured at 50 ms gives about 40 points, but only about 20 at 100 ms), so recordings made at one setting no longer line up with input at another.

My options are

A) Jettison the speedlim. Not pretty but 100% accurate.

B) Lock the speedlim. Undesirable, as a theremin performer will have different requirements than a trained conductor.

C) Leave the speedlim open to user manipulation, but warn the user on change that all previously recorded gestures will be scrubbed. This basically means the user must retrain all gestures every time the speedlim is changed.

My question is twofold. For gigging performers: what is your preference? I am making this thing more for others than for myself, so if anybody expresses a strong preference for A, B, or C, I will probably go with it.

For Max geeks: am I missing any options? If so, which?

Thanks much in advance

Joe

Roth:
[Max patch embedded in the original post; copy it and select New From Clipboard in Max.]

By using speedlim you are not really smoothing your input; you are slowing its data rate by passing only one value per speedlim period and ignoring all the others. Check out this patch to see what I mean:

In this example, every fourth input is 1 and all others are 0. Looking at what the [print] object prints, you will see that sometimes 0 is passed from speedlim and sometimes 1. Unless I'm misunderstanding, this isn't really the smoothing behavior you want, because a genuinely smoothed value would be 0 or something close to it.
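If the patch doesn't paste for you, here is a loose Python sketch of the same behavior (an approximation of speedlim's gating; the real object's internal scheduling differs in details):

```python
# Loose approximation of speedlim's gating: pass a value only if at
# least period_ms has elapsed since the last value that was let through.
def speedlim(inputs, input_interval_ms, period_ms):
    out, last_pass = [], None
    for i, v in enumerate(inputs):
        t = i * input_interval_ms
        if last_pass is None or t - last_pass >= period_ms:
            out.append(v)
            last_pass = t
    return out

# Every fourth input is 1, the rest 0, arriving every 10 ms.
stream = [1 if i % 4 == 0 else 0 for i in range(40)]
# With a 25 ms speedlim you get a mix of 0s and 1s; which values
# survive depends entirely on how the two clocks happen to line up.
print(speedlim(stream, 10, 25))
```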

I would recommend doing some sort of low-pass filtering on your input instead (or maybe the leaky integrator being discussed in this thread: https://cycling74.com/forums/what-is-an-integrator). Even with some other type of filtering in place of speedlim, I would still expect your gesture recognizer to behave differently if a user recorded gestures with one filter setting and played them back in performance with another (just as, if I made a piece that used pitch~ for pitch-tracking-based triggering, I wouldn't expect it to behave the same after changing my pitch~ settings).
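For reference, the math of a one-pole low-pass / leaky integrator is tiny; here it is as a Python sketch (in Max you might reach for [slide] or build the equivalent from vanilla objects):

```python
def leaky_integrator(xs, a=0.9):
    """One-pole low-pass: y[n] = a*y[n-1] + (1-a)*x[n].
    The closer a is to 1, the heavier the smoothing."""
    y, out = 0.0, []
    for x in xs:
        y = a * y + (1 - a) * x
        out.append(y)
    return out

# The same 0,0,0,1 pattern now hovers around the stream's average
# (0.25) instead of jumping between 0 and 1.
stream = [1 if i % 4 == 0 else 0 for i in range(40)]
print(leaky_integrator(stream))
```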

Whatever method you choose for pre-processing your input, I would say do something close to option C: document the issue so the user is aware, but don't wipe the gestures if the user changes parameters. I can't really think of a reason why your user would want to record gestures with one setting and use a different setting for playback, but if they want to, why not let them.

Another option you could think about: when you store a set of trained gestures, store your input-processing parameters with them, so that when the training set is loaded, the necessary parameters for input processing are loaded too.
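Concretely, that could be as simple as bundling the settings into the same file. A hypothetical layout (the field names here are made up):

```python
import json

def save_gesture_set(path, gestures, speedlim_ms, min_points):
    """Bundle the input-processing settings with the trained gestures,
    so loading a set also restores the settings it was recorded with."""
    with open(path, "w") as f:
        json.dump({"speedlim_ms": speedlim_ms,
                   "min_points": min_points,
                   "gestures": gestures}, f)

def load_gesture_set(path):
    with open(path) as f:
        data = json.load(f)
    return data["gestures"], data["speedlim_ms"], data["min_points"]
```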

jamesson:

I'll consider your suggestions re filtering techniques. The trouble with your final suggestion is that the minimum gesture number is global for the recognizer; there is no way to process gestures with different minimum numbers of points at the same time. It would be wonderful if there were, but no.

Thanks much for the suggestions

Joe

Roth:

> The trouble with your final suggestion is that the minimum gesture number is global for the recognizer; there is no way to process gestures with different minimum numbers of points at the same time.

What I am suggesting is that when you record a set of gestures, you save the global settings with the gesture set. When you load the gesture set, the same global settings are recalled as were used to record the gestures. Maybe that is not how you are planning on using this, though.

So does your recognizer work by waiting until it receives some number of data points and then deciding which gesture was performed? I imagine you have a constant data stream coming from the Kinect, so how do you know when a gesture begins?

For example (using integers for my input values to keep things simple), if a gesture consists of 10 points and you train the following two gestures:

27 30 49 39 21 17 14 7 2 5

39 21 17 14 7 2 5 8 15 23

and you have the following input:

50 30 29 27 30 49 39 21 17 14 7 2 5 39 21 17 14 7 2 5 8 15 23

The first recorded gesture starts at the 4th input value and the second at the 14th. What happens in this case? From what you describe, I'm assuming either gesture 1 is chosen because you somehow trigger when a performed gesture begins, or neither gesture is chosen since the first 10 inputs don't match either one.
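To make the question concrete, here's a hypothetical sliding-window matcher (nearest template by Euclidean distance, with a made-up threshold; I'm sure your recognizer works differently, this just pins down what I'm asking):

```python
def match_gesture(stream, templates, threshold=10.0):
    """Slide a window the length of each template across the stream and
    report where the Euclidean distance falls below the threshold."""
    hits = []
    for name, tpl in templates.items():
        n = len(tpl)
        for start in range(len(stream) - n + 1):
            window = stream[start:start + n]
            dist = sum((a - b) ** 2 for a, b in zip(window, tpl)) ** 0.5
            if dist < threshold:
                hits.append((name, start + 1))  # 1-based, as above
    return hits

templates = {"g1": [27, 30, 49, 39, 21, 17, 14, 7, 2, 5],
             "g2": [39, 21, 17, 14, 7, 2, 5, 8, 15, 23]}
stream = [50, 30, 29, 27, 30, 49, 39, 21, 17, 14, 7, 2, 5,
          39, 21, 17, 14, 7, 2, 5, 8, 15, 23]
print(match_gesture(stream, templates))  # [('g1', 4), ('g2', 14)]
```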

Did you build your recognizer, or did you use some code/object you found?

jamesson:

Roth, I understood what you said the first time; however, the issue is that there is no reason to store the gestures for a given limb in separate files _other_ than changing the speedlim. Potentially this results in an individual gesture file for every value of speedlim, which would be a huge mess. Something I thought about today was keeping a list of presets for various speedlim values, but if I go that route it will be in a later version.

To answer your question, I initiate both recording and recognition by having the input remain still for a moment; in math terms, the change in input stays below some user-defined value for some user-defined time.
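In rough Python terms, the stillness test amounts to something like this (a sketch of the logic only, not the actual patch):

```python
def stillness_trigger(stream, delta_thresh, hold_samples):
    """Fire when the input has changed by less than delta_thresh
    for hold_samples consecutive samples."""
    still, prev = 0, None
    for i, x in enumerate(stream):
        if prev is not None and abs(x - prev) < delta_thresh:
            still += 1
            if still >= hold_samples:
                return i  # index where stillness was confirmed
        else:
            still = 0
        prev = x
    return None
```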

This is the code I'm using:

Roth:

Ah, maybe I'm the one who didn't understand, but I think it is getting clearer.

I'll check out the paper in the link you posted later this week when I have a little more time. It looks like an interesting read, and I might have some better suggestions after that.

Two questions to help me understand what you are doing better:

So are you saying that every gesture potentially needs a different speedlim time? I thought you needed global settings for a given performance; now I'm thinking I was mistaken.

For your purposes, if a user makes the same shape with their arm twice, once very fast and once very slow, should those count as the same gesture or as two different gestures?

jamesson:

No, you understood correctly: I do need one speedlim per performance. However, say the user trains some gestures and then decides that the speedlim is too fast or too slow. In the current situation that means all the gestures must be retrained from scratch, and since you need a minimum of 10 examples per gesture for over 80% accuracy, that is quite sad. I realize the limitation is inherent in the gesture handler and not in Max, but I'm still hoping for some solution that is size-invariant. I have some thoughts on the subject already, but regardless, they will go into the next version.
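For instance, one possible size-invariant approach (just a brainstorm sketch, assuming the recognizer can be fed resampled data): resample every gesture, recorded or live, to a fixed number of points before matching, so the speedlim rate no longer dictates how many points a gesture contains.

```python
def resample(points, n):
    """Linearly resample a gesture to exactly n points, so gestures
    captured at different speedlim rates become comparable."""
    if len(points) == 1:
        return points * n
    out = []
    for i in range(n):
        pos = i * (len(points) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(points) - 1)
        frac = pos - lo
        out.append(points[lo] * (1 - frac) + points[hi] * frac)
    return out

# A gesture captured at a slow speedlim (5 points) and one captured
# at a fast speedlim (20 points) can both be reduced to 10 points.
print(resample([0, 2, 4, 6, 8], 10))
```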

Chris Muir:

I'm not sure how feasible this is, but I think I might try gathering time-stamped minima and maxima of the gestures to come up with a vector approximation of each gesture.
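Something like this, as a rough sketch (assuming a stream of (time, value) samples; this naive version just takes strict local extrema):

```python
def extrema(samples):
    """Return time-stamped local minima/maxima of a gesture, which can
    serve as a compact vector approximation of its shape."""
    pts = []
    for i in range(1, len(samples) - 1):
        t, v = samples[i]
        prev_v, next_v = samples[i - 1][1], samples[i + 1][1]
        if (v > prev_v and v > next_v) or (v < prev_v and v < next_v):
            pts.append((t, v))
    return pts

# e.g. a slow arc followed by a dip
samples = [(0, 0.0), (1, 0.4), (2, 0.9), (3, 0.5), (4, 0.1), (5, 0.3)]
print(extrema(samples))  # [(2, 0.9), (4, 0.1)]
```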

jamesson:

Not a bad idea, at that.

jamesson:

Again, let's keep talking and coming up with ideas. I'm not going to get to implementations until the next version, so it's brainstorming time.