Search code examples
mathpseudocode

Statistical best fit for gesture detection


I have a linear regression equation from school , which gives a value between 1 and -1 indicative of whether or not a set of data points are close enough to a linear function

enter image description here

and the equation given here

http://people.hofstra.edu/stefan_waner/realworld/calctopic1/regression.html

under best fit of a line. I would like to use these to do simple gesture detection based on a point in 3-space (x,y,z) - forward, back, left, right, up, down. First I would see if they fall on a line in 2 of the 3 dimensions, then I would see if that line's slope approached zero or infinity.

Is this fast enough for functional gesture recognition? If not, could someone propose an alternative algorithm?


Solution

  • If I've understood your question correctly then (1) the calculation you describe here would probably be plenty fast enough, (2) it may not actually do what you want, and (3) the stuff that'll be slow in an actual implementation would lie elsewhere.

    So, I think you're proposing to do this. (1) Identify the positions of ... something ... (the user's hand, perhaps) in three-dimensional space, at several successive times. (2) For (say) each of {x,y} and {x,z}, look at those two coordinates of each point, compute the correlation coefficient (which is what your formula describes) and see whether it's close to +-1. (3) If both correlation coefficients are close to +-1 then the points lie approximately on a straight line; calculate the gradient of that line (using a formula similar to that of the correlation coefficient). (4) If the gradients are both very close to 0 or +- infinity, then your line is approximately parallel to one axis, which is the case you're trying to recognize.

    1: Is it fast enough? You might perhaps be sampling at 50 frames per second or thereabouts, and your gestures might take a second to execute. So you'll have somewhere on the order of 50 positions. So, the total number of arithmetic operations you'll need is maybe a few hundred (including a modest number of square roots). In the worst case, you might be doing this in emulated floating-point on a slow ARM processor or something; in that case, each arithmetic operation might take a couple of hundred cycles, so the whole thing might be 100k cycles, which for a really slow processor running at 100MHz would be about a millisecond. You're not going to have any problem with the time taken to do this calculation.

    2: Is it the right thing? It's not clear that it's the right calculation. For instance, suppose your user's hand moves back and forth rapidly several times along the x-axis; that will give you a positive result; is that what you want? Suppose the user attempts the gesture you want but moves at slightly the wrong angle; you may get a negative result. Suppose they move exactly along the x-axis for a bit and then along the y-axis for a bit; then the projections onto the {x,y}, {x,z} and {y,z} planes will all pass your test. These all seem like results you might not want.

    3: Is it where the real cost will lie? This all assumes you've already got (x,y,z) coordinates. Getting those is probably going to be more expensive than processing them. For instance, if you have a camera-based system of some kind then there'll be some nontrivial image processing for every frame. Or perhaps you're integrating up data from accelerometers (which, by the way, is likely to give nasty inaccurate position results); the chances are that you're doing some filtering and other calculations to get position data. I bet that the cost of performing a calculation like this one will be substantially less than the cost of getting the coordinates in the first place.