How to record gestures?
Some of them are very interesting, like siger or gestopoz. Since they are based on mouse movement, they should recognize 2D gestures easily. Ncore would allow further experiments because, as I understand it, it supports multipoint tracking (an FTIR surface? mmmmm). But to be really useful, a Gesture patch should allow recording custom gestures. How could this be done within QC? Load a file produced by a separate app, or record directly within the graph? Anyway, a Gesture patch would be a MUST!
I've been idly trying to figure out how to represent gestures accurately in a way that allows an arbitrary number of inputs. For example, the Wiimote would have 3 (x, y, z acceleration), maybe 2 (pitch/roll, which are actually derived from x, y, z anyway), and maybe many more (x, y, z plus Nunchuk x, y, z). Many, if not all, of the gesture libs I've looked at so far are constrained to 2 dimensions (x, y), and some are actually glyph recognizers (written symbols) rather than gesture recognizers: glyphs have a definite start/stop point, like mouse click/unclick, while gestures have no definite beginning or end.
So one idea I'm kicking around combines directionality and Levenshtein distances. Directionality (possibly with a length) can symbolically represent each dimension of input as increasing, decreasing, or steady, with the length used to merge adjacent duplicate symbols. A Levenshtein distance can then compare the resulting symbols with a set of defined gestures. While a gesture is being constructed, it can be compared against partial gestures to prune the ones that obviously don't match, and if none match well, the current stream can be cleared to make way for new potential input gestures. Maybe a "don't care" symbol too, for dimensions that don't matter for a given gesture.
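A minimal Python (3.9+) sketch of that directionality + Levenshtein idea, under assumptions that are mine rather than from the post: each sample is a tuple of floats, one per input dimension; each dimension is quantized to '+' (increasing), '-' (decreasing) or '0' (steady) using a hypothetical `dead_zone` threshold; adjacent identical frames are merged (so the run length is simply dropped); and '*' acts as the "don't care" symbol, matching anything at zero cost. Pruning against partial gestures is left out, and all function names are illustrative.

```python
from typing import Sequence

Frame = tuple[str, ...]  # one direction symbol per input dimension, e.g. ("+", "0", "-")

def quantize(samples: Sequence[Sequence[float]], dead_zone: float = 0.05) -> list[Frame]:
    """Turn raw N-dimensional samples into per-dimension direction symbols.

    '+' = increasing, '-' = decreasing, '0' = steady (change within dead_zone).
    Adjacent duplicate frames are merged, so only changes of direction remain.
    """
    frames: list[Frame] = []
    for prev, cur in zip(samples, samples[1:]):
        frame = tuple(
            "+" if c - p > dead_zone else "-" if p - c > dead_zone else "0"
            for p, c in zip(prev, cur)
        )
        if not frames or frames[-1] != frame:
            frames.append(frame)
    return frames

def frames_match(a: Frame, b: Frame) -> bool:
    """Frames match if every dimension is equal or either side is '*' (don't care)."""
    return all(x == y or x == "*" or y == "*" for x, y in zip(a, b))

def levenshtein(seq: Sequence[Frame], template: Sequence[Frame]) -> int:
    """Edit distance (insert/delete/substitute, cost 1 each) between frame sequences."""
    prev_row = list(range(len(template) + 1))
    for i, a in enumerate(seq, start=1):
        row = [i]
        for j, b in enumerate(template, start=1):
            cost = 0 if frames_match(a, b) else 1
            row.append(min(prev_row[j] + 1,         # delete a
                           row[j - 1] + 1,          # insert b
                           prev_row[j - 1] + cost)) # substitute or match
        prev_row = row
    return prev_row[-1]

def best_match(frames: Sequence[Frame], templates: dict[str, list[Frame]]) -> tuple[str, int]:
    """Name and distance of the closest template."""
    return min(((name, levenshtein(frames, t)) for name, t in templates.items()),
               key=lambda item: item[1])
```

For example, a stroke that moves right and then up quantizes to `[("+", "0"), ("0", "+")]`, which is at distance 0 from a template made of those same two frames, so simple templates like this could also be written by hand.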
I'm not sure how easy it would be to 'record' such gestures, but they'd be simple to load/save, and could maybe even be specified by hand for simpler gestures.
Is this idea rational? Is it practical? Or are there more accurate or widely accepted algorithms for interpreting gestures without special cases?
Mahalanobis vs Levenshtein
It took me some time to browse through the Levenshtein documentation... and apart from being very complicated ;) it seems like an über-smart hack to get it done, especially the real-time part of the curve analysis: no start/stop point. Right now I'm experimenting (as far as my timetable allows) with gesture recognition in Max/MSP, using the FTM externals and MnM. Internally, it uses Hidden Markov Models to extract parameters from raw data: http://en.wikipedia.org/wiki/Hidden_Markov_model
It also uses the Mahalanobis distance, which seems even more promising since it is not scale-dependent: http://en.wikipedia.org/wiki/Mahalanobis_distance
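For reference, the Mahalanobis distance of a point x from a distribution with mean μ and covariance Σ is sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)). The standard-library Python sketch below estimates μ and Σ from a hypothetical set of training vectors (for instance, the samples of one recorded gesture) and solves Σy = x − μ by Gaussian elimination instead of inverting Σ. It only shows the distance itself, not how MnM uses it internally; a real implementation would use a linear-algebra package.

```python
import math
from typing import Sequence

Vector = list[float]

def mean_and_covariance(samples: Sequence[Sequence[float]]) -> tuple[Vector, list[Vector]]:
    """Sample mean and unbiased covariance matrix of N-dimensional samples."""
    n, dim = len(samples), len(samples[0])
    mu = [sum(s[k] for s in samples) / n for k in range(dim)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    return mu, cov

def solve(a: list[Vector], b: Vector) -> Vector:
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y

def mahalanobis(x: Sequence[float], mu: Vector, cov: list[Vector]) -> float:
    """sqrt((x - mu)^T cov^-1 (x - mu)), computed without an explicit inverse."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(cov, d)
    return math.sqrt(sum(di * yi for di, yi in zip(d, y)))
```

With training data that spreads widely in x but barely in y, a point 0.2 away in y comes out "farther" than a point 2 away in x, which is the scale-independence mentioned above.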
See the MnM external documentation here: http://ftm.ircam.fr/index.php/Gesture_Follower
The stuff about Mahalanobis distance looks interesting (perhaps cheaper and statistically more accurate).
In my limited experience, HMMs seemed like a cross between FSMs and neural networks (NN-like weights controlling the transitions between states in the FSM). For complex state spaces, Markov stuff can eat a ton of RAM (maybe it's not so expensive nowadays... back when 64 MB of RAM was a lot, Markov models hammered swap almost immediately for the non-trivial data sets I was working with).
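To make the HMM side a bit more concrete: the usual way an HMM scores a candidate gesture is the forward algorithm, which sums over all hidden-state paths to get P(observations | model); with one model per gesture, the model with the highest likelihood wins. A minimal Python sketch with a hypothetical two-state "rise then fall" model over discrete direction symbols ('U', 'D'). The probabilities are made up for illustration; practical recognizers typically use continuous observation densities and learn the parameters from recorded examples.

```python
from typing import Sequence

def forward_likelihood(obs: Sequence[str],
                       start: dict[str, float],
                       trans: dict[str, dict[str, float]],
                       emit: dict[str, dict[str, float]]) -> float:
    """P(obs | HMM) via the forward algorithm.

    alpha[s] = probability of emitting obs[:t+1] and being in state s at step t.
    """
    alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in start}
    for symbol in obs[1:]:
        alpha = {
            s: sum(alpha[p] * trans[p][s] for p in alpha) * emit[s].get(symbol, 0.0)
            for s in start
        }
    return sum(alpha.values())

# Hypothetical model: starts rising, may switch to falling, never rises again.
START = {"rise": 1.0, "fall": 0.0}
TRANS = {"rise": {"rise": 0.7, "fall": 0.3}, "fall": {"rise": 0.0, "fall": 1.0}}
EMIT = {"rise": {"U": 0.9, "D": 0.1}, "fall": {"U": 0.1, "D": 0.9}}
```

Here "UUD" scores 0.21708 and "DDU" only 0.00732 under this model. For long sequences the products underflow, so real code works in log space or rescales alpha at each step. Note the model is first-order: each step only looks at the previous state.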
Thanks for the links, I'll have to think about this some more before grinding out some code (I was actually considering starting this this week, but now I think I'm going to spend some more time studying :)
[edit: reading more on MnM, it looks like they're attempting to solve exactly the same problem (measuring how close an N-dimensional curve is to a set of pre-recorded N-dimensional curves), with a somewhat different method. Definitely worth checking out. As always, thanks for the link :)]
Slight mistake there: it was Markov chains, not so much HMMs, that I've seen consume considerable amounts of memory. HMMs seem to almost always be first-order (the next state depends only on the current one), so this isn't a problem. Markov chains, on the other hand, can condition on a set of previous states, with probabilities for each destination on a per-state-set basis, so memory usage grows exponentially: N^X state sets, where N is the total number of states and X is the number of previous states tracked.
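A quick worked count behind that estimate, assuming a full (dense) transition table: an order-X chain over N states has one context per sequence of X previous states, and each context stores a probability for each of the N possible next states. N = 26 (say, one state per letter) is a hypothetical example; storing only contexts actually observed is much smaller in practice.

```latex
\begin{aligned}
\text{entries} &= \underbrace{N^{X}}_{\text{contexts}} \times \underbrace{N}_{\text{next states}} = N^{X+1} \\
N = 26,\ X = 5 &: \quad 26^{5} = 11\,881\,376 \text{ contexts}, \quad 26^{6} = 308\,915\,776 \text{ entries} \approx 1.2\ \text{GB as 4-byte floats} \\
N = 26,\ X = 1 &: \quad 26^{2} = 676 \text{ entries (a first-order transition matrix)}
\end{aligned}
```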
Mahalanobis distance and HMMs appear to be the canonical way to do gesture recognition, so I think I'm going to bug my stats friends to learn more about this approach instead of the Levenshtein hack mentioned earlier.
welcome to Gattaca
HMMs were initially used for understanding language patterns (1936, if I remember correctly). They seem to have entered the computer world via... speech recognition. The funny thing is that I came across these HMMs yesterday while working with piezoelectric sensors (which are just cheap microphones). I just needed a way to analyze a time-tagged list of floats (sensor data)....
In QC there is currently, to my knowledge, no proper way to analyze time-tagged data (short of using the JavaScript node). I fought a lot with the GREP node, but it only compares strings, and converting floats to strings is rather pointless.
So I hacked together a little JavaScript that takes a running mean of the input and outputs a letter according to its direction: L(eft), R(ight), U(p), D(own). Queue these and you end up with a running string like UUDLRLDLLULRUD... (who said Gattaca?), which can then be grep'ed for patterns. While this actually works (and avoids the start/stop problem, since it keeps following the flow of data), it is not precise at all.
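The direction-string trick is easy to sketch outside QC too. A minimal Python version, with assumptions of my own: the input is a stream of 2D points, a running mean over a hypothetical window of 3 samples smooths it, each step emits the letter of the dominant axis of movement (y grows upward), a bounded deque plays the role of the queue, and a regular expression stands in for QC's GREP node. Window size, threshold and the "up then down" pattern are illustrative only.

```python
import re
from collections import deque
from typing import Iterable, Iterator

def direction_letters(points: Iterable[tuple[float, float]],
                      window: int = 3,
                      threshold: float = 0.01) -> Iterator[str]:
    """Yield 'L', 'R', 'U' or 'D' for each step of the running-mean position.

    Steps smaller than `threshold` on both axes emit nothing.
    """
    recent: deque = deque(maxlen=window)
    last_mean = None
    for x, y in points:
        recent.append((x, y))
        mean = (sum(p[0] for p in recent) / len(recent),
                sum(p[1] for p in recent) / len(recent))
        if last_mean is not None:
            dx, dy = mean[0] - last_mean[0], mean[1] - last_mean[1]
            if max(abs(dx), abs(dy)) >= threshold:
                if abs(dx) >= abs(dy):
                    yield "R" if dx > 0 else "L"
                else:
                    yield "U" if dy > 0 else "D"
        last_mean = mean

def find_gesture(points: Iterable[tuple[float, float]],
                 pattern: str = r"U{2,}D{2,}",
                 history: int = 32) -> bool:
    """Keep the last `history` letters (like a queue) and grep them after each step."""
    letters: deque = deque(maxlen=history)
    regex = re.compile(pattern)
    for letter in direction_letters(points):
        letters.append(letter)
        if regex.search("".join(letters)):
            return True
    return False
```

Because the history is bounded and searched after every new letter, there is no explicit start/stop point; the imprecision noted above comes from quantizing every step into just four letters.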
Side note: I'll look into genome computing... ;)
To date, all the gesture recognition I've seen tracks point motion (a mouse cursor, or multiple touch points as on the iPhone). It uses fairly simple primitives, like directionality, start/stop points, and other heuristics that work on points.
With the Wii Remote, we have some other input though: accelerometers. These are more like vectors than 2D points, which might make for some new gesture recognition. (The toolkits I've looked at above, which isn't all of them, only seem to do point/line-based gesturing)
acceleration to 2d graph
I see... but acceleration data is easy to interpret as a 2D graph, so I assume point-based gesture algorithms might work with it. And when you combine several acceleration channels, you get multi-dimensional graphs. Here's an interesting technical paper about getting gestures from the Wiimote (at least conceptually): http://www.ailive.net/papers/LiveMoveWhitePaper_en.pdf
Just got a chance to read this thoroughly after skimming some other gesture recog stuff. While a bit light on the details, it does provide some simple overviews that might help us get started.
(secretly, after playing several games I've decided that gesture recog could make for some incredibly cool human/composition interfaces :)
Interpreting acceleration as 1D or 2D data is definitely easy (though acceleration is the second derivative of displacement rather than a displacement, but that's not a big deal).
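Since acceleration is the second derivative of displacement, getting a position-like trace from accelerometer data means integrating twice. A rough Python sketch using the trapezoidal rule, assuming evenly spaced samples at a hypothetical interval `dt` and starting at rest; multi-axis data would be integrated per axis. In practice sensor bias and noise make the result drift quickly, so this is mainly useful for short gestures or together with some drift correction.

```python
from typing import Sequence

def integrate(values: Sequence[float], dt: float, initial: float = 0.0) -> list[float]:
    """Cumulative trapezoidal integral of evenly spaced samples."""
    out = [initial]
    for prev, cur in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * (prev + cur) * dt)
    return out

def displacement_from_acceleration(accel: Sequence[float], dt: float) -> list[float]:
    """Acceleration -> velocity -> displacement along one axis, starting at rest."""
    velocity = integrate(accel, dt)
    return integrate(velocity, dt)
```

For a constant acceleration of 2 sampled every 0.5 s, this gives displacements 0, 0.25, 1, 2.25, 4, matching x = a·t²/2.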
I was thinking more about what new things are possible that might not be built into current gesture software. Thanks for the link, I'll skim it :) Games obviously use these gestures; the question is whether we have an existing library to help us, or whether we'll need to extend one or roll our own. I don't know enough to say one way or the other, just putting the question on the table :)
Apple's built-in library
Can't we use Apple's built-in library for handling gestures? KInkGesture? See: http://developer.apple.com/documentation/Carbon/Conceptual/using_ink/ink... and: http://developer.apple.com/documentation/Carbon/Conceptual/using_ink/ink... I'm just curious how this might be useful.
I've only skimmed the docs (which are pretty cool, BTW... why does Apple do so many cool things?!), but it looks like it only supports handwriting for text input? I suppose it could be used, but it would only make use of 2 dimensions of input.
2D only
Yes, it's 2D only. But splitting 3D into 2D + 1D and using conditioning should give some results, though it's not an out-of-the-box solution, and I guess it could be difficult to interpret Wiimote motion this way... And yes again, the folks at Apple DO have a bunch of interesting technologies, and I suspect they have many more up their sleeves (touch-patent rumors...).
BTW, I tried to build a "simple" gesture recognizer in QC without any satisfying results (using regular QC nodes, no scripting though). If anyone here has a clue how to achieve this....
EDIT: I just found this: http://faculty.washington.edu/wobbrock/proj/dollar/ and it seems promising. Check this out: http://faculty.washington.edu/wobbrock/proj/dollar/dollar.js Any JavaScript guru around?
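The core of the $1 recognizer is short enough to sketch. A simplified Python approximation of its steps: resample the stroke to 64 evenly spaced points, rotate it so the angle from the centroid to the first point is zero, scale it to a 250-unit reference square, move the centroid to the origin, then take the average point-to-point distance to each stored template. The 64 points and 250-unit square follow the $1 reference code as far as I know. This sketch deliberately skips the golden-section search over rotation angles that $1 performs, so it is a reduced variant, not a port of dollar.js. Like $1 itself, it handles 2D single strokes only; the template names below are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float]
N_POINTS = 64        # resampling resolution
SQUARE_SIZE = 250.0  # reference square size

def resample(pts: Sequence[Point], n: int = N_POINTS) -> list[Point]:
    """Resample a stroke into n points spaced evenly along its path."""
    pts = list(pts)
    interval = sum(math.dist(a, b) for a, b in zip(pts, pts[1:])) / (n - 1)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # float rounding can leave us a point short
        out.append(pts[-1])
    return out[:n]

def centroid(pts: Sequence[Point]) -> Point:
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def rotate_to_zero(pts: list[Point]) -> list[Point]:
    """Rotate around the centroid so the centroid-to-first-point angle becomes 0."""
    cx, cy = centroid(pts)
    theta = -math.atan2(cy - pts[0][1], cx - pts[0][0])
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx, (x - cx) * s + (y - cy) * c + cy)
            for x, y in pts]

def scale_and_translate(pts: list[Point], size: float = SQUARE_SIZE) -> list[Point]:
    """Scale non-uniformly to a size x size box, then move the centroid to the origin."""
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w = (max(xs) - min(xs)) or 1.0  # guard against perfectly flat strokes
    h = (max(ys) - min(ys)) or 1.0
    scaled = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

def normalize(stroke: Sequence[Point]) -> list[Point]:
    return scale_and_translate(rotate_to_zero(resample(stroke)))

def recognize(stroke: Sequence[Point], templates: dict[str, list[Point]]) -> tuple[str, float]:
    """Return (best template name, score in [0, 1]); templates must already be normalized."""
    candidate = normalize(stroke)
    half_diagonal = 0.5 * math.hypot(SQUARE_SIZE, SQUARE_SIZE)
    best_name, best_d = "", math.inf
    for name, tmpl in templates.items():
        d = sum(math.dist(p, q) for p, q in zip(candidate, tmpl)) / len(candidate)
        if d < best_d:
            best_name, best_d = name, d
    return best_name, 1.0 - best_d / half_diagonal

# Hypothetical templates, normalized once up front.
V = [(0, 0), (50, 100), (100, 0)]
SQUARE = [(0, 0), (100, 0), (100, 100), (0, 100), (0, 0)]
TEMPLATES = {"V": normalize(V), "square": normalize(SQUARE)}
```

Recording a custom gesture then just means normalizing a captured stroke and storing it under a name.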
Have you seen the multitouch trackpad of the new MacBook Air in action? Gestures are integrated into Apple Motion and not so easy to work with...
But this one would be cool:
There is software called xGestures by Brian Kendall that I have been using since early 10.4. You can get OS wide mouse gestures with visual feedback. I realize that doesn't give you the same level of control that something built into QC would give you, but you could do some duct tape and dental floss work to link most things together.
Thx for the link. However, I don't think it will ever be possible to pipe the Wiimote into it (and this thread was originally Wiimote-oriented).
DTW / Dynamic Time Warping
This one seems very promising, and looks easier than HMMs or Levenshtein.... http://en.wikipedia.org/wiki/Dynamic_time_warping
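Dynamic time warping is indeed simple to sketch: it finds the cheapest monotonic alignment between two sequences that may be locally stretched or compressed in time, so a slow and a fast version of the same gesture come out close. A minimal Python version for N-dimensional samples (e.g. Wiimote x, y, z acceleration), assuming Euclidean distance between samples and no warping-window constraint; a real-time version would typically add a band limit and match templates against a sliding window of the incoming stream. The `classify` helper and template names are hypothetical.

```python
import math
from typing import Sequence

Sample = Sequence[float]

def dtw_distance(a: Sequence[Sample], b: Sequence[Sample]) -> float:
    """Cost of the cheapest monotonic alignment between sequences a and b."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def classify(stream: Sequence[Sample], templates: dict[str, Sequence[Sample]]) -> str:
    """Name of the template with the smallest DTW distance to the stream."""
    return min(templates, key=lambda name: dtw_distance(stream, templates[name]))
```

For example, the slow sequence 0, 0, 1, 1, 2, 2 aligns with 0, 1, 2 at zero cost, while the reversed 2, 1, 0 costs 4.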