Gesture Recognition

I searched through SourceForge and Freshmeat. What do you all think of the following:

Should we investigate porting any of these toolkits?

By smokris at 2007-07-17 10:44
R: Rejected

How to record gestures?

Some are very interesting, like siger or gestopoz. Since they're based on mouse movement, they should recognize 2D gestures easily. Ncore would allow for further experiments because, as I understand it, it features multipoint tracking (FTIR surface? mmmmm). But to be really useful, a Gesture patch should allow custom gesture recording. How can this be done within QC? Load a file provided by a separate app, or record directly within the graph? Anyway, a Gesture patch would be a MUST!

By franz at 2007-07-17 12:22

directionality and levenshtein distance

I've been idly trying to figure out how to accurately represent gestures in a way that allows for arbitrary numbers of inputs. For example, the Wiimote would have 3 (x, y, z acceleration), maybe 2 (pitch/roll, which are actually derived from x, y, z anyway), and maybe many more (x, y, z, nunchuk x, y, z). Many, if not all, of the gesture libs I've looked at so far are constrained to 2 dimensions (x, y), and some are actually glyph recognizers (for written symbols) rather than true gesture recognizers: glyphs have a definite start/stop point, like a mouse click/unclick, while gestures have no definite beginning or end.

So, one idea I'm kicking around combines directionality with Levenshtein distances. Directionality (possibly with a length) can symbolically represent each dimension of input as increasing, decreasing, or steady; the length would be used to merge adjacent duplicates. A Levenshtein distance can then compare those symbols against a set of defined gestures. While the gesture is being constructed, it can be compared against partial gestures to rule out the ones that obviously don't match, and if none match well, the current stream can be cleared to make way for new potential input gestures. Maybe a "don't care" symbol too, for dimensions that aren't relevant to a given gesture.
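A minimal Python sketch of the directionality-plus-Levenshtein idea, under a few assumptions: each dimension is quantized into '+' (increasing), '-' (decreasing) or '0' (steady) using a hypothetical dead-zone `eps`, adjacent duplicates are merged, and per-dimension edit distances are summed. '*' stands in for the "don't care" symbol. It compares whole streams only; the partial-match pruning described above is left out.

```python
def symbolize(samples, eps=0.01):
    """Map one dimension's samples to '+', '-', '0' and merge adjacent
    duplicates. eps is a hypothetical dead zone: smaller changes count as steady."""
    symbols = []
    for prev, cur in zip(samples, samples[1:]):
        delta = cur - prev
        s = "+" if delta > eps else "-" if delta < -eps else "0"
        if not symbols or symbols[-1] != s:
            symbols.append(s)
    return "".join(symbols)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def best_match(streams, gestures):
    """streams: one list of samples per input dimension.
    gestures: name -> one symbol string per dimension ('*' = don't care).
    Returns the closest gesture name and all scores (lower is better)."""
    observed = [symbolize(s) for s in streams]
    scores = {}
    for name, pattern in gestures.items():
        scores[name] = sum(levenshtein(o, p)
                           for o, p in zip(observed, pattern) if p != "*")
    return min(scores, key=scores.get), scores
```

For example, an x stream that rises then falls while y stays flat symbolizes to ("+-", "0"), which matches a hypothetical "bump" gesture defined as ("+-", "*") with distance 0.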

I'm not sure how easy it would be to 'record' such gestures, but they'd be simple to load/save, and could maybe even be specified by hand for simpler gestures.

Is this idea rational? Is it practical? Or are there more accurate or widely accepted algorithms for interpreting gestures without special cases?

By cwright at 2008-01-24 20:21

Mahalanobis vs Levenshtein

It took me some time to browse through the Levenshtein documentation... and apart from being very complicated ;), it seems like an über-smart hack to get it done, especially for the real-time part of the curve analysis: no start/stop point. Right now I'm experimenting (as far as my timetable allows) with gesture recognition under Max/MSP, using the FTM externals and MnM. MnM internally uses Hidden Markov Models to extract parameters from raw data: http://en.wikipedia.org/wiki/Hidden_Markov_model

It also uses the Mahalanobis distance, which seems even more promising, as it isn't scale-dependent: http://en.wikipedia.org/wiki/Mahalanobis_distance
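To make the scale-independence concrete: the Mahalanobis distance measures how far a sample lies from a class mean in units of that class's own spread and correlation, d = sqrt((x - mu)^T S^-1 (x - mu)). Below is a dependency-free Python sketch, assuming each gesture has already been reduced to a fixed-length feature vector and each class has enough training samples for its covariance S to be invertible (a singular S makes the solver divide by zero). The tiny Gauss-Jordan solver is only there to avoid requiring NumPy.

```python
import math


def mean(rows):
    """Per-dimension mean of a list of equal-length samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (divides by n - 1)."""
    mu = mean(rows)
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[k] - mu[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (len(rows) - 1)
    return cov


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x, rows):
    """Distance of sample x from the distribution of the training rows,
    computed by solving S y = (x - mu) instead of inverting S explicitly."""
    mu = mean(rows)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(covariance(rows), diff)
    return math.sqrt(sum(di * yi for di, yi in zip(diff, y)))
```

Multiplying one axis of both the training data and the query by 1000 leaves the distance unchanged, whereas a plain Euclidean distance would be dominated by that axis.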

See the MnM external docs here: http://ftm.ircam.fr/index.php/Gesture_Follower

By franz at 2008-02-05 16:37

oo, interesting

The stuff about Mahalanobis distance looks interesting (perhaps cheaper and statistically more accurate).

HMMs, in my limited experience, seemed like a cross between FSMs and neural networks (NN-like weights controlling the transitions between the FSM's states). For complex states, Markov stuff can eat a ton of RAM (maybe it's not so expensive nowadays... back when 64 MB of RAM was a lot, Markov models hammered swap almost immediately for the non-trivial data sets I was working with).

Thanks for the links, I'll have to think about this some more before grinding out some code (I was actually considering starting this this week, but now I think I'm going to spend some more time studying :)

[edit: reading more on MnM, it looks like they're attempting to solve exactly the same problem (identify the closeness of an N-dimensional curve to a set of pre-recorded N-dimensional curves), with a somewhat different method. Definitely worth checking out. As always, thanks for the link :)]

By cwright at 2008-02-05 16:43

chain vs. model

Slight mistake there: I've worked with Markov chains that would consume considerable amounts of memory, not so much HMMs. HMMs seem to almost always be first-order (each transition depends only on the single current state), so this isn't a problem. Higher-order Markov chains, on the other hand, can condition on a set of previous states, with probabilities for each destination on a per-state-set basis, which makes memory usage grow exponentially: N^X state-sets (N = total number of states, X = number of previous states to keep track of), each with its own destination probabilities.
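A worked count for a dense, uncompressed transition table, with illustrative numbers that are not from the thread (N = 50 states, X = 3 previous states, 4-byte floats); for comparison, a first-order HMM with the same number of hidden states only needs an N × N transition matrix (plus its emission table):

```latex
\underbrace{N^{X}}_{\text{state-sets}} \times \underbrace{N}_{\text{destinations}} = N^{X+1}\ \text{probabilities}

N = 50,\ X = 3:\quad 50^{4} = 6{,}250{,}000\ \text{probabilities} \times 4\ \text{bytes} = 25{,}000{,}000\ \text{bytes} \approx 25\ \text{MB}

\text{first-order HMM transitions:}\quad N \times N = 50^{2} = 2{,}500\ \text{probabilities}
```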

Mahalanobis distance and HMMs appear to be the canonical way to do gesture recognition, so I think I'm going to bug my stats friends to learn more about this approach instead of the Levenshtein hack mentioned earlier.
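For discrete symbols (direction letters, for instance), the core HMM computation is the forward algorithm: the probability that a given model produced the observed sequence. A minimal Python sketch, assuming one already-trained HMM per gesture (training, e.g. Baum-Welch, is not shown) and picking the model with the highest likelihood. Plain probabilities underflow on long sequences; real implementations rescale each step or work in log space.

```python
def forward(obs, start, trans, emit):
    """P(obs | model) for a discrete-output HMM via the forward algorithm.

    start[i]    probability of starting in hidden state i
    trans[i][j] probability of moving from state i to state j
    emit[i][o]  probability that state i emits symbol o
    """
    n = len(start)
    # alpha[i] = P(observations so far, currently in state i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)


def classify(obs, models):
    """models: gesture name -> (start, trans, emit); return the most likely name."""
    return max(models, key=lambda name: forward(obs, *models[name]))
```

With two toy one-state models, "up" emitting U with probability 0.9 and "down" emitting D with probability 0.9, the sequence "UUU" is classified as "up".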

By cwright at 2008-02-06 02:24

welcome to Gattaca

HMMs were initially used for understanding language patterns (1936, from what I remember). They seem to have been introduced into the computer world for... speech recognition. The funny thing is that I came to these HMMs while working with piezoelectric sensors yesterday (which are basically cheap microphones). I just needed a way to analyze a time-tagged list of floats (sensor data)....

Under QC, there's currently, to my knowledge, no way to analyze time-tagged data properly (that is, excluding the JavaScript node). I fought a lot with the GREP node, but it only compares strings, and it's quite pointless to try a float-to-string conversion.
So I hacked together a little JavaScript that computes a running mean of the input and outputs a string according to its direction: L(eft), R(ight), U(p), D(own). Queue this and you end up with a running sequence of strings: UUDLRLDLLULRUD..... (who said Gattaca?) which can then be "grep"ped to find patterns in it. While this actually works (and avoids the "start/stop" problem, as it keeps following the flow of data), it is not precise at all.
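A rough Python equivalent of that approach, not the poster's script: a moving-average `window` stands in for the running mean and a `dead_zone` suppresses jitter (both hypothetical parameters). Each step emits the letter of its dominant axis, with y assumed to increase upward as in QC's coordinate space, and the resulting string is searched with regular expressions much like the GREP approach.

```python
import re


def direction_string(points, window=3, dead_zone=0.0):
    """points: list of (x, y). Smooth with a moving average, then emit one
    letter per step: L/R for mostly horizontal moves, U/D for mostly vertical
    ones (y assumed to increase upward)."""
    smoothed = []
    for i in range(len(points)):
        chunk = points[max(0, i - window + 1): i + 1]
        smoothed.append((sum(p[0] for p in chunk) / len(chunk),
                         sum(p[1] for p in chunk) / len(chunk)))
    letters = []
    for (x0, y0), (x1, y1) in zip(smoothed, smoothed[1:]):
        dx, dy = x1 - x0, y1 - y0
        if max(abs(dx), abs(dy)) <= dead_zone:
            continue  # too small to count as movement
        if abs(dx) >= abs(dy):
            letters.append("R" if dx > 0 else "L")
        else:
            letters.append("U" if dy > 0 else "D")
    return "".join(letters)


# A "move right, then up" corner, found with a regex instead of GREP.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(bool(re.search(r"R+U+", direction_string(stroke))))
```

In a live setting the string would keep growing, so only a trailing portion of it would be kept and searched.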

Side note: I'll look into genome computing... ;)

By franz at 2008-02-06 04:29

gesture specific without coding

An easy way to get gesture recognition without any fancy coding or algorithms by old math people, which might work with the screen bounds being -1 to 1: feed the gesture through a Queue or Sample & Hold into a Math patch that adds the numbers up, then use a Conditional with a tolerance to give you a hit. For example, if you sum a big square and then draw a little square, it's not going to say "square". But if you do your gestures in different places, it should work, theoretically.

Or use a simple JavaScript: fill x, y, and z arrays, sum each array, and do something like if(x < tolerance) result.hitx = true. By some fluke you could get the same sum from a different gesture. To record z depth on a multitouch surface you would have to make a gesture for it in the first place, so x and y are more practical.
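A Python sketch of the summing approach. It reads the tolerance check as comparing each sum against a recorded template's sum, which is an assumption, since the post only gives the condition. It also shows the caveats above: the sums depend on where the gesture is drawn and on the number of samples, and a different shape can produce the same sum.

```python
def gesture_sums(xs, ys):
    """Reduce a recorded gesture to two numbers: the sums of its x and y samples."""
    return sum(xs), sum(ys)


def matches(xs, ys, template, tolerance):
    """Hit if both sums lie within `tolerance` of the template's sums."""
    sx, sy = gesture_sums(xs, ys)
    tx, ty = template
    return abs(sx - tx) < tolerance and abs(sy - ty) < tolerance


# Record a small square drawn in the top-right quadrant as the template.
square = gesture_sums([0.2, 0.8, 0.8, 0.2], [0.2, 0.2, 0.8, 0.8])
# The same square drawn bottom-left misses; four samples at one point
# ("the fluke") still hit.
print(matches([-0.8, -0.2, -0.2, -0.8], [-0.8, -0.8, -0.2, -0.2], square, 0.1))
print(matches([0.5] * 4, [0.5] * 4, square, 0.1))
```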

By dust at 2009-02-01 20:54

junction tree, random HMM, cost, graphic modeling.

I have been researching pattern recognition for a while, and there are lots of ways to do this. One that you have almost got built already is OpenCV Haar training, which is basically a supervised positive/negative machine learning algorithm. Markov random fields are used in lots of machine learning too. Since you want more of a 1-1 mapping with a threshold, machine learning is not the fastest way to go; but if Quartz finds the tolerance of multiple gestures that are then mapped to the 1-1 model, it would work much better for different users.

The Value Historian will prove useful for this. I haven't really looked at or played with it yet, but recording and saving learning or training data sets is important, and automating the process would be nice. If you are recording optical flow or OpenCV data, a remote is essential: you can't have the movement of stopping the recording end up in your training set. You also need a noise state, because once you get your means with whatever method you choose, the simple act of not touching or not moving will be included in your training sets, so when you finish your gesture you will get a positive hit as well, since all the sets contain a non-moving or non-touching state. I have successfully translated sign language to speech with OpenCV, or at least the few gesture videos I could download from ASL. That was a few years ago, but now MT is popular, so I'm looking more into it.
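One way to keep the resting state out of recorded training sets is to trim leading and trailing frames where nothing moves. This is an assumed complement to the noise state suggested above, not a replacement: a separate "no gesture" class is still needed at recognition time. A minimal Python sketch for frames of any dimension, with a hypothetical motion `threshold`:

```python
import math


def trim_idle(frames, threshold):
    """Drop leading and trailing frames whose motion (distance from the
    previous frame) stays below `threshold`, so the resting state is not
    learned as part of every gesture. frames: list of equal-length tuples."""
    def moving(i):
        return math.dist(frames[i - 1], frames[i]) >= threshold

    active = [i for i in range(1, len(frames)) if moving(i)]
    if not active:
        return []  # nothing but noise: treat as the "no gesture" state
    # Keep the frame just before the first movement as the gesture's start.
    return frames[active[0] - 1: active[-1] + 1]
```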

So I stumbled across this guy's method the other day. Actually, most methods use some sort of Markov model, but I have been studying graphical modeling, so I watched a bit of this lecture. It might help solve for z depth. Basically he builds a class model of two images, and both images have subclass tags on them. Then there is a 1-1 mapping of features, with obvious differences even if they are the same image taken from different z depths, like from a satellite. He then applies a cost function averaging the means of the 1-1 map, then uses Markov random fields with a graphical model (it looks like a pentagram), then does some sort of junction tree triangulation with the nodes to get more precision, and at that point in the video I fell asleep, but you can view the slides if you don't want to watch the lecture. He is doing this so satellites can take a picture of stars to get their position and reposition themselves, which seems complicated to me because the stars are moving as well.

http://videolectures.net/mlss06au_caetano_gmspr/

There's some really good stuff on this at Stanford as well, if you want to look at Andrew's open learning lectures on machine learning. He goes over lots of methods for this kind of thing, like stochastic gradient descent etc. I still have to finish that series of lectures, but I have art to make for school right now. Hope this helps...

By dust at 2009-02-01 08:58

To Date

To date, all the gesture recognition I've seen tracks point motion (mouse cursor, or multiple spots like on the iPhone). They use fairly simple primitives, like directionality, start/stop points, and some other heuristic stuff that points can do.

With the Wii Remote, we have some other input though: accelerometers. These are more like vectors than 2D points, which might make for some new gesture recognition. (The toolkits I've looked at above, which isn't all of them, only seem to do point/line-based gesturing)

By cwright at 2007-07-18 09:03

acceleration to 2d graph

I see... but it is easy to interpret acceleration data as a 2D graph, so I assume it might work with point-based gesturing algos. And when you couple multiple acceleration streams, you get multidimensional graphs. Here's an interesting tech paper about getting gestures from the Wiimote (at least in terms of concept): http://www.ailive.net/papers/LiveMoveWhitePaper_en.pdf

By franz at 2007-07-19 08:13

interesting

Just got a chance to read this thoroughly after skimming some other gesture recog stuff. While a bit light on the details, it does provide some simple overviews that might help us get started.

(secretly, after playing several games I've decided that gesture recog could make for some incredibly cool human/composition interfaces :)

By cwright at 2007-10-09 14:39

Interpreting

Interpreting acceleration as 1D or 2D is definitely easy (though it's the second derivative of displacement rather than a displacement itself, but that's not a big deal).
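Because acceleration is the second derivative of displacement, getting back to a position-like curve means integrating twice. A minimal Python sketch using the trapezoidal rule, assuming evenly spaced samples (interval `dt`) and a device that starts at rest. In practice any sensor bias makes the result drift quickly (a constant bias b grows into an error of b·t²/2), so matching the raw acceleration curves directly may be simpler.

```python
def integrate(samples, dt, initial=0.0):
    """Cumulative trapezoidal integration of evenly spaced samples."""
    out = [initial]
    for a, b in zip(samples, samples[1:]):
        out.append(out[-1] + 0.5 * (a + b) * dt)
    return out


def displacement_from_acceleration(accel, dt):
    """Integrate twice, assuming zero initial velocity and position."""
    velocity = integrate(accel, dt)   # first integral: velocity
    return integrate(velocity, dt)    # second integral: displacement
```

For a constant acceleration of 2 sampled every 0.5 s, this reproduces x = t² at t = 0, 0.5, 1, 1.5, 2.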

I was thinking more about what new things are possible that might not be built into current gesture software. Thanks for the link, I'll skim it :) Games are obviously using these gestures; the question is whether we have an existing library to help us, or whether we'll need to extend one or roll our own. I don't know enough to say one way or the other, just a question on the table :)

By cwright at 2007-07-19 09:40

apple's built'in library

Can't we use Apple's built-in library for handling gestures? KInkGesture? See: http://developer.apple.com/documentation/Carbon/Conceptual/using_ink/ink... and: http://developer.apple.com/documentation/Carbon/Conceptual/using_ink/ink... I'm just curious how this might be useful.

By franz at 2007-09-22 04:39

does it support more than letters/numbers?

I've only skimmed the docs (which are pretty cool, BTW... why does apple do so many cool things?!), but it looks like it only supports handwriting for text input? I suppose this could be employed, but it'd only make use of 2 dimensions of input.

By cwright at 2007-09-22 08:19

2D only

Yes, it's 2D only. But separating 3D into 2D + 1D and using conditioning should provide results, though it's not an "out of the box" solution, and I guess it could be difficult to interpret Wiimote motion this way... And yes again, the guys at Apple DO have a bunch of interesting technologies, and I suspect they have a bunch more up their sleeves (touch-patent rumors...).

BTW, I tried to make a "simple" gesture recognizer under QC without any satisfying results (using regular QC nodes, no scripting though). If anyone here happens to have a clue how to achieve this....

EDITED: I just found this: http://faculty.washington.edu/wobbrock/proj/dollar/ and it seems promising. Check this out: http://faculty.washington.edu/wobbrock/proj/dollar/dollar.js Any JavaScript guru around?

By franz at 2007-09-22 12:05

Virtual slider in first !

Have you seen the multitouch pad of the new MacBook Air in action? Gestures are integrated in Apple Motion and not so easy to work with...

But this one would be cool:

Attachment: virtual circular slider.pdf (117.19 KB)

By yanomano at 2008-02-05 17:50

xGestures

There is software called xGestures by Brian Kendall that I have been using since early 10.4. You can get OS-wide mouse gestures with visual feedback. I realize that doesn't give you the same level of control that something built into QC would, but you could do some duct tape and dental floss work to link most things together.

By dwskau at 2008-03-15 18:18

thx for the link. however, i

Thx for the link. However, I don't think it will ever be possible to pipe the Wiimote into it (and this thread was originally Wiimote-oriented).

By franz at 2008-03-16 07:03

DTW / Dynamic Time Warping

This one seems very promising, and looks easier than HMM or Levenshtein.... http://en.wikipedia.org/wiki/Dynamic_time_warping
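A textbook DTW sketch in Python for sequences of N-dimensional points (tuples), using Euclidean distance per step and no warping-window constraint. It runs in O(n·m) time, so long recordings would likely need a band constraint or downsampling. Because it compares points of any dimension, the same function applies to 2D mouse strokes or 3-axis Wiimote acceleration (`math.dist` needs Python 3.8+).

```python
import math


def dtw(a, b):
    """Dynamic time warping distance between two sequences of N-dimensional
    points (tuples of equal length), with Euclidean distance per step."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

A time-stretched copy of a sequence scores 0, which is exactly the property that makes DTW attractive for gestures performed at different speeds; recognition then picks the template with the smallest distance.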

By franz at 2008-04-27 11:18

$1 gesture recognizer

just a quick question about this posted link: http://depts.washington.edu/aimgroup/proj/dollar/

Has one of you guys made any effort in this direction? Has anybody tried to put this in the JavaScript patch? Just asking because that would be a great enhancement for our TUIO-based multitouch projects...

By s.rozsa at 2009-01-27 07:46

wow

This looks way simpler than DTW, with comparable accuracy.

I'm not sure what the license is for it... the paper and JavaScript are easy enough to understand and re-implement. (The name "$1" is somewhat confusing.)
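A compact Python sketch of the $1 pipeline as described in the paper and dollar.js: resample to 64 points, rotate by the "indicative angle" (centroid to first point), scale non-uniformly to a 250-unit square, translate the centroid to the origin, then run a golden-section search over ±45° for the best average point distance. This is an illustrative re-implementation, not the authors' code. For simplicity, templates are stored raw and normalized on every call, and 1D gestures such as straight lines, a known weak spot of $1, are not specially handled.

```python
import math

N_POINTS = 64                   # resampling resolution used by $1
SQUARE = 250.0                  # reference square size used by $1
PHI = 0.5 * (math.sqrt(5) - 1)  # golden ratio for the angle search

def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def resample(pts, n=N_POINTS):
    """n points spaced evenly along the stroke's path."""
    interval = path_length(pts) / (n - 1)
    pts = list(pts)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q starts the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # floating-point rounding can leave us one short
        out.append(pts[-1])
    return out[:n]

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def rotate_by(pts, angle):
    cx, cy = centroid(pts)
    c, s = math.cos(angle), math.sin(angle)
    return [((x - cx) * c - (y - cy) * s + cx, (x - cx) * s + (y - cy) * c + cy)
            for x, y in pts]

def normalize(pts):
    """Resample, rotate the indicative angle to 0, scale to the square,
    and move the centroid to the origin."""
    pts = resample(pts)
    cx, cy = centroid(pts)
    pts = rotate_by(pts, -math.atan2(pts[0][1] - cy, pts[0][0] - cx))
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(x * SQUARE / w, y * SQUARE / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def distance_at_best_angle(cand, tmpl, a=-math.pi / 4, b=math.pi / 4,
                           eps=math.radians(2)):
    """Golden-section search for the extra rotation that best fits tmpl."""
    x1 = PHI * a + (1 - PHI) * b
    f1 = path_distance(rotate_by(cand, x1), tmpl)
    x2 = (1 - PHI) * a + PHI * b
    f2 = path_distance(rotate_by(cand, x2), tmpl)
    while abs(b - a) > eps:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = PHI * a + (1 - PHI) * b
            f1 = path_distance(rotate_by(cand, x1), tmpl)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * a + PHI * b
            f2 = path_distance(rotate_by(cand, x2), tmpl)
    return min(f1, f2)

def recognize(stroke, templates):
    """templates: name -> raw (x, y) points. Returns (best name, score in [0, 1])."""
    cand = normalize(stroke)
    half_diag = 0.5 * math.hypot(SQUARE, SQUARE)
    best, best_d = None, float("inf")
    for name, raw in templates.items():
        d = distance_at_best_angle(cand, normalize(raw))
        if d < best_d:
            best, best_d = name, d
    return best, 1 - best_d / half_diag
```

A triangle drawn twice as large and somewhere else on screen still comes back as "triangle" with a score close to 1, since resampling, scaling and translation remove those differences.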

By cwright at 2009-01-27 08:36

Hey Chris.

... how can I encourage you to do that? ;-) Is there a way? I would love to combine this JavaScript patch with the TUIO patch from reacTIVision. Regarding the license: I also didn't find anything, but if it's relevant, I could contact Prof. Jacob O. Wobbrock and ask him about it. As far as I can see, there are a lot of different implementations already (like Java, AS2 etc...), so I think it is not a legal problem to adapt this for QC...

BTW: do you think the output could be handled dynamically? I mean the "add last stroke as example of existing type" and "add last stroke as example of custom type" stuff from the webpage... That would be really awesome: that way you could even teach your gestures on the fly (maybe storing the coordinates as an XML file?)...
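Teaching gestures on the fly mostly comes down to appending the last stroke to a named list of examples and persisting it. A sketch using Python's `xml.etree.ElementTree`; the `gestures/gesture[name]/stroke/point[x, y]` layout is a hypothetical format, not one defined by $1 or by any QC patch.

```python
import xml.etree.ElementTree as ET


def add_example(templates, name, stroke):
    """'Add last stroke as example of existing/custom type': append to an
    existing gesture, or create the gesture if the name is new."""
    templates.setdefault(name, []).append(list(stroke))


def templates_to_xml(templates):
    """templates: name -> list of strokes, each a list of (x, y) points."""
    root = ET.Element("gestures")
    for name, examples in templates.items():
        g = ET.SubElement(root, "gesture", name=name)
        for stroke in examples:
            s = ET.SubElement(g, "stroke")
            for x, y in stroke:
                # repr() keeps full float precision for an exact round trip
                ET.SubElement(s, "point", x=repr(x), y=repr(y))
    return ET.tostring(root, encoding="unicode")


def templates_from_xml(text):
    templates = {}
    for g in ET.fromstring(text).iter("gesture"):
        strokes = [[(float(p.get("x")), float(p.get("y"))) for p in s.iter("point")]
                   for s in g.iter("stroke")]
        templates.setdefault(g.get("name"), []).extend(strokes)
    return templates
```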

Cheers,

Sandor

By s.rozsa at 2009-01-30 12:10

This could be a really cool

This could be a really cool thing to use with AR (which can kind of do gesture recognition... or be "faked" to do things like open doors, etc.).

By gtoledo3 at 2009-01-30 12:28

Hey George,

...more than that :-) I'm not sure how much you know about the whole multitouch scene, but easy-to-use, easy-to-implement gesture recognition in QC would be an incredibly cool feature for dyslexic non-programmers like me ;-) If you are interested in some MT stuff in QC, I would shamelessly recommend visiting our site at xTUIO.com. There are a few examples there (pretty cheesy ones), also in QC. The cheesy ones are made by me. The Xcode stuff is made by Ben; that is a completely different league ;-)

If you want to play around: you don't need a multitouch table. You can grab the TUIO simulator from reactivision.com (you will also need the TUIO plugin for QC, also available from that site)...

Guys, I would really love to see this feature come alive :-)

BTW: would it make sense to open a new thread for multitouch in this forum? There is not so much done in this field with QC, and I believe QC is predestined for MT stuff (at least as a solid prototype or for decent visual applications...)

Greetings,

Sandor

By s.rozsa at 2009-01-30 13:01

Cool, I'll check the

Cool, I'll check the multitouch site out.

The TUIO stuff was probably one of the first QC things I ever ran across, because of the simulator being a .jar file... I was doing a web search, because I used to be into using things other than QC for video FX (I tend to do web searches like this often, just to see what I will find).

I've said it before.... I always feel kind of stupid using the TUIO sim, other than for theoretical testing purposes, because after a minute it seems like I should just attach a mouse patch. However, I do see how cool it could be with an actual table interface, having seen Memo's web clips.

I've been irritated by how TUIO won't seem to work in QC Visualizer without some modding to the code that I have not figured out... but I'm not sure if the problem is in the renderer code, or if I just have to add the non-Apple plugins that I want to the Resources part of the Xcode project. Visualizer is really weird, in that some things preview correctly and fart out in full screen, and others just don't work at all. I'm sure this problem is documented somewhere and I just haven't found it yet, since this has been around for so long.

This recognition works well.

By gtoledo3 at 2009-01-30 13:41

MT Tables...

...are cool. And yeah, Memo is doing really nice things. Very creative mind! You can also have a look at a short showcase of our project from last spring if you like: http://vimeo.com/2240537

Cheers,

Sandor

By s.rozsa at 2009-01-30 21:16

g-speak, gesture i/o environment, operating system

I know this is a multitouch thread, but this is not entirely out of context, seeing as it uses gestures for a spatial operating environment.

http://oblong.com/

Still got to wear some sort of glove? I really think that with machine learning and pattern recognition this can be done without the glove. I'm pretty close: I can move my mouse around and select directories with just a webcam. It takes a bit to find the sweet spot, so it's not good for anything commercial, but it's cool for art or interactive stuff where people don't have special gloves or infrared cameras.

I tried to make a version with QC and it works OK; I will post it as soon as I restart. My cam isn't working for some reason, and I want to post the right version.

By dust at 2009-02-01 09:23

You can do that...

...also with other, freely available software. It's quite interesting anyhow. The simplest way would be (IMHO) to use something like "touchless" (http://www.officelabs.com/projects/touchless/Pages/default.aspx). Well, this is for Microsoft only, but it gives you an idea. Theoretically it would also be possible with BBTouch (the tracker software I'm involved with) by using reflective tape on your fingers (like that WiiMote stuff) and a small IR illuminator placed near your webcam. You could use the data generated by BBTouch with the TUIO patch for QC, and voilà, you have a touchless system in Quartz Composer ;-)

Cheers,

Sandor

By s.rozsa at 2009-02-02 14:17

gridLock..

So this is a QC patch you can play with: an optical flow grid with triggers. The triggers are not really doing anything; I think maybe I set up a directory to test. The top row has some buttons. It looks complicated, but all the math stuff is there so I can calibrate the grid to whatever aspect ratio is needed; it would be painful to do this cell by cell, one by one, for position and w/h. So all the cells have a sprite and a trigger on them, so they can be used for whatever purpose. I was going to add OSC or Kineme MIDI out so I can play air guitar or something.... There are easier ways to do this with 2D Java arrays etc..., but I have been doing Java GMF lately and am all Java'd out; QC is my break.

Attachment: gridLock.qtz (174.28 KB)

By dust at 2009-02-01 09:39

SpeechSynthesis?

Say Dustin,

Where can the SpeechSynthesisPlugIn be found? I'm good on the rest of the plugins in the composition.

By leegrosbauer at 2009-02-01 11:37

speech synth

Yeah, the speech synth is buried in there somewhere. It is layer number 40 inside the gridLock. Zoom out: the top right corner of the grid will have a 16 for row 1, col 6 (maybe I got it backwards; that always got me confused with 2D arrays, I like xy notation better). There is a little JavaScript thing there that returns a directory and voice string when you hit a certain cell. Follow the voice wire down halfway till you see Kineme voice. So it's buried in there to the right of the sprite grid. I wish there was a way of doing some kind of collapsible node container or something to save space.

This was an interactive commercial thing for the Apple store on campus, so people walking by would notice something, and if they could drive the mouse to the button they could switch products. I haven't submitted it; I'm not in a hurry to do free work for the computer store, they make enough money to pay for advertisements. It's buggy anyway: if the cursor gets stuck on a cell that changes photos, it hikes the index count up, and if someone does change the directory, the pictures won't show up because it's at index 3000 or something. I just have to put some bounds in there. It actually works best if you're not moving a cursor and just use a left or right string to change photos instead of a boolean hit trigger.

I've got a much different version, but I can't get the Haar cascade thing in OpenCV to work with QC. I actually have lots of versions of these things, pretty much just doing object recognition, tracking, or blob tracking without the blobs in various programs. The first thing I usually make in an interactive program is a webcam musical instrument. I was kind of having a hard time in QC because Memo's worked pretty well, but now I've got a version I can actually play like an instrument.

By dust at 2009-02-01 20:23

Thanks, err ....

I was looking for the actual plugin. Then I found it by searching Kineme, but it appears it's a Tiger era plugin and not compatible with Leopard. I installed it anyway but my QC doesn't see it. Not important, I guess. I'll just ignore the missing plugin message and study the composition without it.

It's very nice, btw. I like the cursors. :-)

By leegrosbauer at 2009-02-01 20:51

dev example

With Leopard, there's a developer example that does speech synthesis -- this is why we didn't update the Speech Synth patch for Leopard.

Look in /Developer/Examples/Quartz Composer/Plugins/SpeechSynthesis for the Apple version. (you'll have to build it, but it's pretty easy -- just load up the xcode project, build, and then copy the plugin over).

By cwright at 2009-02-01 20:59

bingo

Thanks, Chris and Dustin. Plugin built and installed. Everything works fine now.

Dustin's comp is quite talkative, it turns out ... unless you sit real still. ahahaha.

By leegrosbauer at 2009-02-01 21:12

example

Actually, now that I think about it, the speech thing is Apple's; sorry about that, I was thinking of the Kineme speech recognition patch. Pretty much all that stuff is just part of the developer examples except the grid: the optical flow, the rainbow trails, cover flow, and the mouse trails are part of the examples. Like I said, that was for a computer store, so I wanted it generic. You can turn off the bounce to get some more control, but I found it's easy to reverse direction if you bounce sometimes. If you notice, the little cursor trails indicate the direction you are moving in.

By dust at 2009-02-02 09:16

kineme beta

Are you looking for the speech thing to delete it, or looking for the plugin? I've got the plugin here in the Kineme beta section.

By dust at 2009-02-01 20:27

yep

Never had success with it. It lacks a main function, and as I'm no JavaScript guru... I'd be glad if someone invested a tiny amount of time tinkering with it....

By franz at 2009-01-27 09:08

Thanks for the head's up on

Thanks for the head's up on that... I could see this working in tandem with some other ideas to make for a very cool level of interactivity.

By gtoledo3 at 2009-01-27 12:12

Re: $1 gesture recognizer

There's now an Objective-C implementation available: http://giraffelab.com/code/GLGestureRecognizer/

By smokris at 2009-05-02 15:31

Re: $1 gesture recognizer

So is it on the to-do list?

By franz at 2009-05-02 19:29

Re: $1 gesture recognizer

not quite yet (we're pretty aggressively booked out for the next little while, but after that we can take a peek)

I forget, can't the JS version be implemented in QC's JS patch (with a bit of modification)?

By cwright at 2009-05-03 07:15

Re: $1 gesture recognizer

Ohhhh, that makes so much sense that could be done...

Been so long since I've looked at this thread, need to comb through and look for the relevant stuff...

By gtoledo3 at 2009-05-04 01:14

Re: $1 gesture recognizer

The JS implementation should be doable, although the original JS source is hardcore.

This new ObjC implementation is pretty much readable and looks über-clean. IMO, much much better. However, there are only 2 gestures included (line and circle).

By franz at 2009-05-04 05:02

Make me happy...

...hope I'm not going OT with this, but it has to do with gestures and (somehow) also with gesture recognition. If I'm going too far with this, please give me a hint and I will post a new feature request...

Speaking about JavaScript etc... There is a very nice JavaScript by Tom Robinson called virtual light table, along with some hints and links on his website. In desktop browsers it uses the previously clicked location as a second “touch”, so you can click a photo, then click and drag another spot on the photo to resize and rotate (notice the yellow dot). If somebody were able to implement this JavaScript in a QC patch (preferably with a REAL second, third, ... touch from the TUIO patch!), that would ease up the whole multitouch work in QC...

Here's the link anyhow: http://tlrobinson.net/blog/2008/07/11/multitouch-javascript-virtual-ligh...

Anyone an idea?

Cheers,

Sandor

By s.rozsa at 2009-02-02 14:29

Hmm, I was looking at a

Hmm, I was looking at a video of this awhile back and thinking about how easy it seemed to do, but then again, I was envisioning just "gliding" the finger across the document to make it bigger or smaller. The part about the clicking and previous location got lost on me when I saw the clip.

By gtoledo3 at 2009-02-02 14:54

MSR

Well, generally we are speaking about three actions when it comes to this type of multitouch application (mainly if you are making a media-presenter kind of app): move, scale, and rotate. This is the iPhone-like MT that started a new hype around this cool tech. For that you need "gestures": basically a "pinch" gesture for scaling, a rotate-around gesture, and a simple drag gesture. The simple drag is achieved (guess how) by simply putting your finger on the object (picture, movie, whatever...) and dragging it around. For the two other gestures you need at least two fingers. In that case you use one finger as a reference point for "pinching" (a diagonal movement) or as the rotation center. You could of course use more than two fingers placed simultaneously over the object, calculate a midpoint, and perform the pinch/rotate action around that point. This is more difficult to do than with two fingers, but in my experience with current applications it would be more user-friendly (users tend to do that intuitively with more than two fingers...)

The JavaScript app I posted here is actually "faking" these multitouch operations with only one mouse cursor. I have to admit that even that is quite impressive, but I think that using the behaviour (and the math etc.) in a JS patch for "true" multitouch via TUIO or OSC would be a real killer thingie :-)
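The two-finger case reduces to a little vector math. A minimal Python sketch, assuming two touch points tracked from when both fingers land (p0, p1) to their current positions (q0, q1). Translation is the movement of their midpoint, scale is the ratio of the finger distances (pinch), and rotation is the change in angle of the line between them (twist), wrapped into [-π, π) so crossing ±180° doesn't produce a jump.

```python
import math


def two_finger_transform(p0, p1, q0, q1):
    """Touches at the start (p0, p1) and now (q0, q1) -> (dx, dy, scale, rotation)."""
    mid_p = ((p0[0] + p1[0]) / 2, (p0[1] + p1[1]) / 2)
    mid_q = ((q0[0] + q1[0]) / 2, (q0[1] + q1[1]) / 2)
    vp = (p1[0] - p0[0], p1[1] - p0[1])  # finger-to-finger vector at the start
    vq = (q1[0] - q0[0], q1[1] - q0[1])  # finger-to-finger vector now
    scale = math.hypot(*vq) / math.hypot(*vp)                        # pinch
    rotation = math.atan2(vq[1], vq[0]) - math.atan2(vp[1], vp[0])  # twist
    rotation = (rotation + math.pi) % (2 * math.pi) - math.pi       # wrap
    return mid_q[0] - mid_p[0], mid_q[1] - mid_p[1], scale, rotation
```

For three or more fingers, one possible extension (not shown) is to replace the midpoint with the centroid of all touches and average each finger's distance and angle around it.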

By s.rozsa at 2009-02-02 17:23

more input

Was reading a QuickTime blog (of all things!), and came across a link to this: http://wiigee.sourceforge.net/download_files/gesture_recognition_with_a_...

wiimote-based gesture recognition paper. I'll have to grok this, and factor that into our eventual QC-GR plugin...

By cwright at 2009-02-12 21:58

cool idea

cwright wrote:
Was reading a QuickTime blog (of all things!), and came across a link to this: http://wiigee.sourceforge.net/download_files/gesture_recognition_with_a_...

wiimote-based gesture recognition paper. I'll have to grok this, and factor that into our eventual QC-GR plugin...

Hey Chris,

That sounds very interesting. I think it would be a good idea to have a kind of "generic" QC-GR plugin with the ability to connect whatever input device the user wants... I'm not an expert in developing things, but I think most input devices send some kind of structured data. So if you specify an input protocol for the plugin (like OSC or TUIO), it can be used with more than one specific device (not only mouse, tablet, or WiiMote). Ideally the plugin would accept a lot of different input options (like input from the accelerometer etc.), but it would have some basic "general" minimal input, like a 2D x/y input.

Hope what I'm writing here makes sense :-)

I would love to discuss this with you more intensively if you don't mind... If you like, please contact me via mail: rozsa (at) cd-cologne (dot) de ... Maybe we can set something up?

Cheers,

Sandor

By s.rozsa at 2009-02-14 05:19

structured

The only data provided are dimensional values (i.e. the x value is ___, the y value is ___, the z value is ___, for each "dimension"; a mouse is 2D (x, y), a SpaceNavigator is 6D, and a Wiimote can be 14-D (x/y/z of the Wiimote, x/y/z of the nunchuk, 4 x/y IR points)). We also have the time axis, so all in all we get a bunch of curves in space over time, and we try to match those against trained curves. Recording them isn't a big deal, but N-dimensional curve matching is tricky to do quickly. $1 works very well for 2D, but doesn't scale above that (one of its first steps is to rotate the x/y data to align it with the training data, but that doesn't make sense if there are more than 3 dimensions [I know you can rotate in arbitrary numbers of dimensions with trig; the problem is that finding the proper alignment isn't as simple, and rotation data can be important for 3D+ gestures]).

So the input protocol would look a lot like the Spline patch or the Value Historian: in the inspector, you specify how many dimensional inputs there are, and then on the patch you have some structure data for trained curves (maybe?), and a bunch of ports to plug input values into. Over-engineering it to require OSC or TUIO is silly (and TUIO is behind so many crash reports that I personally wouldn't even consider it for production work without knowing how to avoid those pitfalls). Again, recording N sets of data is trivially easy; it's the comparison that's difficult.

By cwright at 2009-02-14 07:24

so non structured?

Chris, as I said before: I'm not the developer in this round ;-) So sorry if I'm using the wrong terminology.

But I hope I'm getting the point now. So basically you would receive N-dimensional inputs of data, right? These would be evaluated against prerecorded datasets (or trained curves?). The comparison would be the difficult part, as you're saying.

Sorry, I know I maybe have too much of a "what will this feature REALLY be used for" approach. And I have to admit that I'm very multitouch-oriented in my thinking. But wouldn't it be a good thing to start with something relatively easy, like a 2D thingie? That way, a few devices like mouse, tablet, trackpad, and yes, also some MT devices like our multitouch table could be used... Even that $1 thing would be a great step (IMHO), and I know it has some limitations: if you are using it for an MT table, you will have a hard time if you try to support users acting from different sides of the table... But hey! Who and what is perfect! But I'm afraid again that I'm too egoistic in my thinking :-)

Regarding TUIO: what crash reports are you referring to? Is it the QC TUIO patch, or the protocol in general? We are using TUIO in production work (NOT the QC client, but we rely on the protocol for the communication between the tracker and the clients) and so far we have not experienced any major pitfalls... BTW: Martin and some other guys are working on the TUIO 2 specs at the moment. I'm not married to TUIO, but it is nice to have a cross-platform thing you can use regardless of whether you are building your client software with Flash, Java, OF, QC or whatever. And as you're saying: it is not a necessary feature for the plugin to use one of these protocols...

My2c.

Greets,

Sandor

By s.rozsa at 2009-02-14 13:08

more notes

You're right -- we'd receive N inputs, some trained curves, and output the match and confidence, or a set of matches and confidence. (confidence = how "good" the match is)
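A minimal N-dimensional sketch of that interface in Python. Every curve is resampled to a fixed number of points in time, each dimension is shifted to zero mean and scaled to unit range (so offsets and sizes don't matter), and templates are ranked by mean point-to-point distance. There is deliberately no rotation alignment, since rotation data can matter for 3D+ gestures. The confidence mapping `1 / (1 + distance)` is an arbitrary choice for illustration, not something specified in the thread.

```python
import math


def resample_curve(curve, n=32):
    """Linearly resample an N-dimensional curve (list of equal-length tuples)
    to n >= 2 points evenly spaced in sample index, i.e. in time."""
    if len(curve) == 1:
        return [tuple(curve[0])] * n
    out = []
    for k in range(n):
        t = k * (len(curve) - 1) / (n - 1)
        i = min(int(t), len(curve) - 2)
        f = t - i
        out.append(tuple(a + f * (b - a) for a, b in zip(curve[i], curve[i + 1])))
    return out


def normalize_curve(curve):
    """Zero mean and unit range per dimension (removes offset and scale)."""
    dims = list(zip(*curve))
    stats = [(sum(d) / len(d), (max(d) - min(d)) or 1.0) for d in dims]
    return [tuple((v - m) / r for v, (m, r) in zip(p, stats)) for p in curve]


def match(curve, trained, n=32):
    """trained: name -> raw N-dimensional curve.
    Returns [(name, confidence)] sorted from best to worst match."""
    cand = normalize_curve(resample_curve(curve, n))
    results = []
    for name, tmpl in trained.items():
        ref = normalize_curve(resample_curve(tmpl, n))
        d = sum(math.dist(p, q) for p, q in zip(cand, ref)) / n
        results.append((name, 1.0 / (1.0 + d)))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

A 3D "swing" performed three times larger and offset in space still ranks first with a confidence near 1, because the per-dimension normalization removes both differences.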

When speaking of TUIO, I meant the QC plugin (apologies there, I'm sure it's handled fine elsewhere). Just the QC one seems to have some quirks (both the Java port that we did a long time ago, and the updated version).

I'd rather not make a 2D-only gesture recognizer, and then have to make a completely separate 3D/nD one that isn't similar/compatible. 2D is easy enough if $1 fulfills your needs, but having multiple "almost-similar" patches is annoying, imo. Once a variation on $1 is figured out, and generalized to nD, we're good to go.

(note: MT isn't too high on my priority list personally, which is part of why I'm not actually coding anything for this yet, just learning...)

By cwright at 2009-02-14 13:55

dimensional

I think making a gesture recognition patch is a good idea, and making it with arbitrary inputs is the way to go. Nothing against TUIO, because it is a cool protocol, but people might actually want to match other gestures, maybe even images etc... So you could do the all-in-one kind of patch, but it might be even better to make a suite of gesture tools: let's say an HMM patch, a filter pass patch (kind of like Smooth), a generic training-set patch or something, then the match with confidence etc., so users can put them together however they want, whether they're using RGB, XYZ, ABC etc.... I know that would personally be more useful. HMMs can be used for lots of cool things.
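As an illustration of how small one of those composable stages could be, here is a filter stage written as a one-pole low-pass filter (exponential moving average) over samples of any dimension (RGB, XYZ, ...). This is a Python sketch; `alpha` is a hypothetical smoothing parameter, not a setting of QC's Smooth patch. Smaller values give smoother but laggier output.

```python
def smooth(samples, alpha=0.3):
    """One-pole low-pass filter over a stream of N-dimensional samples:
    state += alpha * (sample - state), starting from the first sample."""
    out, state = [], None
    for s in samples:
        state = tuple(s) if state is None else tuple(
            prev + alpha * (cur - prev) for prev, cur in zip(state, s))
        out.append(state)
    return out
```

Because it takes and returns plain lists of tuples, a stage like this can sit in front of any matcher stage without either one knowing how many dimensions are in play.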

By dust at 2009-02-14 20:21