Wednesday, February 27, 2013
Protractor: A Fast and Accurate Gesture Recognizer
Published in 2010 and written by Yang Li, this paper presents a fast, flexible, and simple template-based recognizer that calculates the distance of a given gesture to the template set using an angle-based metric.
Template-based recognition lends itself well to situations where users create their own personalized gestures, since the recognition is purely data driven and feature agnostic, making it flexible and responsive to the user's provided input.
Li describes geometric and feature-based recognizers as parametric, complex, and inflexible methods that are unresponsive to input. This observation is not without merit: such recognizers do tend toward inflexibility, forcing users to conform to the recognizer and to the existing training data the recognizer runs on.
Li also notes that users do not want to provide multiple instances of their gestures, so the ability to recognize accurately from small amounts of input is paramount.
Interestingly, Li notes that Protractor can be either rotation invariant or rotation sensitive, which is not an option I have ever seen offered in other recognizers, where you're usually stuck with one mode or the other. This choice is nice since some gestures are only distinguishable when orientation matters.
Preprocessing
Like $1, Protractor resamples each gesture to 16 points and translates the centroid of the gesture, (Xavg, Yavg), to the origin, (0, 0). Unlike $1, Protractor then enters a noise-reduction phase for gesture orientation. If recognition is set to be rotation invariant, the gesture is rotated so that the indicative angle, the angle from the centroid (now the origin) to the starting point, is zero. Otherwise, the gesture is rotated to the nearest of eight base orientations.
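To make the preprocessing concrete, here's a minimal Python sketch under my own assumptions (points as (x, y) tuples; function names are mine, not from the paper):

```python
import math

def resample(points, n=16):
    """Resample a stroke to n equidistantly spaced points, as in $1/Protractor."""
    pts = [p for i, p in enumerate(points) if i == 0 or p != points[i - 1]]
    interval = sum(math.dist(pts[i - 1], pts[i])
                   for i in range(1, len(pts))) / (n - 1)
    resampled, accum = [pts[0]], 0.0
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if accum + d >= interval:
            t = (interval - accum) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            resampled.append(q)
            pts.insert(i, q)      # the interpolated point starts the next segment
            accum = 0.0
        else:
            accum += d
        i += 1
    if len(resampled) < n:        # guard against floating-point shortfall
        resampled.append(pts[-1])
    return resampled

def translate_to_origin(points):
    """Move the centroid (x_avg, y_avg) to (0, 0)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]

def rotate_by_indicative_angle(points):
    """Rotation-invariant option: rotate so the angle from the centroid
    (now the origin) to the first point becomes zero."""
    theta = math.atan2(points[0][1], points[0][0])
    c, s = math.cos(-theta), math.sin(-theta)
    return [(x * c - y * s, x * s + y * c) for x, y in points]
```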
With the gestures processed for invariance, the 16 points form a vector used to calculate the angular (inverse cosine) distance between gestures through a closed-form solution for the rotation that minimizes that distance. This closed-form solution is much quicker than searching for the optimal rotation with an iterative process.
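The closed-form step looks roughly like this; this is a sketch from my reading of the paper, where Li derives the optimal rotation angle analytically instead of searching for it:

```python
import math

def optimal_cosine_distance(v1, v2):
    """Minimum angular distance between two preprocessed gesture vectors
    [x1, y1, ..., x16, y16], allowing one gesture to rotate toward the other."""
    a = sum(v1[i] * v2[i] + v1[i + 1] * v2[i + 1] for i in range(0, len(v1), 2))
    b = sum(v1[i] * v2[i + 1] - v1[i + 1] * v2[i] for i in range(0, len(v1), 2))
    theta = math.atan2(b, a)      # closed-form optimal rotation, no iteration
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    cos_sim = (a * math.cos(theta) + b * math.sin(theta)) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos_sim)))  # clamp for float safety
```

A template's score can then be taken as the inverse of this distance, so smaller angles mean better matches.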
Study
Li ran his Protractor recognizer against the $1 recognizer on a set of 4800 samples covering 16 gestures and found that their recognition rates were similar. However, the time to recognize was much smaller for Protractor.
Li also studied the effect of orientation sensitivity (invariant, 2-way, 4-way, and 8-way base orientations) on error rates and found that 8-way was significantly less accurate than the other three sensitivity levels due to noise in the data.
Li also points out that his recognizer requires 1/4 the memory space that $1 does, which is important for the mobile devices the recognizer would probably be used on.
I certainly wouldn't have come up with the idea to build a template recognizer on an angle-based distance. I never really thought about the implications of designing recognition schemes around the target platform (mobile), but it makes sense, and I can see how Protractor is strong in that respect.
I'd like to see how different kinds of data sets would affect Protractor's recognition rates and performance compared to $1.
Yang Li. 2010. Protractor: a fast and accurate gesture recognizer. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '10). ACM, New York, NY, USA, 2169-2172. DOI=10.1145/1753326.1753654 http://doi.acm.org/10.1145/1753326.1753654
Thursday, February 14, 2013
PaleoSketch: Accurate Primitive Sketch Recognition and Beautification
In this paper, Paulson and Hammond develop a free-form sketch recognition system called "PaleoSketch" that geometrically recognizes the following low-level primitives with very high accuracy (98-99%):
- Lines
- Polylines
- Circles
- Ellipses
- Arcs
- Curves
- Spirals <= Unique feature!
- Helixes <= Unique feature!
After recognition, the user is presented with a disambiguation menu to select from the shapes that fit the primitive test conditions. Once the user has selected their intended shape, the sketch is "beautified" by removing the actual user strokes and replacing them with a Java2D shape. The primitives recognized by the lower-level PaleoSketch recognizer are restricted to single strokes. The PaleoSketch system can also recognize combinations of these primitives through a higher-level hierarchical recognizer similar to other systems in the field.
The primary focus of PaleoSketch was to place the least amount of restrictions on the user and have the system conform to the user rather than the other way around. As mentioned in my previous post, geometric recognizers have the benefit that recognition is not as heavily dependent on how the user drew the sketch as in feature/gesture-based recognizers. So, staying in line with their goal of supporting user-independent recognition, PaleoSketch primitive recognition is geometric.
Paulson and Hammond also cite that their use of corner recognition and a multistage recognition scheme (pre-recognition, primitive/shape recognition, beautification, and higher-level/hierarchical recognition [see Figure 1]) was largely influenced by Sezgin et al.'s work on the SSD recognizer. Yu and Cai's work was also a major contributing inspiration, particularly as it related to corner recognition and a feature-area error metric.
The Recognition Stages
In pre-recognition, PaleoSketch...
- Removes consecutive and duplicate points
- Constructs direction, speed, curvature, and corner graphs
- Calculates NDDE (Normalized Distance between Direction Extremes) <= New feature!
- used to discriminate curved strokes from polylines
- Calculates DCR (Direction Change Ratio) <= New feature!
- used to discriminate curved strokes from polylines (see the sketch after this list)
- Removes tails
- Tests for overtracing
- Tests for closed figures
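Here's a rough sketch of how I understand the two new features; the formulas are paraphrased from the paper, and the helper names are mine:

```python
import math

def _directions(points):
    """Direction (angle) of each stroke segment."""
    return [math.atan2(points[i + 1][1] - points[i][1],
                       points[i + 1][0] - points[i][0])
            for i in range(len(points) - 1)]

def _arc_length(points, i, j):
    """Stroke length between point indices i and j."""
    lo, hi = sorted((i, j))
    return sum(math.dist(points[k], points[k + 1]) for k in range(lo, hi))

def ndde(points):
    """Normalized Distance between Direction Extremes: stroke length between
    the points of max and min direction value, over total stroke length.
    Curved strokes tend toward 1; polylines are usually lower."""
    d = _directions(points)
    i_max, i_min = d.index(max(d)), d.index(min(d))
    return _arc_length(points, i_max, i_min) / _arc_length(points, 0, len(points) - 1)

def dcr(points):
    """Direction Change Ratio: max direction change over average direction
    change. Polylines spike at corners, so they score high; smooth curves low.
    (A fuller version would unwrap angles across the +/- pi boundary.)"""
    d = _directions(points)
    deltas = [abs(d[k + 1] - d[k]) for k in range(len(d) - 1)]
    return max(deltas) / (sum(deltas) / len(deltas))
```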
In primitive/shape recognition, PaleoSketch...
runs the shape tests for the primitives listed above. If a shape doesn't pass any of the shape tests, it is classified as a 'complex shape'. After this classification, the shape is broken into substrokes and recombined to see if any of the constituent combinations can be redefined as a primitive shape.
Then in beautification, PaleoSketch...
beautifies the stroke by returning the Java2D shape.
Finally, in higher-level/hierarchical recognition, PaleoSketch...
ranks/scores the primitive shape interpretations, which contribute to a classification system that determines whether multiple strokes should be identified as polylines, curves, or 'complex' shapes. Complex shapes are reanalyzed for tails. Then the identified tails are removed, and the strokes are retested to determine if they can now be recognized as shapes. If they pass, they're reclassified as shapes. Otherwise, they remain complex shapes.
In the accompanying study, the full Paleo recognizer was tested against
- a version of Paleo without the two new features (DCR and NDDE)
- a version of Paleo without the ranking
- the SSD recognizer
The results clearly show that the full version of Paleo is the most accurate, with the DCR and NDDE features making a very significant contribution to returning the correct top interpretation and a lesser, but still major, contribution toward returning the correct interpretation at all.
The PaleoSketch multi-stage recognition, in combination with a clearly thought out set of features and shape tests, makes the Paleo recognizer a very accurate and impressive system. The examples given of cases where the recognizer failed are clearly cases where people would probably have failed to classify them as well, which I think makes an interesting point in itself: recognizers can't always (100%) understand what a given user intends to convey or draw. People are, frankly, imperfect. Put another way, "To err is human."
The inclusion of so many thresholds in the paper feels disconcertingly arbitrary, however, and I would have liked to see some more discussion of why those thresholds were chosen and whether they remain valid under a broad range of domains and applications.
Brandon Paulson and Tracy Hammond. 2008. PaleoSketch: accurate primitive sketch recognition and beautification. In Proceedings of the 13th international conference on Intelligent user interfaces (IUI '08). ACM, New York, NY, USA, 1-10. DOI=10.1145/1378773.1378775 http://doi.acm.org/10.1145/1378773.1378775
Tuesday, February 12, 2013
What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition
At Texas A&M University's [whoop!] Pattern Recognition and Intelligent Sensor Machines Lab, Paulson et al. developed a hybrid gesture and geometric recognizer that attempts to provide normalized confidence values for free-form sketches. The difficulty in recognizing free-form sketches is that the user is not trained to draw the gestures in a way that is more palatable for the computer. Instead, the recognition must be flexible enough to recognize a 'geometric primitive' (that is, a predefined shape whose properties are defined geometrically) in such a way that how it is drawn doesn't matter. In other words, the system must be able to recognize figures in a way that is rotation invariant, scale invariant, stroke-direction invariant, etc.
By combining geometric and gesture recognition schemes, Paulson et al. get the invariance properties of geometric based recognition as well as the statistical, mathematically sound classification and confidence values of gesture based recognition. The invariance properties also allowed Paulson et al.'s system to achieve a certain level of user and style independence that makes the recognizer more flexible and robust to end users.
Paulson et al. employed the commonly used and cited 13 Rubine (gestural) features as part of their recognition. More interestingly, Paulson et al. also used a set of 31 (geometric) features from Paulson and Hammond's previous work on the PaleoSketch recognizer.
Finally, Paulson et al.'s system uses a quadratic statistical classifier over the combined gestural and geometric features to classify the sketches.
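This isn't the authors' implementation, but the flavor of a quadratic statistical classifier with normalized confidences can be sketched with scikit-learn's quadratic discriminant analysis on made-up data:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
X_train = rng.random((900, 41))   # one row of the 41 features per training sketch
y_train = rng.choice(['line', 'arc', 'circle'], size=900)  # primitive labels

clf = QuadraticDiscriminantAnalysis()
clf.fit(X_train, y_train)

# predict_proba yields normalized confidence values, the property the paper is after
confidences = clf.predict_proba(rng.random((1, 41)))
print(dict(zip(clf.classes_, confidences[0])))
```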
Paulson et al.'s accompanying study aimed to validate their hybrid approach and verify that the quadratic statistical classifier could produce recognition rates comparable to their very accurate PaleoSketch system.
With 1800 sketches from 20 test subjects (90 each), 900 were used for choosing which features produced the most accurate recognition results and the other 900 were used for the validation stage of their experiment.
Using the 50/50 split data (also used in PaleoSketch), the hybrid recognizer produced an 86% accuracy rate using all 41 features, while PaleoSketch itself produced a 98.56% recognition rate.
Since this fell short of PaleoSketch, Paulson et al. attempted to find the optimal set of features for their hybrid recognizer.
So, Paulson et al. ran 10 folds of a greedy sequential forward selection technique over 50/50 user splits of the training data, producing 10 subsets of features. Then, for i = 1 to 10, they added a subset consisting of the features that appeared in at least i*10 percent of the original 10 subsets.
To narrow these 20 feature subsets down to the optimal one, 25 folds of classification were run across 50/50 user splits of the training data on each subset.
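Before listing the winning features, here's a rough sketch of the greedy sequential forward selection idea; score_fn is a hypothetical callback that trains and tests the classifier on a candidate feature subset:

```python
def forward_select(features, score_fn):
    """Greedy sequential forward selection: repeatedly add the single feature
    that most improves accuracy; stop when nothing improves."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        # Try adding each remaining feature and keep the best-scoring candidate.
        top_score, top_f = max((score_fn(selected + [f]), f) for f in remaining)
        if top_score <= best:
            break                  # no feature helps anymore
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best
```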
This final subset included the following features:
- Endpoint to stroke length ratio
- NDDE - Normalized Distance between Direction Extremes
- DCR - Direction Change Ratio
- Curve Least Squares Error
- Polyline Fit: # of sub-strokes
- Polyline Fit: % of sub-strokes pass line test
- Polyline Feature Area Error
- Circle Fit: major axis to minor axis ratio
- Spiral Fit: Avg. radius/bounding box radius ratio
- Spiral Fit: Center Closeness Error
- Complex Fit: # of sub-fits
- Complex Fit: # of non-polyline primitives
- Complex Fit: % of sub-fits that are lines
- Complex Score / rank
- Total Rotation
- The only gestural feature determined to be an optimal feature
With this optimal feature subset, the hybrid recognizer achieved a 97.44% accuracy rate with the 50/50 split data and a 96.45% accuracy rate for the 25-fold with random 50/50 user splits.
Using only the top 6 features resulted in a recognition rate around 93%, which is very high and indicates that those features are highly independent and characterize the gestures well.
It is interesting to note that all of the optimal features were geometric features. Paulson et al. did note that their dataset was biased toward user-independent features, which are typically geometric, since they separated training data from verification data by user.
Personally, I think combining the geometric and gestural approaches was a great idea, specifically to get the invariance properties of the geometric recognition. Their study also did a great job of advancing the state of the art toward higher-level combinations of figures, which I may be able to take advantage of to produce a kanji recognition system that uses kanji radicals as the base geometric primitives.
Cited Work
A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels. 2000. Visual similarity of pen gestures. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems (CHI '00). ACM, New York, NY, USA, 360-367. DOI=10.1145/332040.332458 http://doi.acm.org/10.1145/332040.332458
Brandon Paulson and Tracy Hammond. 2008. PaleoSketch: accurate primitive sketch recognition and beautification. In Proceedings of the 13th international conference on Intelligent user interfaces (IUI '08). ACM, New York, NY, USA, 1-10. DOI=10.1145/1378773.1378775 http://doi.acm.org/10.1145/1378773.1378775
Friday, February 8, 2013
Visual Similarity of Pen Gestures
In this paper, Long et al. undertook an ambitious mission, acting as explorers and cartographers, to map the human perceptual space to a computational model for gesture similarity.
Citing how gestures' iconic nature lends itself to memorability, Long et al. begin the paper by emphasizing the benefits and widespread use of gestures for UI control. Unfortunately, gestures are also difficult to design due to three main problems:
- A gesture may be difficult to recognize computationally
- A gesture may appear too similar to another gesture to users, making it difficult to remember
- A gesture may be difficult to learn or remember
Exploring past work on human perceptual similarity, Long et al. note that
- The logarithm of quantitative metrics correlate with similarity
- If the range of differences in gestures is small, the differences are linearly related to perceived similarity
- The same person may use different metrics for similarity for different gestures
Long et al. ran two experiments.
In the first, a diverse gesture set was created that varied widely across multiple features and orientations.
In the second, the gesture sets were divided into several categories featuring similar gestures that varied along a particular feature.
For both experiments, the participants were given display tablets with pens, were shown all possible combinations of three gestures at a time (called triads), and were told to choose the gesture that was most dissimilar. The results were then analyzed
- to determine what measurable geometric properties of the gestures influenced their perceptual similarity
- Measured by MDS (Multi-Dimensional Scaling) with dimensions 2 through 6; the best dimensionality was determined by stress and goodness-of-fit (r^2) (see the sketch after this list)
- The distances between points were the reported dissimilarities given by the participants
- Large distances between points along a dimension mean that the corresponding geometric property is the greatest determinant of similarity/dissimilarity
- to produce a model of gesture similarity that, given two gestures, could predict how similar people would perceive those gestures to be
- Measured by running regression analyses to determine which geometric features correlated with reported similarity/dissimilarity
- Weights correspond to the strength of each feature's contribution to similarity
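To make the MDS step concrete, here's a minimal sketch using scikit-learn on a made-up dissimilarity matrix (the real matrices came from the participants' triad judgments):

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical symmetric dissimilarity matrix for 20 gestures.
rng = np.random.default_rng(0)
D = rng.random((20, 20))
D = (D + D.T) / 2
np.fill_diagonal(D, 0)

for dims in range(2, 7):   # the paper tried dimensions 2 through 6
    mds = MDS(n_components=dims, dissimilarity='precomputed', random_state=0)
    coords = mds.fit_transform(D)
    print(dims, mds.stress_)   # lower stress = better fit
```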
In addition to Rubine's features, Long et al. also use the following features as candidates for similarity (a few are sketched in code after the list):
- Long Feature 14
- Aspect
- abs( 45° - angle of the bounding box diagonal (Rubine Feature 4) )
- Long Feature 15
- Curviness
- Long Feature 16
- Total Angle Traversed / Total Length
- Rubine Feature 9 / Rubine Feature 8
- Long Feature 17
- Density Metric 1
- Total length / distance between first and last points
- Rubine Feature 8 / Rubine Feature 5
- Long Feature 18
- Density Metric 2
- Total length / Length of diagonal of bounding box
- Rubine Feature 8 / Rubine Feature 3
- Long Feature 19
- Non-Subjective Openness
- distance between first and last points / Length of diagonal of bounding box
- Rubine Feature 5 / Rubine Feature 3
- Long Feature 20
- Area of Bounding Box
- (MaxX - MinX) * (MaxY - MinY)
- Long Feature 21
- Log(area)
- Log( Long Feature 20 )
- Long Feature 22
- Total angle / total absolute angle
- Rubine Feature 9 / Rubine Feature 10
- Long Feature 23
- Log( total length )
- Log( Rubine Feature 8 )
- Long Feature 24
- Log( aspect )
- Log( Long Feature 14 )
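Several of these features are simple ratios of Rubine quantities, so they're easy to sketch; the point format, helper math, and names here are my assumptions (and the stroke is assumed non-degenerate, so the denominators are nonzero):

```python
import math

def long_features(points):
    """A few of Long's derived features (14, 17-21), built from the Rubine
    quantities they reference."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    bbox_diag = math.hypot(w, h)                          # Rubine Feature 3
    bbox_angle = math.atan2(h, w)                         # Rubine Feature 4
    first_last = math.dist(points[0], points[-1])         # Rubine Feature 5
    total_len = sum(math.dist(points[i], points[i + 1])   # Rubine Feature 8
                    for i in range(len(points) - 1))
    return {
        'f14_aspect':    abs(math.pi / 4 - bbox_angle),
        'f17_density1':  total_len / first_last,
        'f18_density2':  total_len / bbox_diag,
        'f19_openness':  first_last / bbox_diag,
        'f20_bbox_area': w * h,
        'f21_log_area':  math.log(w * h),
    }
```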
The results from the first experiment showed that curviness, log(aspect), total absolute angle, density 1, and, to a lesser extent, the angle between the first and last points, the initial angle, and the distance between the first and last points were important features for distinguishing between gestures from the first set.
The results from the second experiment also found that Log(aspect), density 1, and total absolute angle were important features for discrimination between gestures. Additionally, Long et al. found that figures with horizontal or vertical orientations were perceived as more similar than figures with diagonal orientations.
The predictive model derived from experiment 1 had slightly more predictive power than the model derived from experiment 2.
Long et al. end their paper by noting that their exploration of the human perceptual space is not exhaustive, since that space has not yet been completely explored (my opinion: or may not even be entirely knowable).
Cited Work
A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels. 2000. Visual similarity of pen gestures. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems (CHI '00). ACM, New York, NY, USA, 360-367. DOI=10.1145/332040.332458 http://doi.acm.org/10.1145/332040.332458
"Those Look Similar" Issues in Automating Gesture Design Advice
Published in 2001, Long et al.'s paper, as suggested by the title, discusses issues related to the timing and content of feedback on gesture design advice within the context of their program, Quill.
Quill is a gesture design tool that allows its users to create gesture sets and determine their computational and visual similarity so that the gestures can be accurately recognized as well as easily remembered. Quill attempts to use 'unsolicited advice' to warn gesture designers that their gesture is too hard to recognize so that they can take steps to correct it.
The problems with the unsolicited advice were broken down into three categories:
- Timing
- When to show the advice
- As soon as problem is detected
- Benefits:
- Can (potentially) fix the problem as soon as it occurs
- Drawbacks:
- Interrupts ideation and focus on task
- Advice may no longer be relevant later
- => Confusion from Incorrect Advice
- May be ignored anyway
- Delayed until designer is testing the gesture set
- Benefits:
- Won't produce advice that is incorrect / no longer relevant
- Drawbacks:
- Won't see the problems until testing
- Won't see what caused the problems as it happens
- When to remove the advice
- When told to be ignored by the user
- Trivial
- When the problem is fixed
- "Fixed" is not well defined
- How dissimilar do gestures need to be to be unambiguous and easy to remember?
- Need to evaluate whole gesture set when a change occurs
- Volume
- How much advice to give
- Concise Info First, Detail-on-demand
- Content
- Expert User:
- Less info
- automatically generated
- a little more cryptic
- Beginner:
- Links to more detailed info
- handcrafted information, examples, and illustrations
Long et al. also needed to determine how to perform the more computationally heavy analysis in the background of the application. Since the Quill tool attempts to display the analysis and advice info in the GUI dynamically and because the content of that information depends on the analysis, the policy on when to perform the analysis affects the user's mental model of the nature of and the connection between gesture form and gesture identifiability.
5 schemes for analysis computation were devised:
- Lock all user actions during advice computation
- Benefits: Easy to implement and understand
- Drawbacks: Frustrates user with delays
- Disable any action that affects the advice <= Used for User Initiated Analyses
- Benefits: Distinct separation between action and effect on analysis
- Drawbacks: Frustrates user with delays; potentially confuses users who don't see the connection between the action and the effect
- Allow any action but cancel if it affects the advice <= Used for System Initiated Analyses (see the sketch after this list)
- Benefits: Some freedom
- Drawbacks: The user can inadvertently cancel the calculation and cause delays; the system must also decide when to restart the calculation
- Allow all actions
- Benefits: High User Freedom
- Drawbacks: Potentially incorrect advice
- Disable user actions that would change state in use by current analysis
- Benefits: Most efficient in that it allows non-conflicting operations to occur
- Drawbacks: High confusion because of the differing availability of options for different sets of gestures
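The third scheme maps naturally onto a cancellable background worker. Here's a minimal sketch; the gesture-set API and compare_similarity are hypothetical stand-ins, not Quill's actual code:

```python
import threading

class AdviceAnalysis:
    """Scheme 3: let the user keep working, but cancel the background
    analysis if their action would invalidate it."""
    def __init__(self):
        self._cancel = threading.Event()
        self._thread = None

    def start(self, gesture_set, on_done):
        self._cancel.clear()
        def run():
            advice = []
            for pair in gesture_set.pairs():   # hypothetical API
                if self._cancel.is_set():
                    return                     # user edited a gesture: bail out
                advice.append(compare_similarity(pair))  # hypothetical, expensive
            on_done(advice)
        self._thread = threading.Thread(target=run, daemon=True)
        self._thread.start()

    def on_user_edit(self):
        self._cancel.set()   # when to restart is the open policy question
```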
Long et al. also noted that having error in the advice either greatly confused the gesture designers or caused them to ignore the advice entirely. Therefore, correctness of advice and quality of similarity metrics is a priority.
Personal thoughts:
Quill's unsolicited advice system sounds like a nice idea, but it reminds me a lot of the 'Clippy' character that Microsoft pushed in Office in the late 1990s. People don't like being interrupted, and they certainly don't like being given obvious or unhelpful advice. In this respect, the admission at the end of the paper that the similarity-metric challenges produced negative results in the user groups was a little underwhelming, despite how important the observation may be.
Long et al. made an interesting bullet point on their list of future work when they said that they wanted, "To (partially) automate the repair of human similarity and recognition problems by morphing gestures."
This seems to me like the best solution to their problem of giving clear feedback on what constitutes a good gesture set. Providing an IntelliSense-style menu of possible transformations may have been useful.
Cited Work
A. Chris Long, James A. Landay, and Lawrence A. Rowe. 2001. "Those look similar!" issues in automating gesture design advice. In Proceedings of the 2001 workshop on Perceptive user interfaces (PUI '01). ACM, New York, NY, USA, 1-5. DOI=10.1145/971478.971510 http://doi.acm.org/10.1145/971478.971510