At Texas A&M University's [whoop!] Pattern Recognition and Intelligent Sensor Machines Lab, Paulson et al. developed a hybrid gesture and geometric recognizer that attempted to provide normalized confidence values for free-form sketches. The difficulty in recognizing free-form sketches is that users are not trained to draw gestures in a way that is easy for the computer to interpret. Instead, the recognizer must be flexible enough to identify a 'geometric primitive' (that is, a predefined shape whose properties are defined geometrically) regardless of how it is drawn. In other words, the system must be able to recognize figures in a way that is rotation invariant, scale invariant, stroke direction invariant, etc.
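To make the invariance idea concrete, here is a minimal sketch of my own (not code from the paper). It uses one simple geometric measure, the ratio of the distance between a stroke's endpoints to the stroke's total length, and checks that the value does not change when the stroke is rotated, scaled, or drawn in the opposite direction. The function names and the sample stroke are assumptions made for illustration.

```python
import math

def path_length(points):
    """Sum of the distances between consecutive sample points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def endpoint_to_length_ratio(points):
    """Distance between first and last point divided by stroke length."""
    length = path_length(points)
    if length == 0:
        return 0.0
    return math.dist(points[0], points[-1]) / length

def transform(points, angle, scale):
    """Rotate by `angle` radians about the origin, then scale uniformly."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y), scale * (s * x + c * y)) for x, y in points]

# Hypothetical stroke: a small staircase shape.
stroke = [(0, 0), (1, 0), (1, 1), (2, 1)]

base = endpoint_to_length_ratio(stroke)
rotated_scaled = endpoint_to_length_ratio(transform(stroke, 0.7, 3.0))
reversed_stroke = endpoint_to_length_ratio(stroke[::-1])
```

All three values agree up to floating-point error, because rotation preserves distances, uniform scaling cancels in the ratio, and reversing the point order changes neither the endpoint distance nor the path length.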
By combining geometric and gesture recognition schemes, Paulson et al. get the invariance properties of geometric-based recognition as well as the statistically and mathematically sound classification and confidence values of gesture-based recognition. The invariance properties also allowed Paulson et al.'s system to achieve a certain level of user and style independence, which makes the recognizer more flexible and robust for end users.
Paulson et al. employed the 13 commonly used and cited Rubine (gestural) features as part of their recognition. More interestingly, Paulson et al. also used a set of 31 (geometric) features from Paulson and Hammond's previous work on the PaleoSketch recognizer.
Finally, Paulson et al.'s system feeds the gestural and geometric features into a quadratic statistical classifier to classify the gestures.
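As a rough illustration of how a quadratic statistical classifier can turn feature vectors into normalized confidence values, here is a simplified sketch. It is not the paper's implementation: I assume a Gaussian model per class with a diagonal covariance (a full quadratic classifier would use a full covariance matrix), equal class priors, and made-up two-feature training data.

```python
import math

def fit(samples_by_class):
    """Estimate a per-class mean and variance for every feature."""
    model = {}
    for label, rows in samples_by_class.items():
        n = len(rows)
        dims = range(len(rows[0]))
        mean = [sum(r[d] for r in rows) / n for d in dims]
        # A small floor keeps variances positive for near-constant features.
        var = [sum((r[d] - mean[d]) ** 2 for r in rows) / n + 1e-6 for d in dims]
        model[label] = (mean, var)
    return model

def log_likelihood(x, mean, var):
    """Log density of x under an axis-aligned Gaussian (quadratic in x)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def confidences(model, x):
    """Normalize class likelihoods so the confidences sum to 1."""
    scores = {label: log_likelihood(x, m, v) for label, (m, v) in model.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Hypothetical training data: (endpoint/length ratio, total rotation).
training = {
    "line": [(0.98, 0.10), (0.95, 0.20), (0.99, 0.05)],
    "circle": [(0.05, 6.2), (0.10, 6.0), (0.02, 6.4)],
}
model = fit(training)
result = confidences(model, (0.97, 0.15))
```

Because each class has its own variances, the decision boundary between classes is quadratic rather than linear, and the normalized values can be read as a confidence for each candidate shape.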
Paulson et al.'s accompanying study aimed to validate their hybrid approach and verify that the quadratic statistical classifier could produce recognition rates comparable to their very accurate PaleoSketch system.
Of the 1800 sketches collected from 20 test subjects (90 each), 900 were used for choosing which features produced the most accurate recognition results and the other 900 were used for the validation stage of their experiment.
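A 50/50 split by user, as opposed to by sketch, might look like the sketch below. The record layout (`user` and `id` keys), the function name, and the seed are my own assumptions; the point is only that every sketch from a given user ends up on the same side of the split.

```python
import random

def split_by_user(sketches, seed=0):
    """Put half of the users in one set and the other half in the other."""
    users = sorted({s["user"] for s in sketches})
    rng = random.Random(seed)
    rng.shuffle(users)
    first_half = set(users[: len(users) // 2])
    selection = [s for s in sketches if s["user"] in first_half]
    validation = [s for s in sketches if s["user"] not in first_half]
    return selection, validation

# Hypothetical dataset shaped like the study: 20 users, 90 sketches each.
sketches = [{"user": u, "id": i} for u in range(20) for i in range(90)]
selection, validation = split_by_user(sketches)
```

With 20 users and 90 sketches each, this yields two sets of 900 sketches with no user in common, so the validation set only contains drawing styles the recognizer has never seen.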
Using the 50/50 split data (also used in PaleoSketch), the hybrid recognizer produced an 86% accuracy rate using all 44 features (13 gestural plus 31 geometric), while PaleoSketch itself produced a 98.56% recognition rate.
Because this fell short of PaleoSketch, Paulson et al. attempted to find the optimal set of features to use for their hybrid recognizer.
So, Paulson et al. ran 10 folds of a greedy sequential forward selection technique to select optimal features from a 50/50 user split of the training data. This produced 10 subsets of features. Then, for i = 1 to 10, they added a subset made up of the features that were present in i*10 percent of the original 10 subsets, for a total of 20 feature subsets.
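The two steps can be sketched roughly as follows. This is my own illustration under stated assumptions: `score` is a hypothetical callback standing in for "train and evaluate the classifier on this feature subset", the stopping rule (stop when no feature improves the score) is one common choice, and I read "present in i*10 percent" as "present in at least i*10 percent" of the fold subsets.

```python
from collections import Counter

def forward_select(features, score):
    """Greedy sequential forward selection: keep adding the single feature
    that gives the best score, and stop when nothing improves it."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        top_score, top_feature = max((score(selected + [f]), f) for f in remaining)
        if top_score <= best:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected

def frequency_subsets(fold_subsets, steps=10):
    """For i = 1..steps, keep features found in at least i/steps of the folds."""
    counts = Counter(f for subset in fold_subsets for f in set(subset))
    n = len(fold_subsets)
    return [sorted(f for f, c in counts.items() if c * steps >= i * n)
            for i in range(1, steps + 1)]

# Hypothetical scoring function: useful features add value, every feature costs a little.
useful = {"ndde": 0.5, "dcr": 0.3, "total_rotation": 0.1}
def toy_score(subset):
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)

chosen = forward_select(["ndde", "dcr", "total_rotation", "noise"], toy_score)

# Hypothetical results of 10 selection folds.
folds = [["ndde"] + (["dcr"] if k < 7 else []) + (["total_rotation"] if k < 3 else [])
         for k in range(10)]
extra_subsets = frequency_subsets(folds)
```

In the toy run, `noise` is never selected because it lowers the score, and the thresholded subsets shrink as i grows: features that show up in only a few folds drop out first.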
To narrow these 20 feature subsets down to the optimal one, 25 folds of classification were run on each subset, each fold using a 50/50 user split of the training data.
This final subset included the following features:
- Endpoint to stroke length ratio
- NDDE - Normalized Difference Between Direction Extremes
- DCR - Direction Change Ratio
- Curve Least Squares Error
- Polyline Fit: # of sub-strokes
- Polyline Fit: % of sub-strokes pass line test
- Polyline Feature Area Error
- Circle Fit: major axis to minor axis ratio
- Spiral Fit: Avg. radius/bounding box radius ratio
- Spiral Fit: Center Closeness Error
- Complex Fit: # of sub-fits
- Complex Fit: # of non-polyline primitives
- Complex Fit: % of sub-fits that are lines
- Complex Score / rank
- Total Rotation (the only gestural feature determined to be an optimal feature)
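Total Rotation, the one gestural feature in the list, is simple to compute. Below is a minimal sketch of the usual Rubine-style definition as I understand it: the sum of the signed turning angles between consecutive stroke segments. The function name and the sample strokes are assumptions for illustration.

```python
import math

def total_rotation(points):
    """Sum of signed turning angles (radians) between consecutive segments."""
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        dx1, dy1 = x1 - x0, y1 - y0
        dx2, dy2 = x2 - x1, y2 - y1
        # atan2(cross, dot) gives the signed angle from segment 1 to segment 2.
        total += math.atan2(dx1 * dy2 - dy1 * dx2, dx1 * dx2 + dy1 * dy2)
    return total

# Hypothetical strokes: a straight line and a counterclockwise 36-sided "circle".
line = [(float(i), 0.0) for i in range(5)]
circle = [(math.cos(2 * math.pi * k / 36), math.sin(2 * math.pi * k / 36))
          for k in range(37)]

line_rotation = total_rotation(line)
circle_rotation = total_rotation(circle)
reversed_rotation = total_rotation(circle[::-1])
```

A straight line has a total rotation of zero, while a closed circle-like stroke comes out close to 2π. Rotating or scaling the stroke leaves the value unchanged, and drawing it in the opposite direction only flips the sign.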
With this optimal feature subset, the hybrid recognizer achieved a 97.44% accuracy rate with the 50/50 split data and a 96.45% accuracy rate for the 25-fold evaluation with random 50/50 user splits.
Using only the top 6 features resulted in a recognition rate around 93%, which is very high and indicates that those features are highly independent and characterize the gestures well.
It is interesting to note that all but one of the optimal features were geometric features (Total Rotation was the only gestural one). Paulson et al. did note that their dataset was biased toward user-independent features, which are typically geometric, because they separated training data from verification data by user.
Personally, I think combining the geometric and gestural approaches was a great idea, specifically to get the invariance properties of geometric recognition. Their study also did a great job of advancing the state of the art toward higher-level combinations of figures, which I may be able to take advantage of to produce a kanji recognition system that uses kanji radicals as the base geometric primitives.
Cited Work
A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels. 2000. Visual similarity of pen gestures. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems (CHI '00). ACM, New York, NY, USA, 360-367. DOI=10.1145/332040.332458 http://doi.acm.org/10.1145/332040.332458
Brandon Paulson and Tracy Hammond. 2008. PaleoSketch: accurate primitive sketch recognition and beautification. In Proceedings of the 13th international conference on Intelligent user interfaces (IUI '08). ACM, New York, NY, USA, 1-10. DOI=10.1145/1378773.1378775 http://doi.acm.org/10.1145/1378773.1378775