Sunday, March 24, 2013

The Word-Gesture Keyboard: Reimagining Keyboard Interaction


In this paper, Zhai and Kristensson discuss their Word-Gesture Keyboard, which allows users to drag a finger in a single stroke across a touch-screen keyboard to input a word. The keyboard uses a probabilistic model to provide a menu of likely words from which the user can select the desired input.

Zhai and Kristensson first describe the requirements for the widespread acceptance of a new text input method:

  1. High Speed of Input
  2. Minimal Cognitive Load
  3. Support for the Development of Efficiency

From there, the researchers discuss the feasibility of a gesture-based text input method. They note that English, like nearly all natural languages, features a high degree of regularity and redundancy, and that the average word length is only about 4 to 5 characters. Longer, more complex words can still be entered by tapping, so all words remain accessible regardless of a user's skill or accuracy with gestures. Essentially, SHARK (shorthand-aided rapid keyboarding) is an improvement of the existing keyboard rather than a replacement for it. The probabilistic recognition of matching words in the English lexicon also reduces the impact of inaccurate input caused by the fat-finger problem.
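
As a rough illustration of the recognition idea (a minimal sketch only, not the authors' actual SHARK recognizer; the key coordinates, the tiny lexicon, and the plain nearest-template scoring below are all made up for the example), a word gesture can be compared against the ideal polyline through each lexicon word's key centers, and candidates ranked by how closely the resampled stroke matches each template.

import numpy as np

# Toy key-center coordinates on a keyboard layout (made up for illustration).
KEYS = {
    't': (4.0, 0.0), 'h': (5.5, 1.0), 'e': (2.0, 0.0),
    'a': (0.5, 1.0), 'n': (5.5, 2.0), 'd': (2.5, 1.0),
}
LEXICON = ["the", "than", "and"]

def resample(points, n=32):
    """Resample a polyline to n points spaced evenly by arc length."""
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, cum[-1], n)
    return np.column_stack([np.interp(targets, cum, points[:, 0]),
                            np.interp(targets, cum, points[:, 1])])

def word_template(word):
    """Ideal gesture for a word: the polyline through its key centers."""
    return resample([KEYS[c] for c in word])

def rank_candidates(stroke):
    """Rank lexicon words by mean distance between the stroke and each template."""
    s = resample(stroke)
    scores = {w: np.mean(np.linalg.norm(s - word_template(w), axis=1)) for w in LEXICON}
    return sorted(scores.items(), key=lambda kv: kv[1])

# A sloppy stroke that roughly traces t -> h -> e.
stroke = [(4.1, 0.2), (5.2, 0.9), (3.5, 0.6), (2.1, 0.1)]
print(rank_candidates(stroke))  # "the" should score best (smallest distance)

The actual recognizer is considerably more sophisticated than this single distance score; the sketch is only meant to convey the template-matching flavor of the approach.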

Since the user doesn't have to lift their finger, input generally takes less time than tapping.
The method also requires no training to get started, which is a must for any new text input method. Because most input consists of a small set of commonly used words, the gesture paths for these common words are quickly committed to procedural (muscle) memory, which further improves input speed. This gradual learning allows users to become progressively more efficient, the mark of a "progressive user interface."

Their keyboard also supports common system commands...

  • to change the case of the word before the caret to further decrease time to input and minimize user frustration.
  • to copy and paste
  • to switch languages (lexicon search and different character input sets)

Zhai and Kristensson performed several experiments to test...

  • recall rates without visual feedback (i.e., without looking at the keyboard) across multiple sessions
    • resulting in about 13 words recalled in the 1st session, rising to 45 to 57 words by the 4th session
  • entry rates (speed) for initial performance with varying amounts of practice time on the QWERTY and ATOMIK layouts
    • resulting in 7 to 13 wpm for ATOMIK and 15 to 25 wpm for QWERTY
  • maximum performance for small set of text inputs recorded in the form of a competition
    • with an average error-free performance of 57.5 wpm and top performance of 99 wpm

A subjective evaluation was also conducted, with responses and comments gathered from the keyboard's official release on the Apple App Store for the iPhone. Responses were 81.6% completely positive, 12.5% somewhat positive, and 5.9% completely negative.

Given the limited input space of touch screens on mobile devices, the standard two-handed typing for which layouts like DVORAK were designed is impossible. As such, it makes sense to explore less traditional methods of keyboard input and to ease text entry on these devices, especially in light of their widespread use. Zhai and Kristensson's work does just that, with the grounded scientific rigor and forethought that such work deserves.


Shumin Zhai and Per Ola Kristensson. 2012. The word-gesture keyboard: reimagining keyboard interaction. Commun. ACM 55, 9 (September 2012), 91-101. DOI=10.1145/2330667.2330689 http://doi.acm.org/10.1145/2330667.2330689

A Few Useful Things to Know About Machine Learning


In this article, Pedro Domingos attempts to reveal the "Folk Knowledge" and "Black Art" commonly used in machine learning.

First Domingos provides some key insights about machine learning:

  • Machine learning allows the program to generalize rules from examples. This saves programmers time by not having to explicitly program rules. Cost savings are increased for larger data sets.
  • Using machine learning techniques successfully requires information not easily found in formal information sources. i.e. "Black Art" is abundant in the field.

The insights entailed in this "Black Art" are revealed by Domingos in 12 lessons:

1st Lesson: Learning = Representation + Evaluation + Optimization:

  1. Representation
    • Set of Classifiers - Hypothesis Space
      • The set of classifiers the learner is able to express and can possibly output
  2. Evaluation
    • Evaluation Function / Scoring Function
      • Scores classifiers, distinguishing good ones from bad ones
  3. Optimization
    • Optimizer searches the hypothesis space, as efficiently as possible, for the highest-scoring classifier (sketched below)
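
To make these three components concrete, here is my own minimal sketch (not an example from the article): a decision stump learner, where the representation is the set of single-feature threshold classifiers, the evaluation function is training accuracy, and the optimizer is an exhaustive search over features and thresholds.

import numpy as np

def learn_stump(X, y):
    """A tiny learner that makes the three components explicit.

    Representation: hypothesis space = all classifiers of the form
                    "predict 1 if X[:, feature] > threshold, else 0".
    Evaluation:     accuracy on the training data.
    Optimization:   exhaustive search over features and candidate thresholds.
    """
    best = None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            preds = (X[:, feature] > threshold).astype(int)
            score = np.mean(preds == y)          # evaluation function
            if best is None or score > best[0]:  # optimizer keeps the best-scoring hypothesis
                best = (score, feature, threshold)
    return best  # (training accuracy, chosen feature, chosen threshold)

# Toy data: the label depends only on the second feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 1] > 0.3).astype(int)
print(learn_stump(X, y))  # should pick feature 1 with a threshold just below 0.3
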
2nd Lesson: It's Generalization that Counts
Training data should not be used as test data, since doing so 'contaminates' the evaluation and gives an unrealistic, overoptimistic view of the classifier's performance. Optimizing the classifier for the training data can lead to overfitting/overspecialization; instead, the classifier should generalize well to a larger, more diverse set of data. Fundamentally, we don't have access to the function we actually want to optimize (performance on unseen examples), so training error can only serve as a surrogate for it.
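
A minimal illustration of why held-out data matters (my sketch, not the article's): a 1-nearest-neighbor classifier scores perfectly on the data it has memorized but noticeably worse on a held-out split.

import numpy as np

def one_nn_predict(train_X, train_y, query_X):
    """1-nearest-neighbor prediction: copy the label of the closest training point."""
    dists = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=2)
    return train_y[np.argmin(dists, axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + rng.normal(scale=1.0, size=300) > 0).astype(int)  # noisy labels

# Hold out a test set instead of evaluating on the training data.
train_X, test_X = X[:200], X[200:]
train_y, test_y = y[:200], y[200:]

train_acc = np.mean(one_nn_predict(train_X, train_y, train_X) == train_y)
test_acc = np.mean(one_nn_predict(train_X, train_y, test_X) == test_y)
print(f"accuracy on training data: {train_acc:.2f}")  # 1.00 -- wildly overoptimistic
print(f"accuracy on held-out data: {test_acc:.2f}")   # substantially lower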


3rd Lesson: Data Alone Is Not Enough
It is often impossible to know and store every possible instance of every class, so the learner must generalize beyond the examples it has seen. Without assumptions beyond the data itself, this cannot be done better than random guessing, which is summed up by Wolpert's "no free lunch" theorem: no learner can beat random guessing over all possible functions to be learned. The learner/classifier must therefore make some assumptions about what the data looks like. Knowledge acts as a lever for predictive power: learners combine knowledge with data to "grow" programs.

4th Lesson: Overfitting Has Many Faces
Overfitting shows up as generalization error, which can be decomposed into bias and variance. Bias is a learner's tendency to consistently learn the same wrong thing; variance is the tendency to learn random things irrespective of the real signal. Different types of classifiers behave differently with respect to bias and variance, and there is often a tradeoff between the two. We can counteract overfitting with cross-validation or by adding a regularization term that prefers classifiers with less structure and complexity. The related problem of multiple testing can be addressed by correcting significance tests and/or controlling the false discovery rate.
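
A small sketch of how validation counteracts this (mine, not the article's; a single hold-out split stands in for full k-fold cross-validation): fit polynomials of increasing degree to noisy quadratic data. Training error keeps falling as complexity grows, but error on the held-out split bottoms out near the true complexity and then rises again, which is the signature of overfitting.

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=60)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.3, size=60)  # true degree is 2

# Simple hold-out split standing in for k-fold cross-validation.
x_train, y_train = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

for degree in (1, 2, 5, 9):
    coeffs = np.polyfit(x_train, y_train, degree)           # fit on training data only
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, validation MSE {val_err:.3f}")
# Training MSE keeps shrinking as the degree grows; validation MSE is typically
# lowest near degree 2 and grows again for the high-degree fits.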

5th Lesson: Intuition Fails in High Dimensions
Humans are unable to see patterns in data with high dimensionality, which is part of the reason why machine learning is so useful. However, the same difficulty makes it harder to craft features for the classifier and to make sound generalizations and assumptions about the data. Worse, similarity-based reasoning and the intuitions we build in two or three dimensions break down as the number of dimensions grows.
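
A quick numerical illustration of why distance-based intuition degrades (my sketch, not from the article): for random points, the gap between the nearest and farthest neighbor shrinks relative to the distances themselves as the dimension grows, so "similar" and "dissimilar" become hard to tell apart.

import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    points = rng.uniform(size=(500, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    # Relative contrast: how much farther the farthest point is than the nearest.
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"dim {dim}: relative contrast {contrast:.2f}")
# The contrast shrinks as the dimension grows, so nearest-neighbor-style
# similarity reasoning carries less and less information.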

6th Lesson: Theoretical Guarantees Are Not What They Seem
The number of examples needed to generalize well enough to make solid claims about a classifier's effectiveness can be bounded; however, these inductive guarantees are only probabilistic (unlike the certainties of deduction), and the bounds are usually loose enough to require far more examples than are practical. Furthermore, such bounds say nothing about how to construct a good hypothesis space in the first place. Theoretical guarantees are a useful source of insight for algorithm design, but they are not necessarily indicative of a classifier's practical predictive power.
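
For reference, the textbook form of such a guarantee (the standard sample-complexity bound for a learner that returns a hypothesis consistent with the training data, drawn from a finite hypothesis space H; this is the generic bound, not a formula quoted from the article): with probability at least 1 − δ, every consistent hypothesis has true error at most ε once the number of training examples n satisfies

n \ge \frac{1}{\epsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)

The point of the lesson stands: |H| is usually astronomically large, so even this bound demands far more examples than anyone has.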

7th Lesson: Feature Engineering is the Key
Understanding the domain and taking the time to gather good data in sufficient volume, and then analyzing it adequately, is often the most time-consuming and important part of machine learning. Automating feature generation is possible, but the results are often poorer than hand-crafted features.
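
A tiny, contrived illustration of why a hand-crafted feature matters (my example, not the article's; the data and feature are made up): a linear model cannot capture a label that depends on the product of two raw measurements, but the problem becomes easy once that product is supplied as an engineered feature.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(1000, 2))
y = (X_raw[:, 0] * X_raw[:, 1] > 0).astype(int)   # label depends on an interaction

# Hand-crafted feature: the interaction a domain expert knows matters.
X_engineered = np.column_stack([X_raw, X_raw[:, 0] * X_raw[:, 1]])

for name, X in [("raw features", X_raw), ("with engineered feature", X_engineered)]:
    acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy {acc:.2f}")
# The raw features leave a linear model near chance (~0.5); the engineered
# interaction feature makes the same model nearly perfect.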

8th Lesson: More Data Beats a Cleverer Algorithm
The gains from using a more complex or cleverer classifier are often minimal; larger gains can usually be obtained with the same amount of effort spent gathering more data. Start simple and only move to more complex, refined models if needed. Keep in mind, though, that fixed-size learners can only take advantage of so much data.

9th Lesson: Learn Many Models, Not Just One
A combination of classifiers, built through bagging, boosting, or stacking, often gives better results than any single model. Model ensembles are now commonly used.
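
A brief sketch of the idea using scikit-learn (my example, not the article's; the dataset and models are chosen arbitrarily): compare a single decision tree against a bagged ensemble of such trees and a boosted ensemble on one of the library's toy datasets.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single decision tree": DecisionTreeClassifier(random_state=0),
    "bagging (100 trees)": BaggingClassifier(
        DecisionTreeClassifier(random_state=0), n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy {acc:.3f}")
# The ensembles typically edge out the single tree on this dataset.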

10th Lesson: Simplicity Does Not Imply Accuracy
Simpler hypotheses should still be preferred, but because simplicity is a virtue in its own right, not because it implies greater accuracy.

11th Lesson: Representable Does Not Imply Learnable
Just because a function can be represented in the hypothesis space does not mean the learner can actually find it; the form, depth, breadth, and distribution of the data, along with limits on data, time, and memory, may make the function difficult, if not impossible, to learn.

12th Lesson: Correlation Does Not Imply Causation
Rather, correlation *may* indicate that there is a set of causal links. This relationship should be explored experimentally to discover the link before a conclusion is made about causality.

This article is a great introduction to the pitfalls of machine learning, though some of the discussion is slightly too abstract and pedagogical to be operationally useful when the problems it describes are actually encountered in the field. As a primer, however, the article works well.

Pedro Domingos. 2012. A few useful things to know about machine learning. Commun. ACM 55, 10 (October 2012), 78-87. DOI=10.1145/2347736.2347755 http://doi.acm.org/10.1145/2347736.2347755

Saturday, March 2, 2013

Sketch Based Interfaces: Early Processing for Sketch Understanding


In 2001, Sezgin et al. published a paper that re-introduced the idea of finding corners based on pen speed to the sketch recognition community. Their paper and associated system aimed to allow natural, free-form sketches that can then be beautified and recognized while conveying the same information as was intended when drawn. This smarter, non-restrictive sketching was meant to offer the freedom of freehand sketching while preserving the detail, correctness, and direct-manipulation capability usually provided by more rigorous design tools, thus bridging the gap between sketch-based software and professional design software and easing the creation process for designers.

Their system works in 3 steps:

  1. Approximation - to find the vertices
    • Vertex Detection
      • Uses Speed and direction of the stroke to recognize corners (vertices)
        • Maximum change of direction
        • Minimum speed
      • Reduces noise with average-based filtering (a simplified sketch of this stage follows the outline)
        • Chooses points whose change of direction is above an empirically set threshold relative to the average change of direction
        • Chooses points whose speed is below 90% of the average speed
      • Uses hybrid fit generation to decide which candidate vertices are actual corners
        • Uses the vertices that appear in both the change-of-direction set (Fd) and the speed set (Fs) to generate an initial "hybrid fit" H_0
        • adds vertices one at a time to create two new fits, where each fit is the previous fit plus one vertex from Fd or Fs,
          producing H'' = H_i + V_d and H' = H_i + V_s , respectively
        • calculates the error produced by the new fits and checks whether the error is still within an acceptable threshold
          • Error is the squared orthogonal distance from the fit to all stroke points, divided by the number of points (i.e., the mean squared distance from the fit to each point in the stroke)
        • The final fit is H_f
    • Handles Curves (sketched in the second example after the outline)
      • Finds the curved regions
      • A region is a curve if L_2 / L_1 > 1 + threshold
        • Where L_2 is the arc length between two consecutive points in the input stroke
        • Where L_1 is the Euclidean distance between the same two points in H_f
      • Approximates the curved regions through the use of Bezier curves (2 endpoints, 2 control points)
        • Where C_1 = k * t_1 + S_i
        • Where C_2 = k * t_2 + S_j
          • Where S_i and S_j are points on the input stroke
            • where i < j
          • Where t_1 and t_2 are unit length tangent vectors pointing inward to the curve
          • Where k is the scale factor applied to t_1 and t_2 to reach the control points
            • k = 1/3 * sum from n = i to j-1 of | S_n - S_n+1 |
  2. Beautification
    • realigns each line between vertices to the average slope of the stroke points along it, rotating the line around its midpoint
  3. Basic Recognition
    • Uses "hand-tailored" templates to recognize geometric objects constructed from the lines and the curves left after beautification.
      • Ovals, Circles, Squares, Rectangles
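
To make the early-processing step more concrete, here is a simplified sketch of the average-based filtering stage described above (only the candidate selection, not the hybrid fit search; the stroke, timestamps, and threshold constants are illustrative rather than the paper's):

import numpy as np

def corner_candidates(points, times, direction_factor=1.0, speed_factor=0.9):
    """Average-based filtering for candidate vertices (simplified).

    points: (n, 2) stroke coordinates; times: (n,) timestamps.
    Returns Fd (high change-of-direction candidates) and Fs (low-speed
    candidates) as sets of point indices.
    """
    points = np.asarray(points, float)
    times = np.asarray(times, float)

    deltas = np.diff(points, axis=0)
    seg_len = np.linalg.norm(deltas, axis=1)
    angles = np.arctan2(deltas[:, 1], deltas[:, 0])

    # Per-interior-point signals: pen speed (arc length over elapsed time across
    # the two neighboring segments) and absolute change in drawing direction.
    speed = (seg_len[:-1] + seg_len[1:]) / (times[2:] - times[:-2])
    turn = np.abs(np.diff(angles))
    turn = np.minimum(turn, 2 * np.pi - turn)

    interior = np.arange(1, len(points) - 1)
    # Keep points turning more than the average change of direction, and points
    # moving slower than 90% of the average speed.
    Fd = {int(i) for i in interior[turn > direction_factor * turn.mean()]}
    Fs = {int(i) for i in interior[speed < speed_factor * speed.mean()]}
    return Fd, Fs

# An L-shaped stroke where the pen naturally slows into the corner (index 4).
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
ts = [0.00, 0.05, 0.11, 0.19, 0.31, 0.43, 0.51, 0.57, 0.62]
print(corner_candidates(pts, ts))
# The corner (index 4) appears in both candidate sets; the paper's hybrid fit
# generation then decides which of the remaining candidates are real vertices.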

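And a sketch of the curve handling as I read it (again with made-up stroke points, indices, and threshold): the L_2/L_1 ratio test flags regions where the stroke's arc length is noticeably longer than the straight-line distance between the corresponding fit vertices, and the listed formulas give the two Bezier control points.

import numpy as np

def is_curved_region(stroke, i, j, threshold=0.1):
    """L2/L1 test: the region between fit vertices i and j is a curve when the
    arc length of the stroke (L2) sufficiently exceeds the straight-line
    distance between the two vertices (L1)."""
    stroke = np.asarray(stroke, float)
    L2 = np.sum(np.linalg.norm(np.diff(stroke[i:j + 1], axis=0), axis=1))
    L1 = np.linalg.norm(stroke[j] - stroke[i])
    return L2 / L1 > 1 + threshold

def bezier_control_points(stroke, i, j):
    """Control points of a cubic Bezier approximating stroke[i..j]:
    C1 = k*t1 + S_i and C2 = k*t2 + S_j, where t1 and t2 are unit tangents
    pointing into the curve and k is one third of the arc length from S_i to S_j."""
    stroke = np.asarray(stroke, float)
    t1 = stroke[i + 1] - stroke[i]
    t1 = t1 / np.linalg.norm(t1)                 # inward tangent at S_i
    t2 = stroke[j - 1] - stroke[j]
    t2 = t2 / np.linalg.norm(t2)                 # inward tangent at S_j
    k = np.sum(np.linalg.norm(np.diff(stroke[i:j + 1], axis=0), axis=1)) / 3.0
    return stroke[i] + k * t1, stroke[j] + k * t2

# A half-circle of sample points standing in for a curved region of a stroke.
theta = np.linspace(0, np.pi, 20)
stroke = np.column_stack([np.cos(theta), np.sin(theta)])
print(is_curved_region(stroke, 0, 19))           # True: arc length >> straight-line distance
print(bezier_control_points(stroke, 0, 19))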

The evaluation of their system consisted of a qualitative comparative study against Xfig, to gauge relative ease of use and efficiency, and a study of the recognition rate for vertices and curves.
While the authors claim "praise" from their 13 participants and a 96% recognition rate for identifying vertices and curves across 10 different figures, little is said about how well the system classifies the higher-level geometric shapes.

There are problems with the paper, including the use of arbitrary thresholds, such as the thresholds for choosing vertices in Fd and Fs and the neighborhood range value k, which would normally make this paper unacceptable. But given its positive results in making sketching more accurate and seamless, the paper was a major contribution to the field.


Tevfik Metin Sezgin, Thomas Stahovich, and Randall Davis. 2006. Sketch based interfaces: early processing for sketch understanding. In ACM SIGGRAPH 2006 Courses (SIGGRAPH '06). ACM, New York, NY, USA, Article 22. DOI=10.1145/1185657.1185783 http://doi.acm.org/10.1145/1185657.1185783