Friday, January 25, 2013

$1 Recognizer


In 2007, Wobbrock et al. published a paper called "Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes" that has since changed how recognition algorithms are presented and thought about. Traditionally, research and tools aimed at solving a given problem in computer science focus on the effectiveness or efficiency of the solution. As such, the solutions tend to get progressively more complicated, involved, or sophisticated. While this growth in complexity is typical and should be expected to some degree, the people who could most benefit from these solutions are often forgotten. Wobbrock et al. noticed this trend and set out to address the need for a simple, understandable, cheap solution to their target problem. They understood that the average programmer and interface specialist does not typically have access to the more advanced recognition techniques, such as Hidden Markov Models, neural networks, and feature-based statistical recognizers (Rubine). To address the needs of the community, Wobbrock et al. sought to create an easy-to-implement recognizer that performed reasonably well. The result was their $1 recognizer.

Today, the $1 recognizer is an extremely well known algorithm even outside the sketch recognition community, simply because it has been readily adopted and widely deployed. Its success highlights the value of understanding the beneficiaries of your work and making your solution both effective and simple.

Unlike the (previously discussed) Rubine algorithm, the $1 algorithm does not train a classifier. Rather, the $1 algorithm performs recognition on the fly: it compares a given candidate gesture to a set of templates, computes a distance value for each, and returns a list of matches ordered from most similar to least.

Before the $1 recognizer can calculate these distances, it first performs a number of steps in a pre-processing phase aimed at orienting and positioning the candidate gesture into the same 'frame of reference' as the template gestures. The steps that the recognizer takes in the pre-processing phase make the comparison invariant to time (drawing speed), rotation, scale, and position.

Pre-processing steps (a Python sketch follows the list):

  1. Resample the point path to N points
    • New points are equidistant along the path
      • Effect: Temporal Invariance
  2. Rotate to set indicative angle to 0°
    • The indicative angle is the angle of the line from the centroid (xAvg, yAvg) to the first point (x0, y0)
      • Effect: Rotational Invariance
  3. Scale and Translate
    • Scale gesture (non-uniformly) to a reference square
      • Effect: Scale Invariance
    • Translate gesture so that centroid is at position (0,0)
      • Effect: Position Invariance
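
To make these steps concrete, here is a minimal Python sketch of the pre-processing phase, written to follow the paper's pseudocode. The defaults (N = 64 points, a 250-unit reference square) are the values the paper suggests; the function names are my own.

import math

def path_length(points):
    # Total stroke length, summed segment by segment.
    return sum(math.dist(points[i], points[i + 1])
               for i in range(len(points) - 1))

def centroid(points):
    return (sum(x for x, y in points) / len(points),
            sum(y for x, y in points) / len(points))

def resample(points, n=64):
    # Step 1: resample the path to n equidistantly spaced points.
    interval = path_length(points) / (n - 1)
    pts, new_points, acc = list(points), [points[0]], 0.0
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= interval and d > 0:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            new_points.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    if len(new_points) < n:   # float round-off can drop the final point
        new_points.append(pts[-1])
    return new_points

def rotate_to_zero(points):
    # Step 2: rotate about the centroid so the indicative angle
    # (centroid to first point) becomes 0.
    cx, cy = centroid(points)
    theta = math.atan2(points[0][1] - cy, points[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def scale_and_translate(points, size=250.0):
    # Step 3: scale non-uniformly to a size x size square, then
    # translate so the centroid sits at (0, 0). Note this is degenerate
    # for perfectly straight lines (width or height of 0), one of the
    # drawbacks discussed at the end of this post.
    xs, ys = [x for x, y in points], [y for x, y in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    scaled = [(x * size / w, y * size / h) for x, y in points]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

def preprocess(points, n=64, size=250.0):
    return scale_and_translate(rotate_to_zero(resample(points, n)), size)
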
Matching Step:

4.  Find the average distance between the given candidate gesture C and each template Ti (sketched below).
    • AvgDistance =
      ( sum from k=1 to N ( sqrt( ( C[k]x - Ti[k]x )^2 + ( C[k]y - Ti[k]y )^2 ) ) ) / N
After computing the average distances, we take the minimum, di*, and convert it to a score in the range [0..1], where 'size' is the side length of the reference square from step 3:
    • Score = 1 - ( di* / ( (1/2) * sqrt( size^2 + size^2 ) ) )
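
Assuming the candidate and every template have already been run through the same pre-processing (same N, same reference square), the matching step is only a few lines. This sketch reuses math and the size default from above; recognize and path_distance are my own names.

def path_distance(a, b):
    # Average distance between corresponding points of two paths.
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def recognize(candidate, templates, size=250.0):
    # templates: list of (name, preprocessed_points) pairs.
    half_diag = 0.5 * math.sqrt(size ** 2 + size ** 2)
    ranked = sorted((path_distance(candidate, pts), name)
                    for name, pts in templates)
    best_d, best_name = ranked[0]
    score = 1.0 - best_d / half_diag  # converts di* into [0..1]
    return best_name, score, ranked
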
Since the indicative angle is only an approximation of the best alignment angle, this step can optionally be followed by a search for the distance at the best angle. The fastest technique the paper discusses for finding this best angle (with a small amount of error) is the Golden Section Search (GSS).
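
Here is what that optional refinement might look like, reusing centroid and path_distance from the sketches above. The ±45° search range and 2° stopping threshold are the values the paper suggests; the bracketing logic is a standard Golden Section Search.

PHI = 0.5 * (math.sqrt(5.0) - 1.0)  # golden ratio constant, ~0.618

def rotate_by(points, theta):
    cx, cy = centroid(points)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def distance_at_best_angle(candidate, template,
                           a=math.radians(-45.0), b=math.radians(45.0),
                           threshold=math.radians(2.0)):
    # Probe two interior angles, then repeatedly shrink the bracket
    # around whichever probe gave the smaller path distance.
    x1 = PHI * a + (1.0 - PHI) * b
    f1 = path_distance(rotate_by(candidate, x1), template)
    x2 = (1.0 - PHI) * a + PHI * b
    f2 = path_distance(rotate_by(candidate, x2), template)
    while abs(b - a) > threshold:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = PHI * a + (1.0 - PHI) * b
            f1 = path_distance(rotate_by(candidate, x1), template)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = (1.0 - PHI) * a + PHI * b
            f2 = path_distance(rotate_by(candidate, x2), template)
    return min(f1, f2)
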

Wobbrock et al. analyzed the $1 recognizer alongside the Rubine classifier and a Dynamic Time Warping (DTW) recognizer and found that for...
  • The Effect of Training on Recognition Errors
    • $1 performed comparably to DTW
    • $1 performed better than Rubine, especially with 3 or fewer training examples
  • The Effect of Speed on Recognition Errors
    • $1 performed comparably to DTW
    • $1 performed better than Rubine
  • Recognition Scores in N-best List [0..1]
    • $1 discriminated between other gesture classes slightly more sharply than DTW
    • $1 discriminated between other gesture classes considerably more sharply than Rubine
  • Speed of Recognition
    • $1 overwhelmingly outperformed DTW 
      • 0.04 minutes vs. 60.02 minutes over 14,400 tests
    • $1 outperformed Rubine
      • 0.04 minutes vs. 0.60 minutes over 14,400 tests

$1 drawbacks:

  • It is a geometric template matcher, so the result is simply the closest match in 2D space.
  • It cannot distinguish gestures whose identities depend on
    • specific orientations
    • aspect ratios
    • locations
  • Horizontal and vertical lines are deformed by the non-uniform scaling


The $1 recognizer represents a great, simple solution for basic sketch recognition of single-stroke gestures.

Cited work: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes, Jacob Wobbrock and Yang Li, The Information School, University of Washington, Seattle, Washington; Andrew D. Wilson, Microsoft Research, Redmond, Washington; UIST 2007, Proceedings of the 20th annual ACM symposium on User Interface Software and Technology, Pages 159-168, ACM New York, NY, USA 2007, http://dl.acm.org/citation.cfm?id=1294238

Thursday, January 24, 2013

Rubine Algorithm - Specifying Gestures By Example


Published in 1991, Dean Rubine's paper "Specifying Gestures by Example" describes one of the earliest single-stroke sketch recognition algorithms, and it remains one of the most widely used today.

The Rubine recognizer is essentially a weighted linear discriminator that classifies a given single-stroke sketch based on a training set (a minimal sketch of the classification step follows the list below). This means that the Rubine recognizer 'learns' what the shapes look like through features, much like how humans visually perceive shapes. This is opposed to other recognizers at the time, which hard-coded the properties of the shapes that were supposed to be recognized. Another major strength of Rubine's algorithm comes from its simple set of features, which have a few interesting properties:

  • Each feature is incrementally computable in constant time per input point.
    • Allows large gestures to be handled as efficiently as small ones
  • A small change in the input results in a small change in the feature value
    • Classification performs better since feature is not multi-modal
  • Each feature is 'meaningful'
    • Feature can be used in gesture semantics as well as for recognition
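
In code, the classification step is just one dot product per gesture class: for each class c, evaluate vc = wc0 + sum( wci * fi ) over the feature vector and pick the class with the largest value. A minimal Python sketch follows; training (which estimates the weights from the per-class means and a common covariance matrix over the example features) is omitted, and the names are my own.

def classify(features, weights):
    # weights maps class name -> (w0, [w1..wF]), learned from examples.
    def value(name):
        w0, ws = weights[name]
        return w0 + sum(w * f for w, f in zip(ws, features))
    return max(weights, key=value)
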

Features (computed in the sketch after this list):
  • Initial direction (angle) of the gesture
    • f1 = cos(alpha) = (x2 - x0) / sqrt( (x2 - x0)^2 + (y2 - y0)^2 )
    • f2 = sin(alpha) = (y2 - y0) / sqrt( (x2 - x0)^2 + (y2 - y0)^2 )
      • Using both sin and cos helps avoid discontinuities as the angle passes through 2pi and 0
  • Length of the diagonal of the bounding box (size of gesture)
    • f3 = sqrt( (Xmax - Xmin)^2 + (Ymax - Ymin)^2 )
  • Angle of the bounding box
    • f4 = arctan( (Ymax - Ymin) / (Xmax - Xmin) )
  • Distance between the first and last points (of P total points)
    • f5 = sqrt( (X[P-1] - X[0])^2 + (Y[P-1] - Y[0])^2 )
  • The angle between the first and last points
    • f6 = cos(beta) = (X[P-1] - X[0]) / f5
    • f7 = sin(beta) = (Y[P-1] - Y[0]) / f5
  • Total gesture length
    • f8 = sum from i = 0 to i = P-2 ( sqrt( deltaXi^2 + deltaYi^2 ) )
      • Where deltaXi = X[i+1] - X[i] and deltaYi = Y[i+1] - Y[i]
  • Total angle traversed
    • f9 = sum from i = 1 to i = P-2 ( THETAi )
      • Where THETAi = arctan( ( deltaXi * deltaY(i-1) - deltaYi * deltaX(i-1) ) /
                               ( deltaXi * deltaX(i-1) + deltaYi * deltaY(i-1) ) )
  • Total absolute value of the angle at each point
    • f10 = sum from i = 1 to i = P-2 ( abs(THETAi) )
  • Total squared value of the angle at each point (sharpness)
    • f11 = sum from i = 1 to i = P-2 ( THETAi^2 )
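
Below is a sketch of the feature computation in Python. Rubine's full set also includes duration and maximum-speed features (f12, f13), which require timestamps and are omitted here to match the list above; atan2 replaces the arctan-of-a-ratio form for numerical robustness, and a real implementation should guard the divisions against degenerate strokes (fewer than 3 points, coincident endpoints).

import math

def rubine_features(points):
    # points: list of (x, y) sample points, at least 3 of them.
    xs, ys = [x for x, y in points], [y for x, y in points]
    d = math.hypot(xs[2] - xs[0], ys[2] - ys[0])
    f1, f2 = (xs[2] - xs[0]) / d, (ys[2] - ys[0]) / d      # initial angle
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    f3 = math.hypot(w, h)                                  # bbox diagonal
    f4 = math.atan2(h, w)                                  # bbox angle
    f5 = math.hypot(xs[-1] - xs[0], ys[-1] - ys[0])        # first-to-last
    f6, f7 = (xs[-1] - xs[0]) / f5, (ys[-1] - ys[0]) / f5  # end-to-end angle
    dx = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    dy = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
    f8 = sum(math.hypot(a, b) for a, b in zip(dx, dy))     # total length
    f9 = f10 = f11 = 0.0
    for i in range(1, len(dx)):
        theta = math.atan2(dx[i] * dy[i - 1] - dx[i - 1] * dy[i],
                           dx[i] * dx[i - 1] + dy[i] * dy[i - 1])
        f9 += theta         # total angle traversed
        f10 += abs(theta)   # total absolute angle
        f11 += theta ** 2   # sharpness
    return [f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11]
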

Rubine also discusses a less well remembered system used to implement gestural interfaces on top of existing drag-and-drop based programs called GRANDMA, which stands for Gesture Recognizers Automated in a Novel Direct Manipulation Architecture. Primarily, GRANDMA was aimed (lol) at making sketch recognition easy to use, since recognition features were being custom-built for every new program. Other than its (rather funny) name, I don't think GRANDMA is particularly memorable, but maybe that's coming from my perspective as a student in the year 2013, where gestural interfaces are fairly common on popular mobile devices.

Cited work:
Specifying Gestures by Example, Dean Rubine, Information Technology Center, Carnegie Mellon University, Pittsburgh, PA; SIGGRAPH '91, Proceedings of the 18th annual conference on Computer graphics and interactive techniques, Pages 329-337, ACM New York, NY, USA 1991, http://dl.acm.org/citation.cfm?id=122753

Monday, January 21, 2013

Sketchpad : A Man-Machine Graphical Communication System

Written in 1963, Sutherland's Sketchpad system and the resulting paper revolutionized the field of computer science. Sutherland's work was particularly unique as a new method of human-computer interaction: input was taken in through a light pen to construct drawings on a special interactive monitor screen. At the time, anything other than textual keyboard input was unheard of. In fact, Sutherland's Sketchpad introduced a completely new idea to the computer science community: graphical user interfaces. (You know, just as a side product.)

In addition to simply drawing shapes, Sketchpad enabled the user to establish and manipulate constraints for the lines and objects that were drawn, save those shapes and associated constraints to a library, reuse them through copying, and even merge those shapes and properties to create new classes of objects. In creating a generalized and abstract system for defining objects, Sutherland in effect created* the modern concept of object-oriented programming! Even more impressive is the fact that Sutherland had no language support for defining objects or classes or inheritance or anything like that. By 1963, C, not to mention C++ (C with classes), hadn't even been invented yet. "High level" programming languages like Fortran, LISP, and Algol were relatively new. In his paper, Sutherland clearly indicates the low-level nature of the workings of his program when he says things like, "...operations are implemented by short sections of program defined as MACRO instructions in the compiler language..." and "...As parts of the drawing are deleted, the registers which were used to represent them are free." Sutherland also stresses the importance of the now commonly known benefits of the object-oriented programming paradigm in a section of the paper named "Generic structure, hierarchies," where he says, "Without a generic structure it would be almost impossible to add the instructions required to handle a new type of element."
[Image: Sketchpad in use]
[Image: Sutherland using the TX-2 running Sketchpad]
Sutherland also touched on gestures for his pen-based interface when he mentioned that a drawing motion ends by "flicking" the pen faster than the system can track it. The paper even touches on aspects of graphics and visualization when it discusses line and circle generation, magnification of pictures, and translation of figures.

Sutherland also discussed two very different forms of now-common computer usage, giving examples of how his Sketchpad system was being used:
  1. Computer Aided Design (CAD)
    • Since his drawing system could reproduce shapes and create and manipulate systems of constraints, Sutherland used his system as a way of designing bridges within a set of resource and physical constraints.
[Image: Bridge made in Sketchpad]
  2. Artistic Drawings
    • While our modern society may take creating art through computers for granted, in the forms of digital image creation with Photoshop, video games built with development engines and kits such as Unity3d, or even music made in programs like FruityLoops, creating art through computers was almost unthinkable in 1963. Despite this, Sutherland discusses creating animated images.
Artistic drawing "Nefertite"
 and component parts
made in Sketchpad

To top it all off, Sutherland, in discussing future work, casually mentions applications like direct 3D object creation/manipulation, 2D-to-3D object transformations, and some potential work in the yet-to-be-created field of computer vision.

For his work with Sketchpad, Sutherland received the Turing Award in 1988 and the Kyoto Prize in 2012.

*Ideas similar to object-oriented programming had been around MIT's artificial intelligence community, particularly around LISP. Later, in 1967, Simula 67 introduced objects as a programming concept.

Cited work:
Sketchpad: A Man-Machine Graphical Communication System, Ivan E. Sutherland, Consultant, Lincoln Laboratory, Massachusetts Institute of Technology; DAC '64, Proceedings of the SHARE design automation workshop, Pages 6.329 - 6.346, ACM New York, NY, USA 1964

Thursday, January 17, 2013

(More) About Kanji Storyteller



[Image: Compose View]
[Image: Explore View]
I've been developing an educational sketch recognition based program called Kanji Storyteller that acts as a learning space for students of foreign languages. Currently, it only supports Japanese and only recognizes a small set of kanji. When I first started working on it for a Computers and New Media course project, I was working with a group of students from Dr. Hammond's Sketch Recognition Lab and I was pretty dependent on their contributions for the recognizer. I want to improve the recognizer and free myself from dependence on others for future development.

Recognizer Update
We ended up using a template-based recognizer that checks sketches against a training set on the fly. It can recognize the symbols, but it gets too slow as the training set grows, and it tends to be a little too temperamental and unpredictable. As such, I will be converting the recognizer to a gesture-based recognition scheme.

  • Gesture/Stroke Based Recognition

Recognizer Expansion
For Japanese, I'd like to expand the recognizer to recognize:
  • 平仮名
    • hiragana - phonetic syllabary
  • 片仮名
    • katakana - phonetic syllabary for words borrowed from other languages
  • ローマ字
    • romaji - Roman alphabet characters
  • 部首
    • bushu - "radicals" - Parts of Kanji
  • Stroke Order and Direction Based Recognition - rejects or notifies the user when a kanji is written in the wrong stroke order or direction
  • More Kanji and Jukugo
Social Connection and Collaborative Learning
One of the major features that I wanted to include was support for networked connections where other students, mentors, instructors, and even spectators could join a session (class, group) and share their ideas and thoughts about the kanji. The shared interaction space would allow the students and other participants to share mnemonics, demonstrate correct stroke order, and collaboratively create scenes that would help reinforce learning. Some classes of participants could be restricted to only observe and comment while other classes of participants could add to the scene annotation and sketch as well as comment. Providing audio support for open dialogue would be extremely beneficial.
  • Host a session (teacher only?)
  • (Sockets?) connect to a session
  • Support Audio Chat
  • Group editing of scene annotation
  • Canvas updates across multiple connected sessions in real time
  • Networked saving and storage of canvas files (xml)
  • Suggest beneficial group behavior
    • mnemonics
    • memorable stories about kanji
    • discuss aspects of kanji
    • make fun/cool stories in the annotations
    • group games (competitive and collaborative)
Gamified Learning Suite
Going beyond character recognition, I would like the application to provide a language learning suite that features grammar lessons, full sentences, vocabulary drills, and learning games and quizzes. Kanji Storyteller should also keep track of the student's progress in the form of the kanji and concepts that they have encountered and mastered. Making this generic and modular enough to add new languages would be ideal, although there are Japanese-specific concepts that would need to be emphasized. (Better to do one thing well than many things poorly.)
  • Grammar Lessons
  • Example Sentences
  • Vocabulary Drills, Learning Games, Quizzes
  • Progress Tracking
    • Kanji Encountered
    • Kanji "Learned"
    • Grammar Concepts Mastered
    • Badges / Achievements
    • Learning Metrics
      • Average Difficulty of Kanji Encountered (Breadth of Knowledge)
        • Grade Level
        • JLPT
      • Average Difficulty of Kanji Mastered
      • Average Time to Recall and Write Kanji
      • Average Time to Complete Drill of N Kanji
      • % recall of Kanji characteristics (Depth of Knowledge)
        • Onyomi, Kunyomi, English Meaning, Jukugo
  • User Profiles Stored on Database
    • User Profile Image
Dynamic Sticker Image Lookup and Related Kanji Image Creation
To provide some variety in the kinds of images that can be used for the stickers, I'd like to move away from using a static image library/database and look up images on the internet dynamically at runtime. This would also reduce the amount of work it takes to add new kanji to the program, since only the training set, classifier, and audio would have to be updated. The blue kanji identifier tag on the bottom right of the image would have to be added on the fly too, though.
  • Dynamic Sticker Image Lookup
  • Dynamically Create Blue Kanji Identifier Tag
  • Dynamically Create Basic Kanji Image
  • Dynamically Create Basic Jukugo Image
Print Function 
Students and teachers may want to have paper printouts of the information for their own records or creations.
  • Dictionary Panel
  • Canvas
  • Comments Panel (or subset of it)
  • Story Panel (or subset of it)
Miscellaneous
  • Change the layer control button icons to images rather than letters
  • Get rid of the 'flashing' effect when things are moved/updated in GUI
  • Search Function to better search English to Japanese
  • Translate Multiple Stickers at once
  • Move stickers relative to background if size is changed
  • Get rid of 0 value for Frequency of Use column in Database
  • Smarter/Better GUI sizing
  • Have quick dictionary lookup by using kanji that is already on the canvas
Other Languages
At some point, I may also like to expand the whole application to support other languages (but I'm not completely sold on the idea):
  • English
    • Dictionary Panel
    • Roman Alphabet
    • Words
  • Korean
    • Hangul
      • Detecting Syllabic Groupings
      • Detecting Symbols within Syllabic Groupings
      • Detecting Valid Combinations and Arrangements of Letters Within Groupings
  • Chinese (Mandarin)
    • Dictionary Panel
    • Simplified Kanji Set
  • Update and Generalize Menu and Change Menu Options Based on Language
[Image: Dictionary Panel - Sortable Search]

[Image: Dictionary Panel - Kanji Information Card]

Tuesday, January 15, 2013

Self Introduction

Name: Ross Peterson
Email:
            • University: rosscpeterson@neo.tamu.edu
            • Gmail: rosspeterson06@gmail.com
Major: Computer Science
Year: 2nd Year Masters (Non-Thesis), last semester
Undergraduate Education: Texas A&M University, College Station



Why I'm Taking the Class: 
  1. Dr. Hammond is a great professor. She covers a wide range of material, knows the literature, is active and respected in the fields she teaches, and runs a fun class. I should know since this is the 4th class I've taken with her.
  2. I've been developing an educational sketch recognition based program called Kanji Storyteller that acts as a learning space for students of foreign languages. Currently, it only supports Japanese and only recognizes a small set of kanji. When I first started working on it, I was working with a group of students from Dr. Hammond's Sketch Recognition Lab and I was pretty dependent on their contributions for the recognizer. I want to improve the recognizer and free myself from dependence on others for future development. (more about that project later.)
Experience / Expertise I bring to the class:
I won't claim expertise in any particular area, but my skills include:

Programming Languages:        C++/C, Python, Java, Javascript, C#, PHP, .NET, WPF, XML, XAML, Haskell, Lisp, Prolog, MySQL
Natural Languages:                  English, Spanish, Japanese

Other Skills/Experiences:
Multithreaded Programming, Windows and Unix System Programming (GUI, inter-process comm., threads, sockets), Agile Methods: Scrum, Game Development in Unity3d

Professional Life Goals:
  1. Make a memorable game, preferably fun.
  2. Develop a top-tier educational tool for learning languages.
  3. Find work in the video game industry for a time.
Personal Life Goals:
  1. To serve God and others well
  2. To love God and others fully
  3. To live wisely
  4. To learn many languages well and visit and experience many cultures
What I want to do after I graduate:
  • Get a good job (pays reasonably with fun coworkers and fulfilling work) and earn enough to have a great wedding and honeymoon in October
What I expect to be doing in 10 years:
  • One can never be sure, but hopefully learning a new language and living out my personal and professional goals.
What do you think will be the next biggest technological advancement in computer science?
  • Proliferation of nano-machines for sensing, communication, and environment alteration (maybe that's the next after next after next advancement)
If you could travel back in time, who would you like to meet and why?
  • C.S. Lewis - The man is simply brilliant. He writes very succinctly and has a very discerning mind. He sees what everyone sees in any particular matter and then drills past assumptions and hits what is really the heart of the matter. His logical approach to theology results in balanced and sound stances on anything he approaches.
Describe your favorite shoes and why they are your favorite?
  • I suppose that I enjoyed a pair of leather boots with a belt-style buckle, similar to harness boots. They had a rounded-toe look. They were heavy, and you felt like you could command a presence without being forcibly stylish or vain.
If you could be fluent in any foreign language that you're not already fluent in, which one would it be and why?
  • Either Mandarin or German, I think. I like their history, art, food, culture, and the way their languages sound.
Something Funny / Interesting:
  • I had no idea what programming was when I started studying computer science.