Friday, January 25, 2013

$1 Recognizer


In 2007, Wobbrock et al. published a paper called "Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes" that has since changed how recognition algorithms are presented and thought about. Traditionally, research and tools aimed at solving a given problem in computer science focus on the effectiveness or efficiency of the solution. As such, the solutions tend to get progressively more complicated, involved, or sophisticated. While this growth in complexity is typical and should be expected to some degree, the people who could most benefit from these solutions are often forgotten. Wobbrock et al. noticed this trend and set out to address the need for a simple, understandable, cheap solution to their target problem. They understood that the average programmer and interface specialist does not typically have access to the more advanced recognition techniques, such as Hidden Markov Models, neural networks, and feature-based statistical recognizers (Rubine). To address the needs of the community, Wobbrock et al. sought to create an easy-to-implement recognizer that performed reasonably well. The result was their $1 recognizer.

Today, the $1 recognizer is an extremely well known algorithm even outside the sketch recognition community, simply because it has been readily adopted and widely deployed. Its success highlights the value of understanding the beneficiaries of your work and making your solution both effective and simple.

Unlike the (previously discussed) Rubine algorithm, the $1 algorithm does not train a classifier. Rather, the $1 algorithm performs recognition on the fly: it compares a given candidate gesture to a set of templates, computes a distance value for each, and returns a list of matches ordered from most similar to least.

Before the $1 recognizer can calculate these distances, it first performs a number of steps in a pre-processing phase aimed at orienting and positioning the candidate gesture into the same 'frame of reference' as the template gestures. The steps that the recognizer takes in the pre-processing phase make the comparison invariant to time (drawing speed), rotation, scale, and position.

Pre-processing steps (a Python sketch follows the list):

  1. Resample the point path to N points
    • New points are equidistant along the path
      • Effect: Temporal Invariance
  2. Rotate to set indicative angle to 0°
    • The indicative angle is the angle of the line from the centroid (xAvg, yAvg) to the first point (x0, y0)
      • Effect: Rotational Invariance
  3. Scale and Translate
    • Scale gesture (non-uniformly) to a reference square
      • Effect: Scale Invariance
    • Translate gesture so that centroid is at position (0,0)
      • Effect: Position Invariance
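
To make these steps concrete, here is a minimal Python sketch of the pre-processing phase, written to follow the paper's pseudocode. The defaults (N = 64 points, a 250-unit reference square) are the values the paper suggests; the function names are my own.

import math

def path_length(points):
    # Total stroke length, summed segment by segment.
    return sum(math.dist(points[i], points[i + 1])
               for i in range(len(points) - 1))

def centroid(points):
    return (sum(x for x, y in points) / len(points),
            sum(y for x, y in points) / len(points))

def resample(points, n=64):
    # Step 1: resample the path to n equidistantly spaced points.
    interval = path_length(points) / (n - 1)
    pts, new_points, acc = list(points), [points[0]], 0.0
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= interval and d > 0:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            new_points.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    if len(new_points) < n:   # float round-off can drop the final point
        new_points.append(pts[-1])
    return new_points

def rotate_to_zero(points):
    # Step 2: rotate about the centroid so the indicative angle
    # (centroid to first point) becomes 0.
    cx, cy = centroid(points)
    theta = math.atan2(points[0][1] - cy, points[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def scale_and_translate(points, size=250.0):
    # Step 3: scale non-uniformly to a size x size square, then
    # translate so the centroid sits at (0, 0). Note this is degenerate
    # for perfectly straight lines (width or height of 0), one of the
    # drawbacks discussed at the end of this post.
    xs, ys = [x for x, y in points], [y for x, y in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    scaled = [(x * size / w, y * size / h) for x, y in points]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

def preprocess(points, n=64, size=250.0):
    return scale_and_translate(rotate_to_zero(resample(points, n)), size)
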
Matching Step:

4.  Find the average distance between the given candidate gesture C and each template Ti (sketched below).
    • AvgDistance =
      ( sum from k=1 to N ( sqrt( ( C[k]x - Ti[k]x )^2 + ( C[k]y - Ti[k]y )^2 ) ) ) / N
After computing the average distances, we take the minimum, di*, and convert it to a score in the range [0..1], where 'size' is the side length of the reference square from step 3:
    • Score = 1 - ( di* / ( (1/2) * sqrt( size^2 + size^2 ) ) )
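
Assuming the candidate and every template have already been run through the same pre-processing (same N, same reference square), the matching step is only a few lines. This sketch reuses math and the size default from above; recognize and path_distance are my own names.

def path_distance(a, b):
    # Average distance between corresponding points of two paths.
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def recognize(candidate, templates, size=250.0):
    # templates: list of (name, preprocessed_points) pairs.
    half_diag = 0.5 * math.sqrt(size ** 2 + size ** 2)
    ranked = sorted((path_distance(candidate, pts), name)
                    for name, pts in templates)
    best_d, best_name = ranked[0]
    score = 1.0 - best_d / half_diag  # converts di* into [0..1]
    return best_name, score, ranked
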
Since the indicative angle is only an approximation of the best alignment angle, this step can optionally be followed by a search for the distance at the best angle. The fastest technique the paper discusses for finding this best angle (with a small amount of error) is the Golden Section Search (GSS).
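
Here is what that optional refinement might look like, reusing centroid and path_distance from the sketches above. The ±45° search range and 2° stopping threshold are the values the paper suggests; the bracketing logic is a standard Golden Section Search.

PHI = 0.5 * (math.sqrt(5.0) - 1.0)  # golden ratio constant, ~0.618

def rotate_by(points, theta):
    cx, cy = centroid(points)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def distance_at_best_angle(candidate, template,
                           a=math.radians(-45.0), b=math.radians(45.0),
                           threshold=math.radians(2.0)):
    # Probe two interior angles, then repeatedly shrink the bracket
    # around whichever probe gave the smaller path distance.
    x1 = PHI * a + (1.0 - PHI) * b
    f1 = path_distance(rotate_by(candidate, x1), template)
    x2 = (1.0 - PHI) * a + PHI * b
    f2 = path_distance(rotate_by(candidate, x2), template)
    while abs(b - a) > threshold:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = PHI * a + (1.0 - PHI) * b
            f1 = path_distance(rotate_by(candidate, x1), template)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = (1.0 - PHI) * a + PHI * b
            f2 = path_distance(rotate_by(candidate, x2), template)
    return min(f1, f2)
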

Wobbrock et al. analyzed the $1 recognizer alongside the Rubine classifier and a Dynamic Time Warping (DTW) recognizer and found that for...
  • The Effect of Training on Recognition Errors
    • $1 performed comparably to DTW
    • $1 performed better than Rubine, especially with 3 or fewer training examples
  • The Effect of Speed on Recognition Errors
    • $1 performed comparably to DTW
    • $1 performed better than Rubine
  • Recognition Scores in N-best List [0..1]
    • $1 discriminated between other gesture classes slightly more sharply than DTW
    • $1 discriminated between other gesture classes considerably more sharply than Rubine
  • Speed of Recognition
    • $1 overwhelmingly outperformed DTW 
      • 0.04 minutes vs. 60.02 minutes over 14,400 tests
    • $1 outperformed Rubine
      • 0.04 minutes vs. 0.60 minutes over 14,400 tests

$1 drawbacks:

  • It is a geometric template matcher, so the result is simply the closest match in 2D space.
  • It cannot distinguish gestures whose identities depend on
    • specific orientations
    • aspect ratios
    • locations
  • Horizontal and vertical lines are deformed by the non-uniform scaling


The $1 recognizer represents a great, simple solution for basic sketch recognition of single-stroke gestures.

Cited work: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes, Jacob Wobbrock and Yang Li, The Information School, University of Washington, Seattle, Washington; Andrew D. Wilson, Microsoft Research, Redmond, Washington; UIST 2007, Proceedings of the 20th annual ACM symposium on User Interface Software and Technology, Pages 159-168, ACM New York, NY, USA 2007, http://dl.acm.org/citation.cfm?id=1294238

Thursday, January 24, 2013

Rubine Algorithm - Specifying Gestures By Example


Published in 1991, Dean Rubine's paper "Specifying Gestures by Example" describes one of the earliest single-stroke sketch recognition algorithms, and it remains one of the most widely used today.

The Rubine recognizer is essentially a weighted linear discriminator that classifies a given single-stroke sketch based on a training set (a minimal sketch of the classification step follows the list below). This means that the Rubine recognizer 'learns' what the shapes look like through features, much like how humans visually perceive shapes. This is opposed to other recognizers at the time, which hard-coded the properties of the shapes that were supposed to be recognized. Another major strength of Rubine's algorithm comes from its simple set of features, which have a few interesting properties:

  • Each feature is incrementally computable in constant time per input point.
    • Allows large gestures to be handled as efficiently as small ones
  • A small change in the input results in a small change in the feature value
    • Classification performs better since feature is not multi-modal
  • Each feature is 'meaningful'
    • Feature can be used in gesture semantics as well as for recognition
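
In code, the classification step is just one dot product per gesture class: for each class c, evaluate vc = wc0 + sum( wci * fi ) over the feature vector and pick the class with the largest value. A minimal Python sketch follows; training (which estimates the weights from the per-class means and a common covariance matrix over the example features) is omitted, and the names are my own.

def classify(features, weights):
    # weights maps class name -> (w0, [w1..wF]), learned from examples.
    def value(name):
        w0, ws = weights[name]
        return w0 + sum(w * f for w, f in zip(ws, features))
    return max(weights, key=value)
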

Features (computed in the sketch after this list):
  • Initial direction (angle) of the gesture
    • f1 = cos(alpha) = (x2 - x0) / sqrt( (x2 - x0)^2 + (y2 - y0)^2 )
    • f2 = sin(alpha) = (y2 - y0) / sqrt( (x2 - x0)^2 + (y2 - y0)^2 )
      • Using both sin and cos helps avoid discontinuities as the angle passes through 2pi and 0
  • Length of the diagonal of the bounding box (size of gesture)
    • f3 = sqrt( (Xmax - Xmin)^2 + (Ymax - Ymin)^2 )
  • Angle of the bounding box
    • f4 = arctan( (Ymax - Ymin) / (Xmax - Xmin) )
  • Distance between the first and last points (of P total points)
    • f5 = sqrt( (X[P-1] - X[0])^2 + (Y[P-1] - Y[0])^2 )
  • The angle between the first and last points
    • f6 = cos(beta) = (X[P-1] - X[0]) / f5
    • f7 = sin(beta) = (Y[P-1] - Y[0]) / f5
  • Total gesture length
    • f8 = sum from i = 0 to i = P-2 ( sqrt( deltaXi^2 + deltaYi^2 ) )
      • Where deltaXi = X[i+1] - X[i] and deltaYi = Y[i+1] - Y[i]
  • Total angle traversed
    • f9 = sum from i = 1 to i = P-2 ( THETAi )
      • Where THETAi = arctan( ( deltaXi * deltaY(i-1) - deltaYi * deltaX(i-1) ) /
                               ( deltaXi * deltaX(i-1) + deltaYi * deltaY(i-1) ) )
  • Total absolute value of the angle at each point
    • f10 = sum from i = 1 to i = P-2 ( abs(THETAi) )
  • Total squared value of the angle at each point (sharpness)
    • f11 = sum from i = 1 to i = P-2 ( THETAi^2 )
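
Below is a sketch of the feature computation in Python. Rubine's full set also includes duration and maximum-speed features (f12, f13), which require timestamps and are omitted here to match the list above; atan2 replaces the arctan-of-a-ratio form for numerical robustness, and a real implementation should guard the divisions against degenerate strokes (fewer than 3 points, coincident endpoints).

import math

def rubine_features(points):
    # points: list of (x, y) sample points, at least 3 of them.
    xs, ys = [x for x, y in points], [y for x, y in points]
    d = math.hypot(xs[2] - xs[0], ys[2] - ys[0])
    f1, f2 = (xs[2] - xs[0]) / d, (ys[2] - ys[0]) / d      # initial angle
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    f3 = math.hypot(w, h)                                  # bbox diagonal
    f4 = math.atan2(h, w)                                  # bbox angle
    f5 = math.hypot(xs[-1] - xs[0], ys[-1] - ys[0])        # first-to-last
    f6, f7 = (xs[-1] - xs[0]) / f5, (ys[-1] - ys[0]) / f5  # end-to-end angle
    dx = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    dy = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
    f8 = sum(math.hypot(a, b) for a, b in zip(dx, dy))     # total length
    f9 = f10 = f11 = 0.0
    for i in range(1, len(dx)):
        theta = math.atan2(dx[i] * dy[i - 1] - dx[i - 1] * dy[i],
                           dx[i] * dx[i - 1] + dy[i] * dy[i - 1])
        f9 += theta         # total angle traversed
        f10 += abs(theta)   # total absolute angle
        f11 += theta ** 2   # sharpness
    return [f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11]
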

Rubine also discusses a less well remembered system used to implement gestural interfaces on top of existing drag-and-drop based programs called GRANDMA, which stands for Gesture Recognizers Automated in a Novel Direct Manipulation Architecture. Primarily, GRANDMA was aimed (lol) at making sketch recognition easy to use, since recognition features were being custom-built for every new program. Other than its (rather funny) name, I don't think GRANDMA is particularly memorable, but maybe that's coming from my perspective as a student in the year 2013, where gestural interfaces are fairly common on popular mobile devices.

Cited work:
Specifying Gestures by Example, Dean Rubine, Information Technology Center, Carnegie Mellon University, Pittsburgh, PA; SIGGRAPH '91, Proceedings of the 18th annual conference on Computer graphics and interactive techniques, Pages 329-337, ACM New York, NY, USA 1991, http://dl.acm.org/citation.cfm?id=122753

Monday, January 21, 2013

Sketchpad : A Man-Machine Graphical Communication System

Written in 1963, Sutherland's Sketchpad system and the resulting paper revolutionized the field of computer science. Sutherland's work was particularly unique as a new method of human-computer interaction: input was taken in through a light pen to construct drawings on a special interactive monitor screen. At the time, anything other than textual keyboard input was unheard of. In fact, Sutherland's Sketchpad introduced a completely new idea to the computer science community: graphical user interfaces. (You know, just as a side product.)

In addition to simply drawing shapes, Sketchpad enabled the user to establish and manipulate constraints for the lines and objects that were drawn, save those shapes and associated constraints to a library, reuse them through copying, and even merge those shapes and properties to create new classes of objects. In creating a generalized and abstract system for defining objects, Sutherland in effect created* the modern concept of object-oriented programming! Even more impressive is the fact that Sutherland had no language support for defining objects or classes or inheritance or anything like that. By 1963, C, not to mention C++ (C with classes), hadn't even been invented yet. "High level" programming languages like Fortran, LISP, and Algol were relatively new. In his paper, Sutherland clearly indicates the low-level nature of the workings of his program when he says things like, "...operations are implemented by short sections of program defined as MACRO instructions in the compiler language..." and "...As parts of the drawing are deleted, the registers which were used to represent them are free." Sutherland also stresses the importance of the now commonly known benefits of the object-oriented programming paradigm in a section of the paper named "Generic structure, hierarchies," where he says, "Without a generic structure it would be almost impossible to add the instructions required to handle a new type of element."
[Image: Sketchpad in use]
[Image: Sutherland using the TX-2 running Sketchpad]
Sutherland also touched on gestures for his pen-based interface when he mentioned that a drawing motion ends by "flicking" the pen faster than the system can track it. The paper even touches on aspects of graphics and visualization when it discusses line and circle generation, magnification of pictures, and translation of figures.

Sutherland also discussed two very different forms of now-common computer usage, giving examples of how his Sketchpad system was being used:
  1. Computer Aided Design (CAD)
    • Since his drawing system could reproduce shapes and create and manipulate systems of constraints, Sutherland used his system as a way of designing bridges within a set of resource and physical constraints.
[Image: Bridge made in Sketchpad]
  2. Artistic Drawings
    • While our modern society may take creating art through computers for granted, in the forms of digital image creation with Photoshop, video games built with development engines and kits such as Unity3d, or even music made in programs like FruityLoops, creating art through computers was almost unthinkable in 1963. Despite this, Sutherland discusses creating animated images.
Artistic drawing "Nefertite"
 and component parts
made in Sketchpad

To top it all off, Sutherland, in discussing future work, casually mentions applications like direct 3D object creation/manipulation, 2D-to-3D object transformations, and some potential work in the yet-to-be-created field of computer vision.

For his work with Sketchpad, Sutherland received the Turing Award in 1988 and the Kyoto Prize in 2012.

*Ideas similar to object-oriented programming had been around MIT's artificial intelligence community, particularly around LISP. Later, in 1967, Simula 67 introduced objects as a programming concept.

Cited work:
Sketchpad: A Man-Machine Graphical Communication System, Ivan E. Sutherland, Consultant, Lincoln Laboratory, Massachusetts Institute of Technology; DAC '64, Proceedings of the SHARE design automation workshop, Pages 6.329 - 6.346, ACM New York, NY, USA 1964

Thursday, January 17, 2013

(More) About Kanji Storyteller



[Image: Compose View]
[Image: Explore View]
I've been developing an educational sketch recognition based program called Kanji Storyteller that acts as a learning space for students of foreign languages. Currently, it only supports Japanese and only recognizes a small set of kanji. When I first started working on it for a Computers and New Media course project, I was working with a group of students from Dr. Hammond's Sketch Recognition Lab and I was pretty dependent on their contributions for the recognizer. I want to improve the recognizer and free myself from dependence on others for future development.

Recognizer Update
We ended up using a template-based recognizer that checks sketches against a training set on the fly. It can recognize the symbols, but it gets too slow as the training set grows, and it tends to be a little too temperamental and unpredictable. As such, I will be converting the recognizer to a gesture-based recognition scheme.

  • Gesture/Stroke Based Recognition

Recognizer Expansion
For Japanese, I'd like to expand the recognizer to recognize:
  • 平仮名
    • hiragana - phonetic syllabary
  • 片仮名
    • katakana - phonetic syllabary for words borrowed from other languages
  • ローマ字
    • romaji - Roman alphabet characters
  • 部首
    • bushu - "radicals" - Parts of Kanji
  • Stroke Order and Direction Based Recognition - rejects or notifies the user when a kanji is written in the wrong stroke order or direction
  • More Kanji and Jukugo
Social Connection and Collaborative Learning
One of the major features that I wanted to include was support for networked connections where other students, mentors, instructors, and even spectators could join a session (class, group) and share their ideas and thoughts about the kanji. The shared interaction space would allow the students and other participants to share mnemonics, demonstrate correct stroke order, and collaboratively create scenes that would help reinforce learning. Some classes of participants could be restricted to only observe and comment while other classes of participants could add to the scene annotation and sketch as well as comment. Providing audio support for open dialogue would be extremely beneficial.
  • Host a session (teacher only?)
  • (Sockets?) connect to a session
  • Support Audio Chat
  • Group editing of scene annotation
  • Canvas updates across multiple connected sessions in real time
  • Networked saving and storage of canvas files (xml)
  • Suggest beneficial group behavior
    • mnemonics
    • memorable stories about kanji
    • discuss aspects of kanji
    • make fun/cool stories in the annotations
    • group games (competitive and collaborative)
Gamified Learning Suite
Going beyond character recognition, I would like the application to provide a language learning suite that features grammar lessons, full sentences, vocabulary drills, and learning games and quizzes. Kanji Storyteller should also keep track of the student's progress in the form of the kanji and concepts that they have encountered and mastered. Making this generic and modular enough to add new languages would be ideal, although there are Japanese-specific concepts that would need to be emphasized. (Better to do one thing well than many things poorly.)
  • Grammar Lessons
  • Example Sentences
  • Vocabulary Drills, Learning Games, Quizzes
  • Progress Tracking
    • Kanji Encountered
    • Kanji "Learned"
    • Grammar Concepts Mastered
    • Badges / Achievements
    • Learning Metrics
      • Average Difficulty of Kanji Encountered (Breadth of Knowledge)
        • Grade Level
        • JLPT
      • Average Difficulty of Kanji Mastered
      • Average Time to Recall and Write Kanji
      • Average Time to Complete Drill of N Kanji
      • % recall of Kanji characteristics (Depth of Knowledge)
        • Onyomi, Kunyomi, English Meaning, Jukugo
  • User Profiles Stored on Database
    • User Profile Image
Dynamic Sticker Image Lookup and Related Kanji Image Creation
To provide some variety in the kinds of images that can be used for the stickers, I'd like to move away from using a static image library/database and look up images on the internet dynamically at runtime. This would also reduce the amount of work it takes to add new kanji to the program, since only the training set, classifier, and audio would have to be updated. The blue kanji identifier tag on the bottom right of the image would have to be added on the fly too, though.
  • Dynamic Sticker Image Lookup
  • Dynamically Create Blue Kanji Identifier Tag
  • Dynamically Create Basic Kanji Image
  • Dynamically Create Basic Jukugo Image
Print Function 
Students and teachers may want to have paper printouts of the information for their own records or creations.
  • Dictionary Panel
  • Canvas
  • Comments Panel (or subset of it)
  • Story Panel (or subset of it)
Miscellaneous
  • Change the layer control button icons to images rather than letters
  • Get rid of the 'flashing' effect when things are moved/updated in GUI
  • Search Function to better search English to Japanese
  • Translate Multiple Stickers at once
  • Move stickers relative to background if size is changed
  • Get rid of 0 value for Frequency of Use column in Database
  • Smarter/Better GUI sizing
  • Have quick dictionary lookup by using kanji that is already on the canvas
Other Languages
At some point, I may also like to expand the whole application to support other languages (but I'm not completely sold on the idea):
  • English
    • Dictionary Panel
    • Roman Alphabet
    • Words
  • Korean
    • Hangul
      • Detecting Syllabic Groupings
      • Detecting Symbols within Syllabic Groupings
      • Detecting Valid Combinations and Arrangements of Letters Within Groupings
  • Chinese (Mandarin)
    • Dictionary Panel
    • Simplified Kanji Set
  • Update and Generalize Menu and Change Menu Options Based on Language
[Image: Dictionary Panel - Sortable Search]

[Image: Dictionary Panel - Kanji Information Card]

Tuesday, January 15, 2013

Self Introduction

Name: Ross Peterson
Email:
            • University: rosscpeterson@neo.tamu.edu
            • Gmail: rosspeterson06@gmail.com
Major: Computer Science
Year: 2nd Year Masters (Non-Thesis), last semester
Undergraduate Education: Texas A&M University, College Station



Why I'm Taking the Class: 
  1. Dr. Hammond is a great professor. She covers a wide range of material, knows the literature, is active and respected in the fields she teaches, and runs a fun class. I should know since this is the 4th class I've taken with her.
  2. I've been developing an educational sketch recognition based program called Kanji Storyteller that acts as a learning space for students of foreign languages. Currently, it only supports Japanese and only recognizes a small set of kanji. When I first started working on it, I was working with a group of students from Dr. Hammond's Sketch Recognition Lab and I was pretty dependent on their contributions for the recognizer. I want to improve the recognizer and free myself from dependence on others for future development. (more about that project later.)
Experience / Expertise I bring to the class:
I won't claim expertise in any particular area, but my skills include:

Programming Languages:        C++/C, Python, Java, Javascript, C#, PHP, .NET, WPF, XML, XAML, Haskell, Lisp, Prolog, MySQL
Natural Languages:                  English, Spanish, Japanese

Other Skills/Experiences:
Multithreaded Programming, Windows and Unix System Programming (GUI, inter-process comm., threads, sockets), Agile Methods: Scrum, Game Development in Unity3d

Professional Life Goals:
  1. Make a memorable game, preferably fun.
  2. Develop a top-tier educational tool for learning languages.
  3. Find work in the video game industry for a time.
Personal Life Goals:
  1. To serve God and others well
  2. To love God and others fully
  3. To live wisely
  4. To learn many languages well and visit and experience many cultures
What I want to do after I graduate:
  • Get a good job (pays reasonably with fun coworkers and fulfilling work) and earn enough to have a great wedding and honeymoon in October
What I expect to be doing in 10 years:
  • One can never be sure, but hopefully learning a new language and living out my personal and professional goals.
What do you think will be the next biggest technological advancement in computer science?
  • Proliferation of nano-machines for sensing, communication, and environment alteration (maybe that's the next after next after next advancement)
If you could travel back in time, who would you like to meet and why?
  • C.S. Lewis - The man is simply brilliant. He writes very succinctly and has a very discerning mind. He sees what everyone sees in any particular matter and then drills past assumptions and hits what is really the heart of the matter. His logical approach to theology results in balanced and sound stances on anything he approaches.
Describe your favorite shoes and why they are your favorite?
  • I suppose that I enjoyed a pair of leather boots with a belt-style buckle, similar to harness boots. They had a rounded-toe look. They were heavy, and you felt like you could command a presence without being forcibly stylish or vain.
If you could be fluent in any foreign language that you're not already fluent in, which one would it be and why?
  • Either Mandarin or German, I think. I like their history, art, food, culture, and the way their languages sound.
Something Funny / Interesting:
  • I had no idea what programming was when I started studying computer science.