Saturday, April 20, 2013

A Visual Approach to Sketched Symbol Recognition


In their paper, Tom Y. Ouyang and Randall Davis present a visual recognizer built on what the researchers call the "Image Deformation Model" (IDM), which compares each point in a given image, within a 3x3 local window, to the corresponding window in a template image. Each point has values for a set of 5 features:

  1. Horizontal Orientation of Stroke at Point
    • 0 degrees
  2. "Forward Slash" Diagonal Orientation of Stroke at Point 
    • (Note: "Forward Slash" is a term that I have assigned)
    • 45 degrees
  3. Vertical Orientation of Stroke at Point
    • 90 degrees
  4. "Back Slash" Diagonal Orientation of Stroke at Point
    • (Note: "Back Slash" is a term that I have assigned)
    • 135 degrees
  5. Endpoint
    • 1.0 if the point is an endpoint of a stroke
    • 0.0 if the point is not an endpoint of a stroke
The points given these feature values are the output of a normalization process in which the strokes are resampled, scaled, and translated for temporal, scale, and position invariance, respectively. Ouyang and Davis do their scaling after translating the center of mass to the origin (0,0), scaling the points so that they have unit standard deviation from the center along both axes.
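As a concrete sketch of that normalization (the function names and the sample count of 48 are my own assumptions, not values from the paper), the preprocessing might look like:

```python
import math

def resample(points, n):
    """Resample a stroke to n points spaced evenly along its arc length."""
    cum = [0.0]
    for a, b in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]
    out = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # advance to the segment containing the target arc length
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        d = cum[seg + 1] - cum[seg]
        t = 0.0 if d == 0 else (target - cum[seg]) / d
        a, b = points[seg], points[seg + 1]
        out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out

def normalize(points, n_samples=48):
    """Resample, move the center of mass to the origin, then scale each
    axis to unit standard deviation, as described above."""
    pts = resample(points, n_samples)
    cx = sum(x for x, _ in pts) / n_samples
    cy = sum(y for _, y in pts) / n_samples
    centered = [(x - cx, y - cy) for x, y in pts]
    sx = math.sqrt(sum(x * x for x, _ in centered) / n_samples) or 1.0
    sy = math.sqrt(sum(y * y for _, y in centered) / n_samples) or 1.0
    return [(x / sx, y / sy) for x, y in centered]
```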

The actual IDM distance (D^2) between the input image and a template is computed by the following equation:
  • D^2 = sum over every point (x, y) of ( min over shifts dx, dy of || I1(x+dx, y+dy) - I2(x, y) ||^2 )
    • where dx and dy represent pixel shifts
    • where Ii(x, y) represents the 3x3x5 feature values in Ii from the 3x3 patch centered at point (x, y)
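A brute-force sketch of that distance (the array layout, shape-(H, W, 5) feature images, and the one-pixel shift range are my assumptions about the details):

```python
import numpy as np

def idm_distance(f1, f2, shift=1, patch=1):
    """IDM distance between two H x W x 5 feature images: for every
    location, the 3x3 patch of the template f2 is compared against
    patches of f1 shifted by up to `shift` pixels, keeping the best
    (minimum) squared difference, and the minima are summed."""
    h, w, _ = f1.shape
    # Zero-pad so 3x3 patches and the +-1 shifts stay in bounds.
    pad = shift + patch
    p1 = np.pad(f1, ((pad, pad), (pad, pad), (0, 0)))
    p2 = np.pad(f2, ((patch, patch), (patch, patch), (0, 0)))
    total = 0.0
    for y in range(h):
        for x in range(w):
            ref = p2[y:y + 2 * patch + 1, x:x + 2 * patch + 1]
            best = min(
                float(np.sum((p1[y + dy + shift:y + dy + shift + 2 * patch + 1,
                                 x + dx + shift:x + dx + shift + 2 * patch + 1]
                              - ref) ** 2))
                for dy in range(-shift, shift + 1)
                for dx in range(-shift, shift + 1)
            )
            total += best
    return total
```

Because each point may match a slightly shifted patch, a one-pixel deformation between input and template costs nothing, which is the whole point of the model.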
Ouyang and Davis perform several optimizations to reduce computation time, including Coarse Candidate Pruning, where only the N nearest neighbors under a coarse metric are passed on to the IDM comparison. After taking the first K principal components of each image, the recognizer computes the following coarse metric:
  • Dhat^2 = sum from k = 1 to K of ( v1(k) - v2(k) )^2
    • where vi(k) is the k-th principal component coefficient of the i-th image.
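The coarse metric is just a squared Euclidean distance in PCA space, so the pruning step can be sketched in a few lines (the helper names and signatures are mine):

```python
import numpy as np

def coarse_distance(v1, v2):
    """Squared Euclidean distance between two vectors of the first K
    principal-component coefficients, per the coarse metric above."""
    v1, v2 = np.asarray(v1), np.asarray(v2)
    return float(np.sum((v1 - v2) ** 2))

def prune_candidates(query_pc, template_pcs, n):
    """Keep only the indices of the n templates closest to the query in
    PCA space; only these survive to the expensive IDM comparison."""
    order = sorted(range(len(template_pcs)),
                   key=lambda i: coarse_distance(query_pc, template_pcs[i]))
    return order[:n]
```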
The second optimization uses a branch and bound technique with agglomerative hierarchical clustering over each class's training examples. The examples are grouped using a complete-link distance, repeatedly merging the two nearest clusters until only one cluster exists for the class. From the resulting hierarchical tree, each cluster's center is chosen as its representative template, and the maximum distance from the center to the cluster's other members is used as the "cluster radius," rc. This radius is used to decide whether an entire cluster of training examples can be skipped: if the distance to the cluster center, dc, minus the cluster radius, rc, is larger than the best distance the IDM comparison has found so far, the entire cluster is ignored, lowering computation time.
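The pruning test follows from the triangle inequality: no member of a cluster can be closer to the query than dc - rc. A sketch of the search loop (the (center, radius, members) layout is my assumption about how the clusters would be stored):

```python
def prune_clusters(query, clusters, distance):
    """Branch-and-bound nearest-template search: each cluster is a
    (center_template, radius, members) tuple. If the distance to the
    center minus the cluster radius already exceeds the best distance
    found so far, every member is provably worse, so the whole cluster
    is skipped without computing any member distances."""
    best = float("inf")
    best_template = None
    for center, radius, members in clusters:
        d_center = distance(query, center)
        if d_center - radius > best:
            continue  # the entire cluster is provably worse; skip it
        for t in members:
            d = distance(query, t)
            if d < best:
                best, best_template = d, t
    return best_template, best
```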

Rotational invariance is obtained by comparing 32 rotated versions of the input sketch to the templates that pass the hierarchical clustering optimization. 

The researchers tested their IDM-based recognition on 3 datasets and achieved the following accuracy rates and rankings relative to other recognizers:
  1. Pen Digits: 99.2% (ranked 2nd)
  2. HHReco: 98.2% (ranked 1st)
  3. Circuit Diagrams: 96.2% (ranked 1st)
For the Pen Digits dataset, the algorithm ran in 8.1 ms, the second fastest time; the fastest was 2.4 ms and the 3rd fastest was 40.8 ms.

The descriptions of the optimizations and the scaling were particularly interesting for this recognition scheme. It's also funny how well their distance-based recognition performs with only 5 features per point, 4 of which are orientations.

Reference:
Tom Y. Ouyang and Randall Davis, "A Visual Approach to Sketched Symbol Recognition", Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), pp. 1463-1468, Pasadena, California, USA, July 2009.

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches


This paper was Michael Oltmans' thesis, written in 2007 at the Massachusetts Institute of Technology. Unlike the other papers that have been presented in this blog, this approach is a vision-based approach that uses a sliding window called a "bulls-eye" that is split like a darts target into concentric rings with radial splits. The outer rings' sections are larger than the inner rings' (on a logarithmic scale) to represent that the points in the middle are in the "focus" / center of the fovea and more important than points that fall in the outer rings.

Oltmans makes the shapes found by the bulls-eyes rotation invariant by aligning the shape so that the stroke direction points to the right along the x-axis. In the event that the determined direction of the stroke is wrong, Oltmans also flips the stroke direction and aligns that with the x-axis. To prevent the stroke's points from falling on either side of a horizontal bin boundary and damaging the comparison, the bulls-eye is rotated so that a single wedge (bin) straddles the horizontal axis. The stroke direction itself is found using a sliding window of 19 points, over which an orthogonal distance regression finds the best-fit line. Shapes in the bulls-eye are normalized for scale invariance so that they are 75 pixels wide and tall. The strokes within the shape are also resampled so that consecutive points are at least one pixel apart.

Oltmans' recognition scheme is temporally invariant (stroke order doesn't matter) since the entire figure will be brought into focus and preserve the context of the figure.

The bulls-eye uses a histogram based scoring method to compare the part of the sketch it has in focus to other template bulls-eyes of other shapes. For this comparison, he uses the following distance metric:

  • Sum over bins i of ( (Qi - Pi)^2 / (Qi + Pi) )
    • where Qi is the number of points in bin i of the input bulls-eye and Pi is the number of points in bin i of the template bulls-eye.
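This is a chi-squared-style histogram distance, and it is tiny in code (skipping bins that are empty in both histograms to avoid dividing by zero is my handling; the thesis may treat that case differently):

```python
def bullseye_distance(q, p):
    """Histogram distance between two bulls-eyes given as per-bin point
    counts: sum of (Qi - Pi)^2 / (Qi + Pi) over all non-empty bins."""
    assert len(q) == len(p), "histograms must have the same bin layout"
    return sum((qi - pi) ** 2 / (qi + pi)
               for qi, pi in zip(q, p) if qi + pi > 0)
```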
Oltmans used his bulls-eye method to find shapes in circuit diagrams that included the symbols for resistors, batteries, grounds, and current sources (see figure below).
[Figure: example circuit symbols recognized by the bulls-eye method]


To determine which figures should be focused on, i.e., which contexts/shapes to perform recognition on, Oltmans moves along each stroke and adds a candidate context every 10 points along the stroke. This process is repeated for larger and larger windows to find shapes drawn at different sizes. The system then breaks down the sketch by classifying some candidate regions as wires that don't match any other kind of shape. The other candidate regions are then either combined or split into candidate regions that are processed by the bulls-eyes for final classification. This initial set of clusters is found using an expectation maximization clusterer. The clusterer is given a vector of each candidate region's top-left and bottom-right corners, weighted by the squared scores of each candidate region. The clusterer outputs the mean and standard deviation of the four coordinates, which are used to find a mean candidate region to be used as a new region for shape prediction. These clusters of new regions are then split if they are too large, as determined by the standard deviations from the clusterer. Any remaining overlapping regions are chosen between by greedily choosing the highest-weighted cluster.
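The candidate-generation pass at the start of that pipeline can be sketched simply (the specific window sizes here are my assumptions, not values from the thesis):

```python
def candidate_regions(stroke, window_sizes=(50, 75, 100), step=10):
    """Walk along a stroke (a list of (x, y) points) and drop a square
    candidate region centered every `step` points, repeating at several
    window sizes so shapes drawn at different scales are caught."""
    regions = []
    for size in window_sizes:
        half = size / 2.0
        for i in range(0, len(stroke), step):
            cx, cy = stroke[i]
            # each region is stored as (top-left, bottom-right) corners
            regions.append(((cx - half, cy - half), (cx + half, cy + half)))
    return regions
```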

Finally, predictions are made on the remaining clusters; within any set of overlapping clusters, the highest-scoring cluster is kept and the rest are thrown away. This proceeds until all regions have been classified.

 

Reference:
Michael Oltmans, "Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches", Massachusetts Institute of Technology, 2007

Tuesday, April 9, 2013

ShortStraw: a simple and effective corner finder for polylines


In this paper, Wolin et al. discuss their ShortStraw corner finding algorithm.

Their algorithm is based on computing "straws", the straight-line connections between the endpoints of a small window of points; where a straw is short, the window is cutting across a corner.

Formally:

  •  a straw at point i = distance(point i-w, point i+w) 
    • where w = 3, an empirically determined constant for best results

After resampling (which normalizes the straw lengths), their algorithm uses a bottom-up approach to build a set of initial corners followed by a top-down approach to refine or add to the existing set of found corners.

Bottom-up stage: 

  • Compute the straws for all points
  • Find the median straw (distance)
  • Set threshold t = median * 0.95
  • Check for corners:
    • for each straw
      • if straw < t
        • add the straw's corresponding point to corners

Top-down stage:

  • Additive Co-linear check:
    • for each( pair of points in corners)
      • if( pair of points does not form a line [i.e. r <= 0.95])
        • Check for Additional Corners between the pair of points 
          • with relaxed threshold, t
  • Reductive Co-linear check:
    • for each( triplet of points in corners)
      • if( triplet of points forms a line [i.e. r > 0.95] )
        • remove middle point from triplet of points
where r = distance(a, b) / path-distance(a, b), computed over the stroke's resampled points
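The bottom-up stage and the co-linearity ratio r can be sketched as follows (taking the minimum straw within each run of below-threshold straws is my reading of how a corner is picked out of a dip; the paper's exact selection may differ):

```python
import math

def path_distance(points, a, b):
    """Arc length along the stroke from index a to index b."""
    return sum(math.dist(points[i], points[i + 1]) for i in range(a, b))

def is_line(points, a, b, threshold=0.95):
    """Co-linearity test: r = straight-line distance over path distance;
    r near 1.0 means the span between a and b is a line."""
    pd = path_distance(points, a, b)
    return pd == 0 or math.dist(points[a], points[b]) / pd > threshold

def shortstraw_corners(points, w=3, median_factor=0.95):
    """Bottom-up pass on an already-resampled stroke: compute every
    straw, threshold at 0.95 * median, and take the minimum straw of
    each below-threshold run as a corner."""
    n = len(points)
    straws = [math.dist(points[i - w], points[i + w]) for i in range(w, n - w)]
    t = sorted(straws)[len(straws) // 2] * median_factor
    corners = [0]
    i = 0
    while i < len(straws):
        if straws[i] < t:
            # scan the run of below-threshold straws, keep its minimum
            j = i
            while j < len(straws) and straws[j] < t:
                j += 1
            best = min(range(i, j), key=lambda k: straws[k])
            corners.append(best + w)
            i = j
        else:
            i += 1
    corners.append(n - 1)
    return corners
```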

Results
The ShortStraw corner finder, when run on 244 different polyline shapes, identified 97.9% of actual corners as corners ("Correct Corners Accuracy"). More importantly, it achieved an "All-or-Nothing Accuracy" of 74.1%, meaning that for 74.1% of the strokes ShortStraw found every true corner and did not mis-classify any non-corner as a corner.

This is impressive considering that the Sezgin and Kim recognizers had Correct Corner Accuracies of 82.4% and 79.0%, respectively, and All-or-Nothing Accuracies of 27.8% and 29.7%.

Even better, the algorithm is simple and competitive in terms of the computation time and resources required to run it.

It's encouraging to see simple ideas outperform the more advanced methods. Simplicity can be beautiful at times.

A. Wolin, B. Eoff, and T. Hammond. 2008. ShortStraw: a simple and effective corner finder for polylines. In Proceedings of the Fifth Eurographics Conference on Sketch-Based Interfaces and Modeling (SBM'08), Christine Alvarado and Marie-Paule Cani (Eds.). Eurographics Association, Aire-la-Ville, Switzerland, 33-40. DOI=10.2312/SBM/SBM08/033-040 http://dx.doi.org/10.2312/SBM/SBM08/033-040

Saturday, April 6, 2013

A Domain-Independent System for Sketch Recognition

In this paper, Yu and Cai discuss their system, which was designed with the following goals:

  • Naturalness
    • It must allow users to draw in a natural way just as how they do on paper.
  • Consistency
    • It should produce a consistent recognition result,
      • i.e. if a user wants to sketch a line segment, no matter if it is drawn with a single stroke, or more than one stroke, or a compound stroke which contains other primitive elements, the system should interpret it as a line segment.
  • Hierarchical Composition 
    • It should understand the hierarchical relations among the recognized graphic objects. 
      • For example, a stroke which is composed of several primitive shapes may serve as a constituent part of some more complex object.
  • Beautification
    • It should try to guess what the user intends to draw and turn the imprecise strokes into regular objects. 
      • Although domain-specific knowledge may be very helpful here, some simple decisions can still be made according to low-level geometric features only.
  • Editing Support
    • Besides a fairly high recognition rate, it should provide an efficient and convenient user interface for drawing modification as well.
  • Modularity
    • It can be easily integrated as an input module into other more complex systems.
Their system was divided into two stages:
  1. Imprecise Stroke Approximation
    • Vertex Detection and Primitive Recognition
      • Procedure: for every stroke, check the stroke to see if it is a primitive. Stop if it is; otherwise, cut the stroke at the point of highest curvature (a corner candidate) and perform the primitive check on the two resulting strokes (recursively).
    • Primitive Detection: Lines, Curves, Circles, Ellipses, Squares, Spiral (referred to as 'helixes' in the paper)
    • Discussed Topics
      • Line Segment Approximation
        • Computes the least-squares best-fit line for the direction curve or the coordinates of the stroke's points and checks whether the number of deviation points is "small" relative to a threshold. If it is, guess line segment and verify with feature area verification; otherwise, delay classification.
        • "Feature Area Verification" - checks the ratio between the feature area and the standard area; serves as a global constraint complementing the local, direction- and curvature-based criteria for line segment recognition.
      • Curve Approximation
        • Do circle and ellipse approximation
          • Circle: check if the direction curve can be approximated by a line with a slope of 2pi / n (where n is the number of stroke points). If so, guess circle with the bounding box center as the center and the mean distance from the center as the radius. Verify with feature area verification.
            • Overtracing is handled by breaking each 2pi revolution of the direction graph into separate circles and combining them by taking the mean features of the circles.
          • Ellipse: "similar to circle," with an extra constraint: the area is symmetrical about the major and minor axes.
            • A dubious claim, according to Dr. Hammond
          • Arc Stroke: Direction curve is approximated by line with slope *less than* 2pi / n.
      • Self-Intersecting Stroke Handling
        • If stroke can't be broken down into primitives and it has intersecting lines, make two copies of the stroke and divide them using two strategies:
          1. At self intersection points
          2. At points of maximum curvature
        • Choose the one with the "better" recognition 
  2. Post Processing
    • Eliminates elements caused by noise in the data, beautifies the remaining elements, and performs domain-independent recognition.
    • Steps: 
      • Simple Relation Retrieval
      • Cleanup
      • Basic Object Recognition
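The recursive split-at-highest-curvature loop from the first stage can be sketched generically (the `is_primitive` and `curvature` callbacks stand in for the paper's primitive tests and curvature computation; their signatures are my invention):

```python
def segment(stroke, is_primitive, curvature):
    """Recursive stroke approximation: if the stroke passes a primitive
    test, stop; otherwise split it at the interior point of highest
    curvature and recurse on both halves, sharing the cut point."""
    if len(stroke) < 3 or is_primitive(stroke):
        return [stroke]
    # cut at the interior point of maximum curvature (a corner candidate)
    cut = max(range(1, len(stroke) - 1), key=lambda i: curvature(stroke, i))
    return (segment(stroke[:cut + 1], is_primitive, curvature)
            + segment(stroke[cut:], is_primitive, curvature))
```

Run on an L-shaped stroke with a line test and a turning-angle curvature, this splits once at the corner and returns two line segments.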
Their user interface also allowed for editing by going into a "modification state" and entering editing command symbols which included "delete","undo", and "redo".

The system was evaluated with only 10 students (totally :p) who thought that the system "was easy to use".
The students entered a set of primitive and hybrid shapes.

"Correct Rate":
  • Primitive Shapes: 98%
    • Arcs: 94%
  • Hybrid Shapes: 70%
    • Smooth connections in hybrid shapes were hard to detect.

The paper's evaluation was, qualitatively and quantitatively, extremely weak. Simply saying that the system was easy to use is ambiguous. How was it easy to use? What parts did they like? Did they ask if there was anything the students didn't like? Perhaps using a Likert-scale questionnaire would give a more solid grasp of user satisfaction. How long did they use it for? Was the evaluation so short that they couldn't find anything wrong with the system?

They only tested on 10 students, and no mention was made of how many shapes each user had to enter. Their "correct rate" metric is a little vague and not a standard measure of recognition performance.

The paper also erroneously calls spirals "helixes" and there were many spelling and grammatical errors that distracted from the paper.
Ex: "We has devised a function..." (Can they has chzburgers?)
Ex: "There were totally 10 students..." (Did a surfer write this paper?)
Ex: "the curvature curve..." (This happened a lot in the paper and it is really redundant.)

Don't get me wrong, they had some smart ideas. Their method of recognizing overtraced circles was very clear and concise  and their method of breaking strokes down at high curvature vertices makes perfect sense.

Bo Yu and Shijie Cai. 2003. A domain-independent system for sketch recognition. In Proceedings of the 1st International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia (GRAPHITE '03). ACM, New York, NY, USA, 141-146. DOI=10.1145/604471.604499 http://doi.acm.org/10.1145/604471.604499

Friday, April 5, 2013

Mechanix : Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes


In this paper, Field et al. discuss their software platform "Mechanix", which is designed to help teachers publish and grade course material that students submit online in the form of sketches.

The main subject domain that their recognition system is used for is in the recognition of free body diagrams, primarily trusses, for engineering courses at Texas A&M (whoop!).

Their recognizer uses a single example provided by a professor, teacher, or TA as the training data that the students' input data (truss drawings) are compared to for grading. Rather than give a confidence or break down the results, the recognizer simply returns a correct or not-correct response.

Body Identification
The free body input is also checked to see if the bodies are indeed closed shapes. This is done by traversing the stroke segments' endpoints, finding the closest endpoints, and checking whether they are within 9% of the total path length of each other. After all endpoints are checked, the start and end points are checked to see if they are close enough. If all conditions pass, the shape is a closed shape.
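A sketch of that closed-shape test (the greedy nearest-endpoint chaining and the data layout are my assumptions about the details):

```python
import math

def is_closed(strokes, tolerance=0.09):
    """Check whether a list of strokes (each a list of (x, y) points)
    forms a closed shape: chain strokes by matching nearest endpoints,
    requiring every gap, including the final gap back to the starting
    point, to be within 9% of the total path length."""
    total = sum(math.dist(a, b)
                for s in strokes for a, b in zip(s, s[1:]))
    limit = tolerance * total
    remaining = list(strokes[1:])
    start = strokes[0][0]
    tail = strokes[0][-1]
    while remaining:
        # pick the stroke whose nearer endpoint is closest to the chain tail
        def gap(s):
            return min(math.dist(tail, s[0]), math.dist(tail, s[-1]))
        nxt = min(remaining, key=gap)
        if gap(nxt) > limit:
            return False
        remaining.remove(nxt)
        # orient the stroke so it continues on from the tail
        tail = nxt[-1] if math.dist(tail, nxt[0]) <= math.dist(tail, nxt[-1]) else nxt[0]
    return math.dist(tail, start) <= limit
```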

Body Comparison - Tanimoto + Hausdorff
Field et al.'s algorithm works by preprocessing the stroke for temporal invariance, rotational invariance, scale invariance, and position invariance similar to the preprocessing in $1.

Then, the Tanimoto coefficient, Hausdorff distance, and Modified Hausdorff distance are computed and combined to give a "matching" or "not matching" response, depending on whether the confidence is greater than the acceptance threshold of 65%.

Hausdorff Distance:
  • Hausdorff Distance(A, B) = max( max(Da), max(Db) )
    • where Da = the distances from each point a to its nearest point in B
    • and Db = the distances from each point b to its nearest point in A
      • where a and b are points in point vectors Pa and Pb from the input (A) and training (B) points, respectively
  • Modified Hausdorff(A, B) = ( Va + Vb ) / 2
    • where Va = avg(Da)
    • and Vb = avg(Db)

Tanimoto Coefficient:
  • Normally: T(A, B) = numPts(A & B) / ( numPts(A) + numPts(B) - numPts(A & B) )
  • Mechanix: T(A, B) = ( numPts(A & B) + numPts(B & A) ) / ( numPts(A) + numPts(B) )
    • where numPts(A & B) is the number of points in A that have a matching point in B
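The three measures are straightforward to sketch from the definitions above; since sampled points rarely coincide exactly, the Tanimoto version here counts a point as shared if it lies within a pixel tolerance of the other set (the tolerance is my assumption, not a value from the paper):

```python
import math

def hausdorff(A, B):
    """Classic Hausdorff distance: the worst-case nearest-neighbor
    distance, taken in both directions between point sets A and B."""
    da = [min(math.dist(a, b) for b in B) for a in A]
    db = [min(math.dist(b, a) for a in A) for b in B]
    return max(max(da), max(db))

def modified_hausdorff(A, B):
    """Modified Hausdorff: average the directed nearest-neighbor
    distances instead of taking the max, then average both directions."""
    da = [min(math.dist(a, b) for b in B) for a in A]
    db = [min(math.dist(b, a) for a in A) for b in B]
    return (sum(da) / len(da) + sum(db) / len(db)) / 2

def tanimoto(A, B, tol=1.0):
    """Mechanix-style Tanimoto: fraction of points, over both sets,
    that have a match within `tol` pixels in the other set."""
    shared_ab = sum(1 for a in A if any(math.dist(a, b) <= tol for b in B))
    shared_ba = sum(1 for b in B if any(math.dist(b, a) <= tol for a in A))
    return (shared_ab + shared_ba) / (len(A) + len(B))
```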

Truss Identification
Trusses are identified by finding intersections that share edges, connected to each other by three distinct paths.

Field et al. achieved a recall of 0.991 and a precision of 0.691.
While the precision seems low, the high recall is what really matters, since the recognizer needs to verify the correctness of the sketches and recognize correct trusses as such.

The high recognition rates for general symbols with Tanimoto and Hausdorff are impressive, and it's great to see a strong practical application of sketch recognition in education. Knowing what's important for the domain was a strong point of the paper.

Martin Field, Stephanie Valentine, Julie Linsey, and Tracy Hammond. 2011. Sketch recognition algorithms for comparing complex and unpredictable shapes. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three (IJCAI'11), Toby Walsh (Ed.). AAAI Press, 2436-2441. DOI=10.5591/978-1-57735-516-8/IJCAI11-406 http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-406