In their paper, Tom Y. Ouyang and Randall Davis present a visual recognizer built on what they call the "Image Deformation Model" (IDM), which compares a 3x3 local window around each point of an input image with the corresponding window in a template image. Each point carries values for a set of 5 features:
- Horizontal orientation of the stroke at the point (0 degrees)
- "Forward slash" diagonal orientation of the stroke at the point (45 degrees; "forward slash" is a term that I have assigned)
- Vertical orientation of the stroke at the point (90 degrees)
- "Back slash" diagonal orientation of the stroke at the point (135 degrees; "back slash" is a term that I have assigned)
- Endpoint: 1.0 if the point is an endpoint of a stroke, 0.0 if it is not
The points that receive these feature values are the output of a normalization process in which the strokes are resampled, scaled, and translated for temporal, scale, and position invariance, respectively. Ouyang and Davis scale after translating the center of mass to the origin (0,0): the symbol is scaled so that it has unit standard deviation outward from the center along both axes.
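The translation and scaling steps can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `normalize_points` is hypothetical, each axis is scaled independently, resampling is left out, and a degenerate axis (zero spread) is left unscaled.

```python
import math

def normalize_points(points):
    """Translate the center of mass to (0, 0), then scale each axis
    to unit standard deviation. Resampling is not handled here."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]

    # Standard deviation along each axis, measured from the new origin.
    sx = math.sqrt(sum(x * x for x, _ in centered) / n)
    sy = math.sqrt(sum(y * y for _, y in centered) / n)

    # Assumption: a perfectly flat axis is left as-is to avoid dividing by zero.
    sx = sx or 1.0
    sy = sy or 1.0
    return [(x / sx, y / sy) for x, y in centered]

corners = [(0, 0), (2, 0), (2, 4), (0, 4)]
normalized = normalize_points(corners)
```

In this example a 2-by-4 rectangle becomes a square centered on the origin, which is the kind of scale and position invariance the normalization is after.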
The IDM distance (D^2) between the input and a template is computed by the following equation:
$$D^2 = \sum_{x,y} \min_{d_x, d_y} \left\| I_1(x + d_x, y + d_y) - I_2(x, y) \right\|^2$$
- where $d_x$ and $d_y$ represent pixel shifts
- where $I_i(x, y)$ represents the 3x3x5 feature values in $I_i$ from the 3x3 patch centered at point (x, y)
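To make the distance above concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the names `patch` and `idm_distance_sq` are hypothetical, feature images are assumed to be nested lists indexed as `image[y][x][feature]`, cells outside the image are assumed to be all zeros, and the shift range (`max_shift=1`) is my own choice.

```python
def patch(image, x, y):
    """Flattened 3x3 patch of feature vectors centered at (x, y).
    Assumption: cells outside the image count as all zeros."""
    h, w = len(image), len(image[0])
    nfeat = len(image[0][0])
    values = []
    for py in range(y - 1, y + 2):
        for px in range(x - 1, x + 2):
            if 0 <= px < w and 0 <= py < h:
                values.extend(image[py][px])
            else:
                values.extend([0.0] * nfeat)
    return values

def idm_distance_sq(img1, img2, max_shift=1):
    """Sum over all (x, y) of the smallest squared patch difference
    found when the patch in img1 may shift by up to max_shift pixels."""
    h, w = len(img2), len(img2[0])
    shifts = range(-max_shift, max_shift + 1)
    total = 0.0
    for y in range(h):
        for x in range(w):
            target = patch(img2, x, y)
            total += min(
                sum((a - b) ** 2
                    for a, b in zip(patch(img1, x + dx, y + dy), target))
                for dx in shifts
                for dy in shifts
            )
    return total

def blank(w, h):
    """Hypothetical helper: a w-by-h image with one zero-valued feature."""
    return [[[0.0] for _ in range(w)] for _ in range(h)]

a = blank(4, 4); a[1][1] = [1.0]   # mark at x=1, y=1
b = blank(4, 4); b[1][2] = [1.0]   # same mark moved one pixel right
```

With shifts disabled (`max_shift=0`) the two images differ, but with one-pixel shifts allowed the distance drops to zero, which is the tolerance to small deformations that the model is meant to provide.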
Ouyang and Davis apply several optimizations to reduce computation time. The first is Coarse Candidate Pruning, where only the N nearest neighbors under a coarse metric are passed on to the full IDM comparison. The recognizer takes the first K principal components of each image and uses them to compute the following coarse metric:
$$\hat{D}^2 = \sum_{k=1}^{K} \left( v_1(k) - v_2(k) \right)^2$$
- where $v_i(k)$ is the k-th principal component of the i-th image.
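A rough sketch of the pruning step follows. It assumes the K principal-component values have already been computed for the input and for every template (the PCA itself is not shown), and the names `coarse_distance_sq` and `prune_candidates` are hypothetical.

```python
def coarse_distance_sq(v1, v2):
    """Squared Euclidean distance between two K-element component vectors."""
    return sum((a - b) ** 2 for a, b in zip(v1, v2))

def prune_candidates(input_vec, template_vecs, n):
    """Indices of the n templates closest to the input under the coarse
    metric; only these would go on to the full IDM comparison."""
    ranked = sorted(
        range(len(template_vecs)),
        key=lambda i: coarse_distance_sq(input_vec, template_vecs[i]),
    )
    return ranked[:n]

# Toy vectors with K = 2; the values are made up for illustration.
templates = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
kept = prune_candidates([0.0, 0.0], templates, n=2)
```

Here the coarse distances are 25, 1, and 4, so only templates 1 and 2 survive the pruning.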
The second optimization combines hierarchical clustering with a branch and bound technique. Agglomerative hierarchical clustering is applied to the training examples of each class: every example starts as its own cluster, and the two nearest clusters under a complete-link distance are merged repeatedly until only one cluster remains for the class. In the resulting hierarchical tree, each cluster's center is chosen as its representative template, and the maximum distance from that center to the other members of the cluster is used as the "cluster radius," rc. During recognition, this radius decides whether a cluster of training examples can be skipped: if the distance from the input to the cluster center, dc, minus the cluster radius, rc, is larger than the best distance the IDM comparison has found so far, the entire cluster is ignored, lowering computation time.
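The pruning test itself is simple enough to sketch. The version below is a simplification under my own assumptions: it uses a flat list of clusters rather than the full hierarchical tree, the function name and the dictionary layout (`center`, `radius`, `members`) are hypothetical, and the toy data is one-dimensional with absolute difference standing in for the IDM distance. The bound is exact when the distance obeys the triangle inequality, as absolute difference does.

```python
def nearest_with_pruning(query, clusters, distance):
    """Nearest-neighbor search that skips a cluster whenever
    d_c - r_c exceeds the best distance found so far."""
    best_dist, best_item, evaluations = float("inf"), None, 0
    for cluster in clusters:
        d_c = distance(query, cluster["center"])
        evaluations += 1
        if d_c < best_dist:
            best_dist, best_item = d_c, cluster["center"]
        if d_c - cluster["radius"] > best_dist:
            continue  # no member of this cluster can beat the current best
        for member in cluster["members"]:
            d = distance(query, member)
            evaluations += 1
            if d < best_dist:
                best_dist, best_item = d, member
    return best_item, best_dist, evaluations

def abs_diff(a, b):
    return abs(a - b)

# "members" lists the non-center examples; radius is the largest
# distance from the center to any of them.
clusters = [
    {"center": 1.0, "radius": 1.0, "members": [0.0, 2.0]},
    {"center": 10.0, "radius": 1.0, "members": [9.0, 11.0]},
]
item, dist, evaluations = nearest_with_pruning(1.4, clusters, abs_diff)
```

For the query 1.4 the second cluster is rejected after a single comparison with its center, so the search uses 4 distance evaluations instead of 6.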
Rotational invariance is obtained by comparing 32 rotated versions of the input sketch to the templates that pass the hierarchical clustering optimization.
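Generating the rotated inputs might look like the sketch below. The summary above only gives the count of 32, so the even spacing over a full turn is my assumption, as is the function name `rotated_versions`; the points are assumed to be already centered on the origin.

```python
import math

def rotated_versions(points, count=32):
    """Return `count` copies of the points, rotated about the origin
    in equal angular steps (assumed to cover a full 360 degrees)."""
    versions = []
    for i in range(count):
        angle = 2 * math.pi * i / count
        c, s = math.cos(angle), math.sin(angle)
        versions.append([(x * c - y * s, x * s + y * c) for x, y in points])
    return versions

versions = rotated_versions([(1.0, 0.0)])
```

Each rotated version would then be compared against the surviving templates, keeping the smallest distance.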
The researchers tested their IDM-based recognizer on 3 datasets and achieved the following accuracy rates and rankings relative to other recognizers:
- Pen Digits: 99.2% accuracy, ranked 2nd
- HHReco: 98.2% accuracy, ranked 1st
- Circuit Diagrams: 96.2% accuracy, ranked 1st
For the Pen Digits dataset, the algorithm ran in 8.1 ms, the second fastest time; the fastest was 2.4 ms and the third fastest was 40.8 ms.
The descriptions of the optimizations and the scaling were particularly interesting for this recognition scheme. It's also funny how well their distance-based recognition performs with only 5 features for a given point, 4 of which are orientation.
Reference:
Tom Y. Ouyang and Randall Davis, "A Visual Approach to Sketched Symbol Recognition", Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), pp 1463 - 1468, Pasadena, California, USA, July, 2009