Home | Coding | Digit Recognition | Unsupervised Learning | Supervised Learning | Conclusions | References

Digit Recognition using Kernel-based Subspace Segmentation

Shankar Rao
ECE586YM Final Project [presentation] [code]

In this project, we investigate applying the entropy coding techniques used in subspace segmentation to the problem of segmenting data drawn from multiple nonlinear manifolds. In particular, we exploit the so-called "kernel trick" popularized in Kernel Principal Component Analysis. We use the problem of digit recognition as a testbed application for our methods as it is a well-studied problem with readily available training and testing data sets.

The "Kernel Trick" and Subspace Entropy Coding

From the Entropy Coding algorithm discussed in [1], we know that the total number of bits needed to optimally code the m vectors in W in an n-dimensional space is given by:

Notice that the determinant is a function of the eigenvalues of the matrix . This matrix has the same eigenvalues as the matrix . The matrix is composed of all the pairwise inner products between vectors in W. Thus we will obtain computational savings in applications where m < n. Also, if we are not given the vectors themselves, but only their set of inner products, we may still perform the subspace segmentation. However, the most surprising consequence is that we can replace this notion of inner product with any positive symmetric function. A matrix K constructed from such a function is called a kernel matrix, and it corresponds to an inner product between the vectors under some high (possibly infinite) dimensional nonlinear mapping. For examples of kernel functions please refer to [2].

Home

Digit Recognition

The MNIST database is a collection of 60000 labelled training digits and 10000 labelled testing digits. Each digit is a 28 x 28 grayscale image , and the digit that has been size normalized and centered within the image. However, there is still large variation between digits of the same class. For example, there are at least two prominent ways of writing the number "2":

There are many other digit variations, including rotations, slants, and distortion, and as a result of these variations, the digits from a single class lie on a nonlinear manifold. This motivates the use of nonlinear segmentation techniques. However, to improve segmentation, we can preprocess the digits to reduce much of this variation. One popular and effective technique for reducing digit variation is known as slant correction. In this approach, linear regression is used to best estimate the slope of the slant line, and the image coordinates are transformed to make the line vertical [3]. Below is results of slant correction on a training image.

Home

Unsupervised Clustering

To first examine whether kernel functions will be useful for digit recognition, we tested out the unsupervised case. That is, we took one hundred samples of each digit from the MNIST database, performed slant correction, and then applied the Pairwise Steepest Descent Algorithm with different kernel functions and various choices of epsilon. In order to compare the performance between these methods, we define a sample as "misclassified" if its apriori label (from the MNIST database) does not agree with the majority of the samples in the same group. This is a reasonable heuristic estimate, but it is by no means perfect (e.g., the initial segmentation of each sample belonging to its own group has no "misclassified" samples). The results of these experiments are below.

Kernel function: Polynomials of degree 1 (Affine subspace segmentation)

Kernel function: Polynomials of degree 3

Kernel function: Radial basis function with sigma^2 = 300000

These experiments would seem to indicate that we should use a 3rd degree polynomial kernel with epsilon=0.1. However, this kernel results in seventy-nine groups, and many of these groups are not meaningful, such as this one:

Thus, despite its higher "misclassification" rate, we use the radial basis kernel function for the supervised learning experiments.

Home

Supervised Learning

In order to performed supervised learning, we start with two hundred training samples from each digit class. Then for each test sample, we assign it to the class that would result in the smallest encoding for the sample. One quirk of this method is that the epsilon chosen from the unsupervised experiments was far too small, resulting in every sample being assigned to the first group. After some experimentation, we discovered that the radial basis kernel function with sigma^2=300000 and epsilon = 0.3 produces reasonable results. Below is the confusion matrix for a supervised experiment (a 95.05% classification rate):

Class
0 1 2 3 4 5 6 7 8 9 Errors
0
199
1
1
1
197
2
1
3
2
189
1
1
1
2
6
0
11
3
1
189
2
4
2
2
11
4
190
4
1
5
10
5
1
3
194
1
1
6
6
4
1
1
3
191
9
7
1
4
1
186
8
14
8
6
2
2
3
1
1
1
183
1
17
9
7
4
1
2
3
183
17
11
2
7
10
10
7
9
12
11
17
99

Home

Conclusions

This experiment brought to light many of the issues inherent with using nonlinear kernels in subspace segmentation. Both the choice of the kernel function and its parameters change the meaning of the error tolerance epsilon. The radial basis kernel function was able to obtain reasonable segmentation of the testing digits using only a small subset of the training data and no structural or shape information. I believe that the technique can be improved by including some shape information, through the use of shape features such as Fourier descriptors. I also think that using the kernel function results in the data samples only sparsely covering their given manifold. It may improve segmentation to use more than just the pairwise steepest descent algorithm.

Home

References

[1] Ma, Y., Derksen H., Hong, W., and Wright, J. On the coding and segmentation of multivariate mixed data. UIUC Technial Report UILU-ENG-06-2203. April 2006.
[2] Scholkopf, B., Smola A., and Muller K. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation 1998.
[3] Liu, Z., Cai, J., and Buse, R. Handwriting Recognition. Springer 2003.

Home