Comparison of kPCA and PCA

Let's compare PCA and kPCA on the digits data from the textbook.

Let's verify that we can move between the covariance space $X'X$ and the neighborhood space $XX'$ to obtain low-rank representations of the data.
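Here is a minimal sketch of that check in Python, using scikit-learn's 8x8 `load_digits` data as a stand-in for the textbook digits (an assumption): the eigenvectors of $X'X$ give the loadings directly, while the eigenvectors of $XX'$, rescaled by the singular values, give the same PC scores.

```python
import numpy as np
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)            # stand-in for the textbook digits
Xc = X - X.mean(axis=0)                        # center the features

# Covariance-space route: eigenvectors of X'X are the loadings V
cov_evals, V = np.linalg.eigh(Xc.T @ Xc)       # eigh returns ascending eigenvalues
V = V[:, ::-1]                                 # largest first
scores_cov = Xc @ V[:, :2]                     # first two PC scores

# Neighborhood-space route: eigenvectors of XX', rescaled, give the same scores
gram_evals, U = np.linalg.eigh(Xc @ Xc.T)
U, gram_evals = U[:, ::-1], gram_evals[::-1]
s = np.sqrt(np.clip(gram_evals, 0, None))      # singular values of Xc
scores_gram = U[:, :2] * s[:2]

# The two routes agree up to an arbitrary sign per component
for j in range(2):
    match = np.allclose(scores_cov[:, j], scores_gram[:, j]) or \
            np.allclose(scores_cov[:, j], -scores_gram[:, j])
    print(f"PC{j+1} scores agree (up to sign): {match}")
```

The agreement follows from the SVD $X = USV'$: then $X'X = VS^2V'$, $XX' = US^2U'$, and the scores are $XV = US$ either way.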

Yes: we can recover the loadings either from the covariance between features ($X'X$) or from the similarities (inner products) between observations ($XX'$), after the appropriate rescaling and projection.
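Going in that second direction, the loadings can be pulled out of the observation-space eigenvectors as $V = X'US^{-1}$. A sketch, again assuming the scikit-learn digits:

```python
import numpy as np
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)            # stand-in for the textbook digits
Xc = X - X.mean(axis=0)

# Loadings straight from the feature covariance X'X ...
_, V = np.linalg.eigh(Xc.T @ Xc)
V = V[:, ::-1]

# ... and recovered from the observation space: V = X'U S^{-1}
gram_evals, U = np.linalg.eigh(Xc @ Xc.T)
U, gram_evals = U[:, ::-1], gram_evals[::-1]
s = np.sqrt(np.clip(gram_evals, 0, None))
V_from_gram = (Xc.T @ U[:, :2]) / s[:2]        # rescaled back to unit-norm loadings

for j in range(2):
    match = np.allclose(V[:, j], V_from_gram[:, j]) or \
            np.allclose(V[:, j], -V_from_gram[:, j])
    print(f"loading {j+1} recovered from XX': {match}")
```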

Let's look at just the 0s and see what PCA picks up.
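A sketch of that step, under the same scikit-learn-digits assumption: subset to the 0s, fit a two-component PCA, and plot the scores.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)   # stand-in for the textbook digits
X0 = X[y == 0]                        # keep only the 0 digits

pca = PCA(n_components=2)
scores = pca.fit_transform(X0)

plt.scatter(scores[:, 0], scores[:, 1], s=10)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("PCA scores for the 0 digits")
plt.show()
```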

PC1 and PC2 pick up different aspects of the 0 digits.
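One way to see which aspect each component encodes is to display the loading vectors themselves as 8x8 images (a sketch under the same scikit-learn-digits assumption):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)              # stand-in for the textbook digits
pca = PCA(n_components=2).fit(X[y == 0])

# Each loading vector lives in pixel space, so it can be shown as an 8x8 image
fig, axes = plt.subplots(1, 2, figsize=(6, 3))
for j, ax in enumerate(axes):
    ax.imshow(pca.components_[j].reshape(8, 8), cmap="RdBu")
    ax.set_title(f"PC{j + 1} loading")
    ax.axis("off")
plt.show()
```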

Run this a couple of times.

Let's try kernel PCA and see how that separates the data.
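A sketch with an RBF kernel, still assuming the scikit-learn digits; the `gamma` value here is just an illustrative bandwidth, not a recommendation:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

X, y = load_digits(return_X_y=True)   # stand-in for the textbook digits
X0 = X[y == 0]                        # keep only the 0 digits

# RBF kernel PCA on the 0s; gamma is an arbitrary illustrative choice
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1e-3)
scores = kpca.fit_transform(X0)

plt.scatter(scores[:, 0], scores[:, 1], s=10)
plt.xlabel("kPC1")
plt.ylabel("kPC2")
plt.title("Kernel PCA scores for the 0 digits (RBF kernel)")
plt.show()
```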

Let's now apply kernel PCA to the full digits data set.
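Roughly the same sketch on all ten classes, coloring the scores by digit label so we can see which digits the first two kernel PCs pull apart:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

X, y = load_digits(return_X_y=True)   # stand-in for the textbook digits

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1e-3)
scores = kpca.fit_transform(X)

# Color points by digit label to see which classes the first two kPCs separate
sc = plt.scatter(scores[:, 0], scores[:, 1], c=y, cmap="tab10", s=8)
plt.colorbar(sc, label="digit")
plt.xlabel("kPC1")
plt.ylabel("kPC2")
plt.show()
```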

Notice how your kernel and bandwidth choices change what is captured by the first kPCA components! It really depends on what you decide counts as similar and what does not - see how the zero or one digits are either "compressed" or "blown up".
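To see the effect directly, you might rerun the embedding for a few bandwidths; the `gamma` values below are arbitrary illustrative choices, and the digits are again scikit-learn's stand-in:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

X, y = load_digits(return_X_y=True)   # stand-in for the textbook digits

# Same data, same kernel family, three bandwidths: the embeddings can look very different
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, gamma in zip(axes, [1e-4, 1e-3, 1e-2]):
    scores = KernelPCA(n_components=2, kernel="rbf", gamma=gamma).fit_transform(X)
    ax.scatter(scores[:, 0], scores[:, 1], c=y, cmap="tab10", s=5)
    ax.set_title(f"RBF, gamma={gamma}")
plt.show()
```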

You can look at more kPCA components of course.
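For example, a quick look at kPC3 and kPC4 next to kPC1 and kPC2, with the same illustrative kernel and bandwidth as above:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

X, y = load_digits(return_X_y=True)   # stand-in for the textbook digits
scores = KernelPCA(n_components=4, kernel="rbf", gamma=1e-3).fit_transform(X)

# Later components often pick up structure that the first two miss
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(scores[:, 0], scores[:, 1], c=y, cmap="tab10", s=5)
axes[0].set_xlabel("kPC1")
axes[0].set_ylabel("kPC2")
axes[1].scatter(scores[:, 2], scores[:, 3], c=y, cmap="tab10", s=5)
axes[1].set_xlabel("kPC3")
axes[1].set_ylabel("kPC4")
plt.show()
```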

Try this on some of the other data sets from class and see what happens!