Kernel Ridge Regression demo

This is taken from the jupyter notebook by Dino Sejdinovic, Department of Statistics, Oxford. Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/atsml19/

Some of the topics from that course are also covered here, at roughly the same level.

Let's generate a function that is nonlinear in $X$.
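The original data-generating code is not shown here, so the following is a minimal sketch under an assumed target: a noisy sine-plus-quadratic curve observed at random inputs.

```r
## Sketch of the data-generating step (assumed target function, not the original code).
set.seed(1)
n <- 100
x <- matrix(sort(runif(n, -3, 3)), ncol = 1)   # inputs, sorted for easy plotting
f <- function(x) sin(2 * x) + 0.3 * x^2        # nonlinear target (assumption)
y <- f(x) + rnorm(n, sd = 0.3)                 # noisy observations

plot(x, y, pch = 19, col = "grey40", main = "Data nonlinear in x")
lines(x, f(x), lwd = 2)                        # true regression function
```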

We will use the 'kernlab' library and the 'CVST' (fast cross-validation) package to fit kernel ridge regression with a Gaussian kernel.

Important: the packages CVST/kernlab we use below adopt the convention $$k(x,x') = e^{-\sigma \|x-x'\|^2}$$ for the Gaussian RBF kernel.

In the lecture notes, $\sigma$ appeared in the denominator, so you will see the opposite effect of the bandwidth here. Be careful to check this convention for any package you use!
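As a reference point, here is a small kernel ridge regression fit written directly with kernlab, so the role of $\sigma$ in the kernel is explicit. This is a sketch rather than the notebook's own code; the choice of scaling the ridge penalty by $n$ (i.e. solving $(K + n\lambda I)\alpha = y$) is an assumption about the convention used.

```r
library(kernlab)

## Fit kernel ridge regression: alpha = (K + n*lambda*I)^{-1} y
## (scaling the penalty by n is one common convention; adjust if needed).
krr_fit <- function(x, y, sigma, lambda) {
  rbf <- rbfdot(sigma = sigma)                     # k(x, x') = exp(-sigma * ||x - x'||^2)
  K <- as.matrix(kernelMatrix(rbf, x))             # n x n kernel matrix
  alpha <- solve(K + nrow(x) * lambda * diag(nrow(x)), y)
  list(alpha = alpha, x = x, kernel = rbf)
}

## Predict at new inputs: f_hat(x*) = sum_i alpha_i k(x*, x_i)
krr_predict <- function(fit, xnew) {
  Knew <- as.matrix(kernelMatrix(fit$kernel, xnew, fit$x))
  Knew %*% fit$alpha
}
```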

So here, larger values of $\sigma$ correspond to rougher kernels. First, let us keep $\sigma$ fixed and vary the regularisation parameter $\lambda$. The blue lines are the fitted regression functions.
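A sketch of this sweep, using the helper functions above; the fixed $\sigma$ and the grid of $\lambda$ values are assumptions, not the notebook's original settings.

```r
## Fixed sigma, varying lambda: overlay each fitted curve (blue) on the data.
xgrid <- matrix(seq(-3, 3, length.out = 300), ncol = 1)
sigma_fixed <- 1                                  # assumed value
for (lambda in c(1e-8, 1e-4, 1e-1, 10)) {         # assumed grid
  fit <- krr_fit(x, y, sigma = sigma_fixed, lambda = lambda)
  plot(x, y, pch = 19, col = "grey40",
       main = bquote(lambda == .(lambda)))
  lines(xgrid, krr_predict(fit, xgrid), col = "blue", lwd = 2)
}
```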

Note how very small $\lambda$ leads to overfitting (not enough regularisation), while very large $\lambda$ leads to underfitting (too much regularisation).

Let's fix the regularisation and now explore the bandwidth (or rather the inverse bandwidth, since $\sigma$ appears in the numerator of the kernel expression above). That is, very small $\sigma$ leads to underfitting (kernel too smooth), while very large $\sigma$ leads to overfitting (kernel too rough).
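The analogous sweep over $\sigma$, again as a sketch with an assumed fixed $\lambda$ and an assumed grid of bandwidth values.

```r
## Fixed lambda, varying sigma: small sigma -> very smooth fit, large sigma -> very wiggly fit.
lambda_fixed <- 1e-3                              # assumed value
for (sigma in c(0.01, 0.1, 1, 100)) {             # assumed grid
  fit <- krr_fit(x, y, sigma = sigma, lambda = lambda_fixed)
  plot(x, y, pch = 19, col = "grey40",
       main = bquote(sigma == .(sigma)))
  lines(xgrid, krr_predict(fit, xgrid), col = "blue", lwd = 2)
}
```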

Let's use cross-validation to select the optimal $\lambda$ and $\sigma$.
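A sketch of the selection step with the CVST package; the parameter grids are assumptions, and the exact form of the object returned by `CV()` may differ across CVST versions.

```r
library(CVST)

## Wrap the data and the kernel ridge regression learner provided by CVST.
d   <- constructData(x = x, y = as.numeric(y))
krr <- constructKRRLearner()

## Grid over both hyperparameters (assumed ranges).
params <- constructParams(kernel = "rbfdot",
                          sigma  = 10^seq(-2, 2, length.out = 9),
                          lambda = 10^seq(-8, 0, length.out = 9))

## Standard k-fold CV over the grid; fastCV() is the sequential-testing alternative.
opt <- CV(d, krr, params, fold = 5)
opt   # selected (sigma, lambda) configuration; refit with these values to plot the final curve
```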

To try at home

Investigate the stability of the CV selection as a function of, e.g., the sample size. What happens if you use no regularisation?