In this assignment, we will investigate regression using Gaussian Processes. A couple of routines have been written to get you started:

gpprior.m   A few lines of code to help you draw functions from Gaussian Process priors.
gp1.m       Computes the minus log likelihood and its derivatives with respect to the hyperparameters of a Gaussian Process for regression.
gp1pred.m   Computes predictions based on hyperparameters, training inputs, targets and test inputs.

--

Examine the code in "gpprior.m" and execute it. Experiment with the behaviour as the hyperparameters and the noise level change. Visualise a few functions drawn from non-Gaussian covariance functions: e.g. try reducing the power of the distance from 2 towards 0; what happens to the functions? What happens when the power is outside the 0-2 interval? Try writing the covariance corresponding to an additive function.

--

Train a Gaussian Process on the well-known trains1 dataset, using the conjgrad program in conjunction with gp1. Compute predictions for the test cases using the gp1pred program, and make a surface plot of the function. What is the predictive error (compare to net-mc and the other methods)? Make a surface plot of the uncertainties in the predictions, and explain the look of the graph.

Repeat the above using the training set trains2 instead. How do the hyperparameters compare? How do the plots compare? Explain the look of the surface plot of uncertainties.

--

Use the Gaussian Process to make predictions for the 10-dimensional datasets, using train50, train100 and train200. Remember to rescale the data (inputs and outputs) to have zero mean and unit variance before you feed them to the Gaussian Process (the functions "mean" and "std" may be useful here). Compare performance (don't forget to "undo" the output transformations) to the other models that we have investigated. Does the model faithfully recover which inputs are important and which are not, as well as the division into non-linear and linear contributions?
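The "power of the distance" experiment can also be tried outside the course code. Below is a minimal NumPy sketch (the course routines themselves are Matlab) of drawing functions from a zero-mean GP prior with covariance exp(-(|x - x'| / ell)^gamma) -- an assumed form; gpprior.m may use a different parameterisation. The names `ell`, `gamma` and `noise` are illustrative, not taken from the course code:

```python
import numpy as np

def gp_prior_draws(x, ell=1.0, gamma=2.0, noise=1e-6, n_draws=3, seed=0):
    """Draw sample functions from a zero-mean GP prior with covariance
    k(x, x') = exp(-(|x - x'| / ell) ** gamma).
    gamma = 2 gives the smooth squared-exponential kernel; gamma in (0, 2)
    gives progressively rougher (non-differentiable) sample paths."""
    d = np.abs(x[:, None] - x[None, :])
    K = np.exp(-(d / ell) ** gamma) + noise * np.eye(len(x))  # jitter for stability
    L = np.linalg.cholesky(K)                                 # K = L L'
    rng = np.random.default_rng(seed)
    return L @ rng.standard_normal((len(x), n_draws))         # correlated draws

x = np.linspace(-5, 5, 200)
smooth = gp_prior_draws(x, gamma=2.0)  # infinitely differentiable draws
rough = gp_prior_draws(x, gamma=1.0)   # continuous but jagged (OU-like) draws
```

For gamma > 2 the matrix ceases to be positive semi-definite, so the Cholesky factorisation fails: such a function is not a valid covariance.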
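As a cross-check of the rescaling step, here is a NumPy sketch of GP predictive mean and variance with a squared-exponential covariance (an assumption -- gp1pred.m may use a different covariance and parameterisation; the hyperparameter names `ell`, `sf2`, `sn2` are illustrative), including standardising the data and undoing the output transformation afterwards:

```python
import numpy as np

def gp_predict(X, y, Xs, ell=1.0, sf2=1.0, sn2=1e-4):
    """GP regression predictions with an (assumed) squared-exponential
    covariance sf2 * exp(-0.5 * |x - x'|^2 / ell^2) plus noise sn2."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sf2 * np.exp(-0.5 * d2 / ell ** 2)
    K = k(X, X) + sn2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + sn2 I)^{-1} y
    Ks = k(Xs, X)
    mu = Ks @ alpha                                      # predictive mean
    v = np.linalg.solve(L, Ks.T)
    var = sf2 + sn2 - (v ** 2).sum(0)                    # predictive variance
    return mu, var

# toy data standing in for the course datasets
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (25, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

# rescale inputs and outputs to zero mean and unit variance, then undo
mx, sx = X.mean(0), X.std(0)
my, sy = y.mean(), y.std()
mu_s, var_s = gp_predict((X - mx) / sx, (y - my) / sy, (X - mx) / sx)
mu = my + sy * mu_s        # "undo" the output transformation
var = sy ** 2 * var_s
```

Forgetting to undo the transformation is a common source of apparently terrible test errors when comparing against models trained on the raw targets.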
--

Make yourself a copy of gp1.m and modify the code to handle additive models with 2 (or 3...) additive components. Verify that you can get this model to reveal even more of the structure in the 10-dimensional data set.

--

Write a Matlab program to implement Hybrid Monte Carlo for the additive model that you have just created. It may be useful to look in MacKay's MCMC introduction, where a Matlab skeleton is already given. You may want to introduce (vague) priors on all the hyperparameters. Does the new approach work better than the "Maximum Likelihood for Hyperparameters" approach? Explain!
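For reference, the Hybrid Monte Carlo step can be sketched as follows (in NumPy, mirroring the structure of the Matlab skeleton in MacKay's MCMC introduction). The energy function used in the demo below is a stand-in standard Gaussian; in the assignment it would be the minus log likelihood from your additive gp1.m plus the minus log prior, as a function of the log hyperparameters:

```python
import numpy as np

def hmc(neg_logp, grad_neg_logp, x0, n_samples=400, eps=0.1, n_leap=20, seed=0):
    """Minimal Hybrid Monte Carlo: samples from p(x) proportional to
    exp(-E(x)), where E = neg_logp and grad_neg_logp is its gradient."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float).copy()
    E, g = neg_logp(x), grad_neg_logp(x)
    samples, n_acc = [], 0
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)        # resample momentum
        H = E + 0.5 * p @ p                     # current total energy
        xn, gn = x.copy(), g.copy()
        p = p - 0.5 * eps * gn                  # leapfrog: initial half-step
        for i in range(n_leap):
            xn = xn + eps * p                   # full position step
            gn = grad_neg_logp(xn)
            if i < n_leap - 1:
                p = p - eps * gn                # full momentum step
        p = p - 0.5 * eps * gn                  # final half-step
        En = neg_logp(xn)
        Hn = En + 0.5 * p @ p
        if rng.random() < np.exp(H - Hn):       # Metropolis accept/reject
            x, E, g = xn, En, gn
            n_acc += 1
        samples.append(x.copy())
    return np.array(samples), n_acc / n_samples

# demo target: standard 2-D Gaussian, E(x) = 0.5 x'x, grad E = x
samples, acc_rate = hmc(lambda x: 0.5 * x @ x, lambda x: x, np.zeros(2))
```

Note that each gradient evaluation costs a full pass through the (modified) gp1 code, so the leapfrog step size and trajectory length trade simulation cost against acceptance rate.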