Wednesday morning, 9.00-13.00

The code used during this assignment is available in the directory

    ~/cg/Course99/Wednesday

This directory is not writable, so you should copy the relevant code
into your own account or use symbolic links.

I. Kernel regression
====================

Copies of the slides for yesterday's lecture are in Tuesday/Kernel.ps.gz

The files needed are:

    kregest:  kernel regression estimation (adaptive metric),
    kercvq:   cross-validation estimator, using a quadratic
              parameterisation for the metric parameters,
    kercve:   cross-validation estimator, using an exponential
              parameterisation for the metric parameters,
    kcv:      kernel cross-validation estimator,
    akrtrain: training of the adaptive metric kernel regression.

Look at the description of the code ('help kregest' and so on) for
more information.

I.1. S1 dataset.

Training and test data are in trains1 and tests1.

Estimate the metric parameters by using conjugate gradient on the
cross-validation cost, e.g.:

    >> H0 = 1 ./ std(X)' ;
    >> H = conjgrad(H0, 'kercvq', 1e-8, X, Y, 1) ;

where X and Y are the training inputs and outputs, and 1 is for
leave-1-out cross-validation. Alternatively, use akrtrain:

    >> H = akrtrain(X, Y) ;

The equivalent kernel standard deviation is:

    >> 1 ./ sqrt(H.^2)

Estimate the training error and test error. The test data is
noise-free, so 0 is the best possible error.

!!: kregest uses the (positive) metric coefficients, so you will have
to pass H.^2 or exp(H) as the last parameter, depending on whether
you trained using kercvq or kercve.

You can look at the way the model evolves during learning using:

    >> akrs1

Try changing the split ratio, i.e. how many points are left out in
each fold during cross-validation:

    >> akrs1(2)   % For leaving 2 out (61 folds)
    >> akrs1(4)   % For leaving 4 out (31 folds)

What is your conclusion here?

I.2. 10-dimensional dataset

Estimate your model on each dataset size by using either conjgrad or
akrtrain:

    >> H50 = akrtrain(train50(:,1:10), train50(:,11))

and so on.
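The training and test errors can then be estimated along the following
lines. This is a sketch only: it assumes kregest(Xtrain, Ytrain, Xquery, h)
returns predictions at the query points, with the positive metric
coefficients h as the last parameter; check 'help kregest' for the actual
signature before using it.

```
    % Sketch only -- the kregest signature is assumed, see 'help kregest'.
    >> Xtr = train50(:,1:10) ;  Ytr = train50(:,11) ;
    >> Xte = test10000(1:1000,1:10) ;  Yte = test10000(1:1000,11) ;
    >> Etrain = mean((kregest(Xtr, Ytr, Xtr, H50.^2) - Ytr).^2)
    >> Etest  = mean((kregest(Xtr, Ytr, Xte, H50.^2) - Yte).^2)
```

Remember to pass exp(H) instead of H.^2 if you trained with kercve.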
Look at the metric coefficients H50.^2, H100.^2 and H200.^2 by
plotting them on the same set of axes. What is your conclusion?
(Remember that low metric coefficients indicate irrelevant
dimensions.)

Now try the exponential parameterisation: minimise kercve using
conjgrad, or copy akrtrain and modify it appropriately. Estimate your
model on each training set size and compare the resulting parameters
(exp(H)) with what you got earlier. What do you see?

You can see the evolution of the training and test errors and of the
metric parameters by using akrexp:

    >> akrexp(train200(:,1:10), train200(:,11), ...
              test10000(1:1000,1:10), test10000(1:1000,11))

Note: using the entire test set can make the test error calculation
rather long. Using a thousand or a few thousand elements is more
sensible.

Finally, try different split ratios. E.g. do 10-fold cross-validation
with:

    >> akrexp(train200(:,1:10), train200(:,11), ...
              test10000(1:1000,1:10), test10000(1:1000,11), 10)

I.3. Further topics

Functions s1.m and generate.m (from the Data directory) can be used
to generate datasets similar to the ones used above. It is therefore
possible to estimate relatively well the distributional properties of
various quantities by generating several training sets. For example:

    - the distribution of the metric parameters,
    - the average (and distribution) of the generalisation error,
    - the effect of different split ratios.

II. Mixture of experts
======================

There are three files to try in this directory:

    demomog.m   Mixture of Gaussians demo
    demomoe.m   Mixture of linear models demo
    demorbf.m   Radial Basis Function net demo

Further instructions are in the comments of these files.
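Appendix: the Monte-Carlo study suggested in I.3 could be organised
along the following lines. This is a sketch only: the calling
convention of generate.m is an assumption (one training set of the
requested size per call); see the comments in generate.m for its real
interface.

```
    % Sketch only -- the generate.m interface is assumed, see its comments.
    >> nrep = 20 ;  Hall = zeros(10, nrep) ;
    >> for i = 1:nrep
    >>   data = generate(200) ;     % assumed: one 200-point training set
    >>   H = akrtrain(data(:,1:10), data(:,11)) ;
    >>   Hall(:,i) = H.^2 ;
    >> end
    >> errorbar(1:10, mean(Hall,2), std(Hall,0,2))
```

The errorbar plot shows the mean and spread of each metric
coefficient across the generated training sets.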