Tuesday morning, 9.00-13.00

The code used during this assignment is available in the directory
~/cg/Course99/Tuesday. This directory is not writable, so you should
copy the relevant code into your own account or use symbolic links.

I. Numerical optimisation methods.
==================================

This part of the assignment uses the function TEST2D. The 2D contour
can be plotted using PLOTCOST, e.g.:

>> plotcost('test2d', -1:.05:1.3, -1:.05:1)

The minimum is at [1; 0], where the function value is 0.

The available optimisation methods are:

  SDEXACT  Steepest descent (exact line search)
  SDPATH   Steepest descent (approximate line search)
  NMEXACT  Newton method (exact line search)
  NMPATH   Newton method (approximate line search)
  CGEXACT  Conjugate gradient (exact line search)
  CDPATH   Conjugate gradient (approximate line search)

The initial point will be X0 = -[.7; .5]. Try steepest descent first:

>> [M, fM, nev] = sdexact(X0, 'test2d', 500) ;

then plot the results with:

>> clf
>> plotcost('test2d', -1:.05:1.3, -1:.05:1, [X0, M])

and the target with:

>> plot(1, 0, '+')

The vector 'fM' contains the function value at each iteration, while
'nev' contains the corresponding number of function evaluations:

>> figure(2)
>> semilogy(nev, fM)

or

>> loglog(nev, fM)

Now try the other methods and compare the results:
- in terms of the number of iterations,
- in terms of the number of function evaluations.

!!: the conjugate gradient and Newton methods will need far fewer than
500 iterations to converge. Try around 40-45 first.

II. Neural networks
===================

This part of the assignment uses the following functions:

  NNINIT   Neural network initialisation,
  NNREG    Neural network regression,
  NNCOST   Quadratic cost for the neural network model,
  NNCOSTR  Regularised quadratic cost for the NN model.

More information on these functions is available in the on-line help
('help nninit', etc).

II.1. The S1 dataset

Load the data:

>> load trains1
>> load tests1

The data is 2-dimensional. You can plot the noise-free test data and
the noisy training data with:

>> Min = min(tests1(:,1)) ;
>> axx = (Min:(-2*Min/64):(-Min)) ;
>> surf(axx, axx, reshape(tests1(:,3), 65, 65)) ;
>> hold on, plot3(trains1(:,1), trains1(:,2), trains1(:,3), '+')

Initialise a neural network model:

>> W0 = nninit(2, 5) ;

Train the neural network using conjgrad to minimise the (unregularised)
quadratic cost:

>> W = conjgrad(W0, 'nncost', 1e-6, trains1(:,1:2), trains1(:,3)) ;

The model estimate on the test set is obtained using nnreg:

>> Yhat = nnreg(tests1(:,1:2), W) ;
>> surf(axx, axx, reshape(Yhat, 65, 65)) ;

You can also calculate the test error:

>> mean((Yhat - tests1(:,3)).^2)

Let's increase the network size to 15 hidden units: this gives 61
parameters, compared with 121 data points. Train the network. There is
an upper limit (500) on the number of conjugate gradient iterations, so
you might have to restart the minimisation. Plot the data and calculate
the test error: what is your conclusion?

Let's now try to add some regularisation:

>> W1 = conjgrad(W0, 'nncostr', 1e-6, X, Y, 1e-1) ;
>> E(1) = mean((nnreg(Xt, W1) - Yt).^2) ;

where X and Y are the training input/output and Xt/Yt the test
input/output. What can you see on the plot showing the network weights?

Do the same thing for 1e-2 (W2 and E(2)), 1e-3 (W3 and E(3)), and so on
down to 1e-5. To make it easier, you can use the function nnexp:

>> W2 = nnexp(X, Y, Xt, Yt, 1e-2) ;

Look at the plot showing the network weights and compare what you
obtain for low and high levels of regularisation. Plot the values in E:
what is your conclusion?
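For example, you could plot the test error against the regularisation
parameter (a minimal sketch, assuming the five test errors have been
stored in E(1)...E(5) as above; the vector of parameter values is
simply rebuilt by hand):

>> lbd = [1e-1 1e-2 1e-3 1e-4 1e-5] ;   % regularisation values used for E(1)...E(5)
>> loglog(lbd, E, '-+')                 % test error versus regularisation parameter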
You will also observe overfitting on the test error curves for low
values of the regularisation parameter.

Look at the regression surface for different values of the
regularisation parameter and see how it evolves:

>> surf(axx, axx, reshape(nnreg(tests1(:,1:2), W1), 65, 65)) ;

What is the second effect of regularisation?

Plot the absolute values of the weights of the best model on a semilog
scale:

>> semilogy(abs(W3), '+')

What do you see?

This little exercise showed you that you can optimise a neural network
in at least two ways: 1) by modifying the structure of the model, using
an increasing number of hidden units, and 2) by constraining a large
network using regularisation.

II.2. The 10-dimensional dataset

There are 3 datasets of sizes 50, 100 and 200, and a large test set
(10000 points):

>> load train50
>> load train100
>> load train200
>> load test10000

For each training set size, first train a non-regularised network. In
order to optimise the performance, we will here use a validation set
containing, for example, 1/3 of the data (15, 33 and 66 points). The
training set (without the validation points) is used for... training
the networks, and the validation set is used for checking the
generalisation abilities.

Now try to optimise your model using one of the methods above: either
architecture selection, or (preferably) regularisation.

II.3. Further topics

You can use different regularisation parameters for different parts of
the network: one for the input weights and one for the output weights,
or even one per input (for the weights connected to that input) plus
one for the output layer. This is checked automatically by nncostr.m.

Try the following:

[Both datasets] Optimise the input and output layer regularisation
parameters by calculating the test error on a 2D grid of points
(lbd1, lbd2), as in the sketch below.

[S1 dataset] Set the output layer regularisation parameter to 0 and
optimise the regularisation parameters associated with each input by
the same method (pass the parameters [lbd1; lbd2; 0] to NNCOSTR).
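A minimal sketch of such a grid search on the S1 dataset, assuming W0
is a network initialised as above, X/Y and Xt/Yt are the training and
test sets, and that NNCOSTR accepts a vector of regularisation
parameters as its last argument ([lbd1; lbd2], one value per layer);
check 'help nncostr' for the exact convention:

>> lbds = 10.^(-5:-1) ;                 % candidate values 1e-5 ... 1e-1
>> Err = zeros(length(lbds)) ;
>> for i = 1:length(lbds)
     for j = 1:length(lbds)
       W = conjgrad(W0, 'nncostr', 1e-6, X, Y, [lbds(i); lbds(j)]) ;
       Err(i,j) = mean((nnreg(Xt, W) - Yt).^2) ;  % test error for this (lbd1, lbd2)
     end
   end
>> [dummy, k] = min(Err(:)) ;           % locate the best pair
>> [ibest, jbest] = ind2sub(size(Err), k) ;

For the per-input exercise on the S1 dataset, the last argument of
conjgrad would become [lbds(i); lbds(j); 0], as described above.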