Tuesday morning, 9.00-13.00

The code used during this assignment is available in the directory
~/cg/Course99/Tuesday. This directory is not writable, so you should
copy the relevant code into your own account or use symbolic links.

I. Numerical optimisation methods.
==================================

This part of the assignment uses the function TEST2D. The 2D contour
can be plotted using PLOTCOST, e.g.:

>> plotcost('test2d', -1:.05:1.3, -1:.05:1)

The minimum is at [1; 0], where the function value is 0.

The available optimisation methods are:

  SDEXACT  Steepest descent (exact line search)
  SDPATH   Steepest descent (approximate line search)
  NMEXACT  Newton method (exact line search)
  NMPATH   Newton method (approximate line search)
  CGEXACT  Conjugate gradient (exact line search)
  CDPATH   Conjugate gradient (approximate line search)

The initial point will be X0 = -[.7; .5]. Try steepest descent first:

>> [M, fM, nev] = sdexact(X0, 'test2d', 500) ;

then plot the results with:

>> clf
>> plotcost('test2d', -1:.05:1.3, -1:.05:1, [X0, M])

and the target with:

>> plot(1, 0, '+')

The vector 'fM' contains the function value at each iteration, while
'nev' contains the corresponding number of function evaluations:

>> figure(2)
>> semilogy(nev, fM)

or

>> loglog(nev, fM)

Now try the other methods and compare the results:
- in terms of the number of iterations,
- in terms of the number of function evaluations.

!!: the conjugate gradient and Newton methods will need far fewer than
500 iterations to converge. Try around 40-45 first.

II. Neural networks
===================

This part of the assignment uses the following functions:

  NNINIT   Neural network initialisation,
  NNREG    Neural network regression,
  NNCOST   Quadratic cost for the neural network model,
  NNCOSTR  Regularised quadratic cost for the NN model.

More information on these functions is available in the on-line help
('help nninit', etc).

II.1. The S1 dataset

Load the data:

>> load trains1
>> load tests1

The data is 2-dimensional. You can plot the noise-free test data and
the noisy training data with:

>> Min = min(tests1(:,1)) ;
>> axx = (Min:(-2*Min/64):(-Min)) ;
>> surf(axx, axx, reshape(tests1(:,3), 65, 65)) ;
>> hold on, plot3(trains1(:,1), trains1(:,2), trains1(:,3), '+')

Initialise a neural network model:

>> W0 = nninit(2, 5) ;

Train the neural network using conjgrad to minimise the (unregularised)
quadratic cost:

>> W = conjgrad(W0, 'nncost', 1e-6, trains1(:,1:2), trains1(:,3)) ;

The model estimate on the test set is obtained using nnreg:

>> Yhat = nnreg(tests1(:,1:2), W) ;
>> surf(axx, axx, reshape(Yhat, 65, 65)) ;

You can also calculate the test error:

>> mean((Yhat - tests1(:,3)).^2)

Let's increase the network size to 15 hidden units: this gives 61
parameters, compared with 121 data points. Train the network. There is
an upper limit (500) on the number of conjugate gradient iterations, so
you might have to restart the minimisation. Plot the data and calculate
the test error: what is your conclusion?

Let's now try to add some regularisation:

>> W1 = conjgrad(W0, 'nncostr', 1e-6, X, Y, 1e-1) ;
>> E(1) = mean((nnreg(Xt, W1) - Yt).^2) ;

where X and Y are the training input/output and Xt/Yt the test
input/output. What can you see on the plot showing the network weights?

Do the same thing for 1e-2 (W2 and E(2)), 1e-3 (W3 and E(3)), and so on
down to 1e-5. To make it easier, you can use the function nnexp:

>> W2 = nnexp(X, Y, Xt, Yt, 1e-2) ;

Look at the plot showing the network weights and compare what you
obtain for low and high levels of regularisation. Plot the values in E:
what is your conclusion?
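For example, you could plot the test error against the regularisation
parameter (a minimal sketch, assuming the five test errors have been
stored in E(1)...E(5) as above; the vector of parameter values is
simply rebuilt by hand):

>> lbd = [1e-1 1e-2 1e-3 1e-4 1e-5] ;   % regularisation values used for E(1)...E(5)
>> loglog(lbd, E, '-+')                 % test error versus regularisation parameter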
You will also observe overfitting on the test error curves for low
values of the regularisation parameter.

Look at the regression surface for different values of the
regularisation parameter and see how it evolves:

>> surf(axx, axx, reshape(nnreg(tests1(:,1:2), W1), 65, 65)) ;

What is the second effect of regularisation?

Plot the absolute values of the weights of the best model on a semilog
scale:

>> semilogy(abs(W3), '+')

What do you see?

This little exercise showed you that you can optimise a neural network
in at least two ways: 1) by modifying the structure of the model, using
an increasing number of hidden units, and 2) by constraining a large
network using regularisation.

II.2. The 10-dimensional dataset

There are 3 datasets of sizes 50, 100 and 200, and a large test set
(10000 points):

>> load train50
>> load train100
>> load train200
>> load test10000

For each training set size, first train a non-regularised network. In
order to optimise the performance, we will here use a validation set
containing, for example, 1/3 of the data (15, 33 and 66 points). The
training set (without the validation points) is used for... training
the networks, and the validation set is used for checking the
generalisation abilities.

Now try to optimise your model using one of the methods above: either
architecture selection, or (preferably) regularisation.

II.3. Further topics

You can use different regularisation parameters for different parts of
the network: one for the input weights and one for the output weights,
or even one per input (for the weights connected to that input) plus
one for the output layer. This is checked automatically by nncostr.m.

Try the following:

[Both datasets] Optimise the input and output layer regularisation
parameters by calculating the test error on a 2D grid of points
(lbd1, lbd2), as in the sketch below.

[S1 dataset] Set the output layer regularisation parameter to 0 and
optimise the regularisation parameters associated with each input by
the same method (pass the parameters [lbd1; lbd2; 0] to NNCOSTR).
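A minimal sketch of such a grid search on the S1 dataset, assuming W0
is a network initialised as above, X/Y and Xt/Yt are the training and
test sets, and that NNCOSTR accepts a vector of regularisation
parameters as its last argument ([lbd1; lbd2], one value per layer);
check 'help nncostr' for the exact convention:

>> lbds = 10.^(-5:-1) ;                 % candidate values 1e-5 ... 1e-1
>> Err = zeros(length(lbds)) ;
>> for i = 1:length(lbds)
     for j = 1:length(lbds)
       W = conjgrad(W0, 'nncostr', 1e-6, X, Y, [lbds(i); lbds(j)]) ;
       Err(i,j) = mean((nnreg(Xt, W) - Yt).^2) ;  % test error for this (lbd1, lbd2)
     end
   end
>> [dummy, k] = min(Err(:)) ;           % locate the best pair
>> [ibest, jbest] = ind2sub(size(Err), k) ;

For the per-input exercise on the S1 dataset, the last argument of
conjgrad would become [lbds(i); lbds(j); 0], as described above.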