Wednesday morning, 9.00-13.00

The code used during this assignment is available in the directory

    ~/cg/Course99/Wednesday

This directory is not writable, so you should copy the relevant code
into your own account or use symbolic links.

I. Kernel regression
====================

Copies of the slides for yesterday's lecture are in Tuesday/Kernel.ps.gz

The files needed are:

    kregest:  kernel regression estimation (adaptive metric),
    kercvq:   cross-validation estimator, using a quadratic
              parameterisation for the metric parameters,
    kercve:   cross-validation estimator, using an exponential
              parameterisation for the metric parameters,
    kcv:      kernel cross-validation estimator,
    akrtrain: training of the adaptive metric kernel regression.

Look at the description of the code ('help kregest' and so on) for
more information.

I.1. S1 dataset.

Training and test data are in trains1 and tests1.

Estimate the metric parameters by using conjugate gradient on the
cross-validation cost, e.g.:

    >> H0 = 1 ./ std(X)' ;
    >> H = conjgrad(H0, 'kercvq', 1e-8, X, Y, 1) ;

where X and Y are the training inputs and outputs, and 1 is for
leave-1-out cross-validation. Alternatively, use akrtrain:

    >> H = akrtrain(X, Y) ;

The equivalent kernel standard deviation is:

    >> 1 ./ sqrt(H.^2)

Estimate the training error and test error. The test data is
noise-free, so 0 is the best possible error.

!!: kregest uses the (positive) metric coefficients, so you will have
to pass H.^2 or exp(H) as the last parameter, depending on whether
you trained using kercvq or kercve.

You can look at the way the model evolves during learning using:

    >> akrs1

Try changing the split ratio, i.e. how many points are left out in
each fold during cross-validation:

    >> akrs1(2)   % For leaving 2 out (61 folds)
    >> akrs1(4)   % For leaving 4 out (31 folds)

What is your conclusion here?

I.2. 10-dimensional dataset

Estimate your model on each dataset size by using either conjgrad or
akrtrain:

    >> H50 = akrtrain(train50(:,1:10), train50(:,11))

and so on.
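The training and test errors can then be estimated along the following
lines. This is a sketch only: it assumes kregest(Xtrain, Ytrain, Xquery, h)
returns predictions at the query points, with the positive metric
coefficients h as the last parameter; check 'help kregest' for the actual
signature before using it.

```
    % Sketch only -- the kregest signature is assumed, see 'help kregest'.
    >> Xtr = train50(:,1:10) ;  Ytr = train50(:,11) ;
    >> Xte = test10000(1:1000,1:10) ;  Yte = test10000(1:1000,11) ;
    >> Etrain = mean((kregest(Xtr, Ytr, Xtr, H50.^2) - Ytr).^2)
    >> Etest  = mean((kregest(Xtr, Ytr, Xte, H50.^2) - Yte).^2)
```

Remember to pass exp(H) instead of H.^2 if you trained with kercve.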
Look at the metric coefficients H50.^2, H100.^2 and H200.^2 by
plotting them on the same set of axes. What is your conclusion?
(Remember that low metric coefficients indicate irrelevant
dimensions.)

Now try the exponential parameterisation: minimise kercve using
conjgrad, or copy akrtrain and modify it appropriately. Estimate your
model on each training set size and compare the resulting parameters
(exp(H)) with what you got earlier. What do you see?

You can see the evolution of the training and test errors and of the
metric parameters by using akrexp:

    >> akrexp(train200(:,1:10), train200(:,11), ...
              test10000(1:1000,1:10), test10000(1:1000,11))

Note: using the entire test set can make the test error calculation
rather long. Using a thousand or a few thousand elements is more
sensible.

Finally, try different split ratios. E.g. do 10-fold cross-validation
with:

    >> akrexp(train200(:,1:10), train200(:,11), ...
              test10000(1:1000,1:10), test10000(1:1000,11), 10)

I.3. Further topics

Functions s1.m and generate.m (from the Data directory) can be used
to generate datasets similar to the ones used above. It is therefore
possible to estimate relatively well the distributional properties of
various quantities by generating several training sets. For example:

    - the distribution of the metric parameters,
    - the average (and distribution) of the generalisation error,
    - the effect of different split ratios.

II. Mixture of experts
======================

There are three files to try in this directory:

    demomog.m   Mixture of Gaussians demo
    demomoe.m   Mixture of linear models demo
    demorbf.m   Radial Basis Function net demo

Further instructions are in the comments of these files.
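Appendix: the Monte-Carlo study suggested in I.3 could be organised
along the following lines. This is a sketch only: the calling
convention of generate.m is an assumption (one training set of the
requested size per call); see the comments in generate.m for its real
interface.

```
    % Sketch only -- the generate.m interface is assumed, see its comments.
    >> nrep = 20 ;  Hall = zeros(10, nrep) ;
    >> for i = 1:nrep
    >>   data = generate(200) ;     % assumed: one 200-point training set
    >>   H = akrtrain(data(:,1:10), data(:,11)) ;
    >>   Hall(:,i) = H.^2 ;
    >> end
    >> errorbar(1:10, mean(Hall,2), std(Hall,0,2))
```

The errorbar plot shows the mean and spread of each metric
coefficient across the generated training sets.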