Advanced methods for non-parametric modelling

Datasets used during the course

All the datasets below are plain ASCII files; the last column is the output and the first (N-1) columns are the inputs.

S1 dataset:

This is the two-dimensional dataset we used during the course. The training data have additive noise, the testing data have no noise.
trains1: training set (121 points) sampled on an 11x11 grid,
trains2: dataset (120 points) randomly sampled from 2 Gaussians (i.e. it has little data in the "corners" of the domain),
tests1: test set (4225 points) sampled on a 65x65 grid,
s1.m: Matlab function to generate the data.

The data has been normalised so that trains1 has mean 0 and variance 1. This means that tests1 has mean almost 0 and variance almost 1. Dataset trains2 is not normalised; leave it unchanged so that results remain comparable with trains1.
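As an illustration of this kind of normalisation, here is a minimal Python sketch (the helper name is hypothetical; numpy is assumed). The key point is that the test set is transformed with the training set's statistics, which is why its mean and variance are only approximately 0 and 1:

```python
import numpy as np

def normalise(train, test):
    """Shift and scale every column of the training set to mean 0 and
    variance 1, then apply the SAME affine map to the test set (so the
    test columns end up only approximately standardised).
    Assumes no column of the training set is constant."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    return (train - mu) / sigma, (test - mu) / sigma
```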

10-dimensional dataset:

This is a 10-dimensional dataset borrowed from Friedman's article "Multivariate Adaptive Regression Splines". The first two inputs have a non-linear, correlated effect, the third input has a non-linear additive effect, the next two inputs have a linear additive effect, and the last five inputs are pure noise, so any "smart" modelling method should remove them.
train50: 50 training points,
train100: 100 training points,
train200: 200 training points,
test10000: 10000 test points,
generate.m: Matlab function generating the data.

Here the data is not normalised.
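For reference, the function underlying this dataset is Friedman's benchmark y = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + noise, with all ten inputs uniform on [0, 1]. A Python sketch of the generator (generate.m is the authoritative version; the noise level and seed handling here are assumptions):

```python
import numpy as np

def friedman(n, noise_std=1.0, seed=0):
    """Sample n points from the Friedman benchmark: all 10 inputs are
    uniform on [0, 1]; inputs 6-10 never enter the output (pure noise)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(n, 10))
    y = (10.0 * np.sin(np.pi * x[:, 0] * x[:, 1])  # non-linear, correlated pair
         + 20.0 * (x[:, 2] - 0.5) ** 2             # non-linear additive term
         + 10.0 * x[:, 3] + 5.0 * x[:, 4]          # linear additive terms
         + noise_std * rng.standard_normal(n))
    return np.column_stack([x, y])                 # last column is the output
```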

Matlab functions used during the course

Please note that not all the functions used during the course are made available here. The functions listed below are the ones most likely to be useful on real modelling tasks.

Minimisation:

conjgrad.m: Conjugate gradient minimisation with approximate line search,
test2d.m: The small 2D example used to demonstrate minimisation methods,
grosenbrock.m: The Rosenbrock example generalised to arbitrary dimensions.
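A rough Python sketch of the same ideas: the Rosenbrock function generalised to arbitrary dimensions, and a conjugate gradient minimiser whose backtracking loop plays the role of the approximate line search. This is not a transcription of conjgrad.m or grosenbrock.m; the names, the Polak-Ribiere update, and the tolerances are illustrative choices:

```python
import numpy as np

def rosen(x):
    """Generalised Rosenbrock: sum_i 100*(x[i+1] - x[i]^2)^2 + (1 - x[i])^2."""
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def rosen_grad(x):
    g = np.zeros_like(x)
    g[:-1] = -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2.0 * (1.0 - x[:-1])
    g[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
    return g

def conjgrad(f, grad, x, iters=5000):
    """Polak-Ribiere conjugate gradients with an approximate line search
    (Armijo backtracking only, no exact one-dimensional minimisation)."""
    g = grad(x)
    d = -g
    for _ in range(iters):
        fx, slope, step = f(x), g @ d, 1.0
        while f(x + step * d) > fx + 1e-4 * step * slope and step > 1e-12:
            step *= 0.5                 # shrink until sufficient decrease
        x_new = x + step * d
        g_new = grad(x_new)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # PR+ (restart when < 0)
        d = -g_new + beta * d
        x, g = x_new, g_new
        if np.linalg.norm(g) < 1e-8:
            break
    return x
```

For example, conjgrad(rosen, rosen_grad, np.zeros(5)) should move towards the global minimum at (1, ..., 1).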

Neural networks:

Warning: some names may clash with the Matlab Neural Network Toolbox, so make sure these files appear earlier on the Matlab path (check with the path command) or rename them.
nninit.m: Neural network (random) initialisation,
nnreg.m: Neural network regression estimation,
nncost.m: Cost function used for (unregularised) training,
nncostr.m: Cost function and regularisation used for training,
nnexp.m: Runs a neural network training experiment.
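To illustrate what such a training experiment involves, here is a minimal Python sketch of one-hidden-layer network regression, trained by plain gradient descent on a sum-of-squares cost with a weight-decay penalty. It does not reproduce the Matlab functions above (which use conjugate gradient minimisation); the function name, learning rate, and all other settings are illustrative:

```python
import numpy as np

def nn_train(X, y, n_hid=10, alpha=1e-4, lr=0.02, epochs=20000, seed=0):
    """One-hidden-layer tanh network for regression, trained by plain
    gradient descent on mean squared error + weight decay (alpha).
    Returns a prediction function."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hid))  # random initialisation
    b1 = np.zeros(n_hid)
    w2 = rng.normal(0.0, 0.5, n_hid)
    b2 = 0.0
    n = len(X)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                    # hidden activations
        e = h @ w2 + b2 - y                         # residuals
        dh = np.outer(e, w2) * (1.0 - h ** 2)       # back-propagated error
        W1 -= lr * (X.T @ dh / n + alpha * W1)
        b1 -= lr * dh.mean(axis=0)
        w2 -= lr * (h.T @ e / n + alpha * w2)
        b2 -= lr * e.mean()
    return lambda Xq: np.tanh(Xq @ W1 + b1) @ w2 + b2
```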

Kernel regression:

The functions below implement the Gaussian kernel shape; substituting another kernel shape is straightforward.
kregest.m: Kernel regression estimation,
kercvq.m: Leave-one-out cross-validation estimator with a quadratic parameterisation of the metric diagonal,
kercve.m: Leave-one-out cross-validation estimator with an exponential parameterisation of the metric diagonal,
kcv.m: Leave-one-out cross-validation estimator (used by the previous two),
akrtrain.m: Training of the adaptive metric kernel smoother.
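The core computations can be sketched in Python as a Nadaraya-Watson estimator with a Gaussian kernel, plus a leave-one-out score for choosing the bandwidth. This is a simplification: it uses a single scalar width h, whereas the functions above parameterise a full metric diagonal; the function names are illustrative:

```python
import numpy as np

def kreg(Xtr, ytr, Xq, h):
    """Nadaraya-Watson estimate with a Gaussian kernel of width h.
    Assumes h is not so small that all weights underflow to zero."""
    d2 = ((Xq[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / h ** 2)
    return K @ ytr / K.sum(axis=1)

def loo_cv(Xtr, ytr, h):
    """Leave-one-out squared error for bandwidth h, computed cheaply by
    zeroing each training point's own kernel weight."""
    d2 = ((Xtr[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / h ** 2)
    np.fill_diagonal(K, 0.0)
    yhat = K @ ytr / K.sum(axis=1)
    return np.mean((yhat - ytr) ** 2)
```

Minimising loo_cv over h (e.g. on a grid) then gives a data-driven bandwidth for kreg.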

Mixture models:

Download the tar-file mixtar, extract the files using 'tar xvf mixtar' and see the README file.

Markov chain Monte Carlo:

Radford Neal's software for flexible Bayesian modeling is publicly available and should run under most Unix systems (works fine under Linux).

Gaussian Processes:

The following functions allow a GP model to be estimated using the maximum a posteriori (MAP) approach. The fully Bayesian MCMC approach is implemented in Radford Neal's software for flexible Bayesian modeling.
gpprior.m: Sample from a Gaussian Process prior,
gp1pred.m: Prediction based on a given set of hyper-parameters, training inputs and targets, and test inputs,
gp1.m: Minus log-likelihood for a given set of hyper-parameters.
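A compact Python sketch of GP prediction for fixed hyper-parameters and of the minus log marginal likelihood, assuming a squared-exponential covariance (the actual covariance function and parameterisation used in gp1pred.m and gp1.m may differ; the names here are illustrative):

```python
import numpy as np

def sqexp(A, B, ell, sf):
    """Squared-exponential covariance between two sets of inputs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_predict(Xtr, ytr, Xq, ell=1.0, sf=1.0, sn=0.1):
    """Predictive mean and variance (including observation noise sn^2)
    for fixed hyper-parameters, via a Cholesky factorisation."""
    K = sqexp(Xtr, Xtr, ell, sf) + sn ** 2 * np.eye(len(Xtr))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    Ks = sqexp(Xq, Xtr, ell, sf)
    v = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, sf ** 2 + sn ** 2 - (v ** 2).sum(axis=0)

def gp_neg_log_lik(Xtr, ytr, ell, sf, sn):
    """Minus log marginal likelihood of the hyper-parameters: the
    quantity to minimise in the MAP approach (here with a flat prior)."""
    K = sqexp(Xtr, Xtr, ell, sf) + sn ** 2 * np.eye(len(Xtr))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    return (0.5 * ytr @ alpha + np.log(np.diag(L)).sum()
            + 0.5 * len(ytr) * np.log(2.0 * np.pi))
```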

Text of the assignments

Not all the assignments can be completed with the functions available above, since not all of the "toy functions" are provided.
Tuesday: Numerical optimisation and neural networks,
Wednesday: Kernel regression, Gaussian mixtures, mixture of experts and RBF,
Thursday: Bayesian learning for neural networks (using MCMC),
Friday: Gaussian Processes.

If you access this page through frames, you can open or copy the following link in your browser: http://eivind.imm.dtu.dk/teaching/NonParReg/datafun.html