In this assignment we will introduce Radford Neal's software for Bayesian learning in neural networks through some hands-on experiments. This software package is fairly extensive, and explaining the full functionality of all the programs is not possible in the time available; instead we will start from a common script, try to understand what is going on, and see what happens when various interesting options are altered. The documentation is most easily read with a browser; try "netscape file:/usr/share/fbm/doc/index.html &" to get started.

--

Part I:

The basic script which we will use to train the network contains the following 8 lines of code:

  net-spec log 2 8 1 / - x0.2:0.5:1 0.1:0.5 - x0.05:0.5 x0.2:0.5:1 1
  model-spec log real 0.05:0.5
  data-spec log 2 1 / trains1 . tests1 . / +@x@ +@x@ / +@x@
  net-gen log fix 0.5
  mc-spec log repeat 10 sample-noise heatbath hybrid 100:10 0.4
  net-mc log 1
  mc-spec log sample-sigmas heatbath hybrid 1000:10 0.4
  net-mc log 100

You may wish to copy these lines to a file for easy reference (you may need to make this script file executable using "chmod 755 file-name"). The script trains a network with 8 hidden units, using the data in the file "trains1" for training and the data in the file "tests1" for testing. To get more information on what the individual commands do, refer to their documentation. (An annotated copy of the script, with a brief comment on each command, is given at the end of this walkthrough.)

Execute the script. The last command is what actually does the work, and it may take about 30 seconds to execute. You can monitor its progress using "log-last log". You can also view the network parameters at any time using "net-display log". (Note that you can kill the net-mc command and resume it later, simply by reissuing the command.)

Our primary concern is to figure out whether the Markov chain has sampled adequately from the posterior. To do this, we use the net-plt program to display various quantities from the log file. The first thing to check is that the rejection rate hasn't been too high; to plot it, do "net-plt t r log | pt". We can look at the error on the training set using the command "net-plt t b log | pt -lny". We can also plot the value of individual network weights; for example, the weight from the first input to the first hidden unit may be plotted using "net-plt t w1@1 log | pt". You may be very surprised by how much the value of the weight varies during the run! This, however, is not a bad sign; it simply indicates that the sampling procedure is able to move around and explore weight space (although there is no guarantee that we are not trapped in a low-dimensional manifold). Perhaps more usefully, we may also plot the values of the hyper-parameters. Since the hyper-parameters govern the behaviour of many weights, they tend to move less rapidly than the weights, and it is sometimes valuable to view them in order to figure out whether we have sampled for long enough. The ARD parameters are plotted using "net-plt t h1@ log | pt -lny".

To make predictions for the test cases we use the program called net-pred. To use 10 networks from iteration 21 onwards to make predictions based on the mean of the predictive distribution, use "net-pred ran log 21:+10". To view the function that the network implements, use "net-pred rbin log 21:+10 > res" to dump some numbers to a file. Then, inside Matlab, load the numbers using "load -ascii res". Look at the network function using "surf(reshape(res(:,3),65,65))", or the squared residuals using "surf(reshape(res(:,4),65,65))".
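To tie the walkthrough together, here is the script again, annotated with a brief comment on each command. The comments are a reading aid only; they paraphrase our understanding of the specifications (see the net-spec, model-spec, data-spec, net-gen, mc-spec and net-mc documentation for the authoritative details):

  # 2 inputs, 8 hidden units, 1 output; the specifications after "/" give
  # the priors for the groups of weights and biases. The x...:...:1 forms
  # specify hierarchical priors with a separate hyper-parameter per input,
  # which is what implements ARD.
  net-spec log 2 8 1 / - x0.2:0.5:1 0.1:0.5 - x0.05:0.5 x0.2:0.5:1 1
  # regression model: real-valued targets with Gaussian noise;
  # 0.05:0.5 is the prior specification for the noise level
  model-spec log real 0.05:0.5
  # 2 inputs and 1 target per case; trains1/tests1 hold the training and
  # test cases; the trailing +@x@ forms shift and rescale the inputs and
  # the target by data-dependent amounts (i.e. normalize them)
  data-spec log 2 1 / trains1 . tests1 . / +@x@ +@x@ / +@x@
  # generate an initial state, with the hyper-parameters fixed at 0.5
  net-gen log fix 0.5
  # short initial phase: each iteration repeats 10 times a noise
  # resampling followed by hybrid Monte Carlo with trajectories of
  # 100 leapfrog steps and stepsize factor 0.4; hyper-parameters stay fixed
  mc-spec log repeat 10 sample-noise heatbath hybrid 100:10 0.4
  net-mc log 1
  # main phase: now also resample the hyper-parameters (the "sigmas"),
  # using longer trajectories of 1000 leapfrog steps
  mc-spec log sample-sigmas heatbath hybrid 1000:10 0.4
  # run 100 iterations, saving the state in the log file after each
  net-mc log 100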
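Once the run has finished (or while it is still going), the convergence checks discussed above can be repeated in one batch; the commands below are exactly as introduced in the text, only grouped for convenience:

  net-plt t r log | pt            # rejection rate (should not be too high)
  net-plt t b log | pt -lny       # training-set error, on a log scale
  net-plt t w1@1 log | pt         # weight from input 1 to hidden unit 1
  net-plt t h1@ log | pt -lny     # the ARD hyper-parameters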
Using the test data to assess the adequacy of the samples obtained so far is naturally not an option when modelling real data, but it may help to give some insight into what is going on during sampling. Explain what may cause the differences between the answers obtained from the following pairs of commands:

  net-pred ran log 21:+25     and   net-pred ran log 21:+50
  net-pred ran log 21:60      and   net-pred ran log 61:100
  net-pred ran log 21:100+10  and   net-pred ran log 91:100+10

Have we reached equilibrium? Have we sampled sufficiently from the equilibrium distribution? Are consecutive samples independent? How many samples do we need for predictions? Some of these questions are difficult!

You may also plot the test error for each of the networks in the log file using "net-plt t B log | pt -lny". How does the performance of the individual nets compare to the performance of the ensemble?

Try other sizes of networks, e.g. with 4 and 2 and perhaps 16 hidden units. At what size do we start seeing the effect of under-fitting? At what size do we see over-fitting?

--

Part II:

Make some experiments with the 10-dimensional dataset, using train50, train100, train200 and test10000. You will need to modify the script in a few places to accommodate the larger input dimension; a sketch of the necessary changes is given below. Compare the performance of net-mc to one of the contestants from the previous days' exercises. Check the ARD parameters (see above). Is ARD able to recover the "active" inputs? Does the use of ARD facilitate the interpretation of what the model is doing? Is net-mc with ARD a black-box method?
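As a starting point, here is a minimal sketch of the modified script, assuming train100 is used for training and writing to a new log file "log10" (both choices are ours). The prior specifications are simply carried over unchanged from the 2-input script, which you may well want to reconsider; the real changes are the input dimension, the data files, and the normalization specification repeated for each of the 10 inputs:

  net-spec log10 10 8 1 / - x0.2:0.5:1 0.1:0.5 - x0.05:0.5 x0.2:0.5:1 1
  model-spec log10 real 0.05:0.5
  data-spec log10 10 1 / train100 . test10000 . / +@x@ +@x@ +@x@ +@x@ +@x@ +@x@ +@x@ +@x@ +@x@ +@x@ / +@x@
  net-gen log10 fix 0.5
  mc-spec log10 repeat 10 sample-noise heatbath hybrid 100:10 0.4
  net-mc log10 1
  mc-spec log10 sample-sigmas heatbath hybrid 1000:10 0.4
  net-mc log10 100
  net-plt t h1@ log10 | pt -lny   # one ARD hyper-parameter per input

With 10 inputs the ARD plot now shows ten curves; inputs that the model finds irrelevant should get small hyper-parameter values, effectively switching those inputs off.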