First Steps in Neural Networks

Mark W. Andrews




Introduction

I have put together some MATLAB programs for training artificial neural networks on various toy problems. The files include:

perceptron.m
- a MATLAB function implementing a perceptron neural network.
mlp.m
- a function implementing a multi-layer perceptron.
perceptron_xor.m
- a MATLAB program for setting up and training a perceptron on two simple problems (i.e. learning the Boolean functions inclusive or and exclusive or).
mlp_xor.m
- a MATLAB program for setting up and training a multi-layer perceptron on two simple problems (i.e. learning the Boolean functions inclusive or and exclusive or).

These programs were designed to strike a balance between ease of use and flexibility. They could serve as a template if you wish to program your own networks. But if you don't have the stomach for programming, they should still be easy to use for the exercises below.


General Instructions

To carry out the exercises you will need to mess with only two programs, perceptron_xor and mlp_xor. These programs can be invoked by typing their names at the MATLAB command line. For example, if you type perceptron_xor at the command line, MATLAB will run the program perceptron_xor.m. It will set up a network, choose some random initial weights, train the network to learn one or the other Boolean function and spit out the error of the network as it learns. If and when it learns, the program will spit out a graph showing a learning curve, and a few other descriptions of its performance. The same general process occurs with the program mlp_xor.m. For the exercises, you will be required to "pop the hood" and change a few parameters in the code. These parameters are:

lrate
- this is the learning rate. It is a numerical variable specifying the proportion of the error derivative by which the weights will be adjusted during training. Basically, it controls how large a step the network takes each time it learns from its errors. The default is .1. Change it as you wish.
stp
- the stopping criterion for the network. This is a numerical variable that specifies the value of the mean squared error below which the network stops training. The default is .01. In other words, if the network's mean squared error is less than .01, it will stop training. You can change this to whatever value you like.
H_space
- the number of hidden units in the multi-layer perceptron. The default is 2.
T
- this is an array specifying the training data. Each program defines two versions of it: one specifies the training data for the inclusive or function, the other specifies the training data for the exclusive or function. Comment out with a "%" the training array you do not want to use.
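
To make the roles of lrate, stp and T concrete, here is a minimal sketch of a training loop, written in Python rather than MATLAB. It is not the code in perceptron.m. It assumes a single sigmoid unit trained by gradient descent on the squared error, with weights updated after every training pattern and initial weights drawn from a small range around zero. The function name train_perceptron, the row layout of T, the max_iter and seed arguments and the initial weight range are all my own choices for illustration.

```python
import math
import random

# Training data: each row is (input 1, input 2, target output).
T_or = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]   # inclusive or
T_xor = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # exclusive or


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_perceptron(T, lrate=0.1, stp=0.01, max_iter=10000, seed=None):
    """Train one sigmoid unit; return the mean squared error at each iteration."""
    rng = random.Random(seed)
    # Two input weights plus a bias weight; the range is an assumption.
    w = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    curve = []
    for _ in range(max_iter):
        sq_err = 0.0
        for x1, x2, target in T:
            y = sigmoid(w[0] * x1 + w[1] * x2 + w[2])
            err = target - y
            sq_err += err ** 2
            # Error derivative passed back through the sigmoid.
            delta = err * y * (1.0 - y)
            # lrate sets what proportion of the derivative is applied.
            w[0] += lrate * delta * x1
            w[1] += lrate * delta * x2
            w[2] += lrate * delta
        mse = sq_err / len(T)
        curve.append(mse)
        if mse < stp:  # stopping criterion reached
            break
    return curve
```

Calling train_perceptron(T_or) or train_perceptron(T_xor) plays the same role as commenting out one of the two training arrays. The returned list is the learning curve, and its length is the number of iterations used.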

Exercises

  1. Train a perceptron to learn the inclusive or function:

    1. Train the perceptron at least five times (i.e., with five different initial weight configurations). On average, how many iterations are necessary for the network to reach the stopping criterion?
    2. Describe the general shape of the learning curve over the different training sessions.
    3. Lower the stopping criterion to successively stricter values (e.g., .01, .001, .0001). Describe the general shape of the learning curve for each case.

  2. Train a perceptron to learn the exclusive or function:

    1. Train the perceptron to learn this function multiple times. Does it ever reach the stopping criterion? (Note that the network times out after 10,000 iterations.)
    2. What is the general shape of the learning curve over the different training sessions?
    3. Put on your thinking caps and ask why the perceptron acts as it does with the exclusive or function. If you bash your head against this problem enough times you should find the answer. It is quite simple.

  3. Train a multi-layer perceptron to learn the exclusive or function:

    1. Train a multi-layer perceptron to learn the exclusive or function about ten times. On average, how many iterations are necessary for the network to reach the stopping criterion?
    2. What is the general shape of the learning curve over the different training sessions?
    3. Adjust the number of hidden units in the multi-layer perceptron (e.g. 1 unit, 3 units, 5 units), and retrain multiple times. Describe the behavior of the network (i.e. the average number of iterations to reach the criterion, and the general shape of the learning curve) across the different conditions.
    4. Adjust the learning rate to higher and lower values (e.g., 1, .5, .1, .01). Describe the behavior of the network (i.e. the average number of iterations to reach the criterion, and the general shape of the learning curve) across the different conditions.
    5. Put on your thinking caps again and ask why the multi-layer perceptron acts as it does with the exclusive or function. That is to say, explain why the multi-layer perceptron differs in performance from the single-layer perceptron. The answer to this question is not at all simple but there really is an answer. If you get your thrills from attempting to solve hard mathematical questions, then this problem is for you. However, if you've got better things to do, you can just make a guess and hope for the best.
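
If you want to see what changing H_space and lrate does mechanically, the following is a rough sketch of a multi-layer perceptron and of the bookkeeping for repeated training sessions, again in Python rather than MATLAB. It is not the code in mlp.m. It assumes one hidden layer of sigmoid units, a sigmoid output unit, weight updates after every training pattern and small random initial weights. The names train_mlp and run_sessions, the max_iter and seed arguments and the initial weight range are hypothetical, chosen for illustration only.

```python
import math
import random

# Exclusive or: each row is (input 1, input 2, target output).
T_xor = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_mlp(T, H_space=2, lrate=0.1, stp=0.01, max_iter=10000, seed=None):
    """Train a 2-input, H_space-hidden-unit, 1-output network by backpropagation.

    Returns the mean squared error at each iteration (the learning curve).
    """
    rng = random.Random(seed)
    # Each hidden unit: two input weights and a bias (last entry).
    W_hid = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(H_space)]
    # Output unit: one weight per hidden unit and a bias (last entry).
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(H_space + 1)]
    curve = []
    for _ in range(max_iter):
        sq_err = 0.0
        for x1, x2, target in T:
            # Forward pass.
            h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in W_hid]
            y = sigmoid(sum(w_out[j] * h[j] for j in range(H_space)) + w_out[-1])
            err = target - y
            sq_err += err ** 2
            # Backward pass: output delta first, then the hidden deltas,
            # both computed before any weight is changed.
            d_out = err * y * (1.0 - y)
            d_hid = [d_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(H_space)]
            for j in range(H_space):
                w_out[j] += lrate * d_out * h[j]
                W_hid[j][0] += lrate * d_hid[j] * x1
                W_hid[j][1] += lrate * d_hid[j] * x2
                W_hid[j][2] += lrate * d_hid[j]
            w_out[-1] += lrate * d_out
        mse = sq_err / len(T)
        curve.append(mse)
        if mse < stp:  # stopping criterion reached
            break
    return curve


def run_sessions(n_runs=10, H_space=2, lrate=0.1, stp=0.01, max_iter=10000):
    """Train n_runs networks from different initial weights.

    Returns (mean iterations over the sessions that reached the criterion,
    number of sessions that timed out). The mean is None if none converged.
    """
    converged = []
    for run in range(n_runs):
        curve = train_mlp(T_xor, H_space, lrate, stp, max_iter, seed=run)
        if curve[-1] < stp:
            converged.append(len(curve))
    mean_iters = sum(converged) / len(converged) if converged else None
    return mean_iters, n_runs - len(converged)
```

For example, run_sessions(H_space=3, lrate=0.5) trains ten networks with three hidden units each. Keeping timed-out sessions out of the average, and counting them separately, stops a few failures from swamping the mean.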


Reporting your answers

Write out your answers for each question in the order they are asked. For every question, try to explain your answer as explicitly as possible. It is important to remember that there is no magic and no ghosts in a neural network. Everything happens for a reason and can be explained mathematically. When describing the behavior of the network it is always possible to be explicit about why it performs as it does. For example, with the learning curves there is a reason why they are shaped as they are. Try to explain this and other issues in your answers.


MarkAndrews 2001-03-13