First Steps in Neural Networks
Mark W. Andrews
I have put together some MATLAB programs for training artificial
neural networks on various toy problems. The files include:
- perceptron.m: a MATLAB function implementing a perceptron neural
network.
- mlp.m: a MATLAB function implementing a multi-layer perceptron.
- perceptron_xor.m: a MATLAB program for setting up and training a
perceptron on two simple problems (i.e., learning the Boolean functions
inclusive or and exclusive or).
- mlp_xor.m: a MATLAB program for setting up and training a
multi-layer perceptron on two simple problems (i.e., learning the
Boolean functions inclusive or and exclusive or).
These programs were designed to strike a balance between ease of use
and flexibility. They could serve as a template if you wish to
program up your own networks. But if you don't have the stomach for
programming, they should still be easy to use for the exercises below.
To carry out the exercises you will need to mess with only two
programs, perceptron_xor and mlp_xor. These programs
can be invoked by typing their names at the MATLAB command line. For
example, if you type perceptron_xor at the command line, MATLAB
will run the program perceptron_xor.m. It will set up a
network, choose some random initial weights, train the network to
learn one or the other Boolean function and spit
out the error of the network as it learns. If and when it learns, the
program will spit out a graph showing a learning curve, and a few
other descriptions of its performance. The same general process occurs
with the program mlp_xor.m.
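The process just described can be sketched in a few lines of MATLAB. This is only an illustration, not the code in perceptron.m: it assumes a single sigmoid output unit trained by batch gradient descent on the squared error. The names X, Y, W, max_iter and mse_history are hypothetical, while lrate, stp and the 10,000-iteration time-out are taken from the text.

```matlab
% Illustrative sketch only; this is NOT the code in perceptron.m.
% X, Y, W, max_iter and mse_history are hypothetical names.
X = [0 0; 0 1; 1 0; 1 1];        % input patterns, one per row
Y = [0; 1; 1; 1];                % targets for inclusive or
lrate = 0.1;                     % learning rate (default from the text)
stp = 0.01;                      % stopping criterion on mean squared error
max_iter = 10000;                % time-out mentioned in the exercises

sigmoid = @(a) 1 ./ (1 + exp(-a));
Xb = [X, ones(size(X, 1), 1)];   % append a constant bias input
W = rand(size(Xb, 2), 1) - 0.5;  % random initial weights in [-0.5, 0.5]

mse_history = zeros(max_iter, 1);
for iter = 1:max_iter
    out = sigmoid(Xb * W);                   % forward pass over all patterns
    err = out - Y;                           % output error per pattern
    mse_history(iter) = mean(err .^ 2);      % record the error as we learn
    if mse_history(iter) < stp
        break;                               % stopping criterion reached
    end
    grad = Xb' * (err .* out .* (1 - out));  % derivative of half the summed squared error
    W = W - lrate * grad;                    % gradient-descent weight update
end
mse_history = mse_history(1:iter);           % keep only the iterations actually run
fprintf('Stopped after %d iterations, MSE = %.4f\n', iter, mse_history(end));
```

Calling plot(mse_history) afterwards draws a learning curve of the kind the programs produce.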
For the exercises, you will be required to "pop the hood" and change a
few parameters in the code. These parameters are:
- lrate: the learning rate. It is a numerical variable
specifying the proportion of the error derivative by which the weights
will be adjusted during training. Basically, it is a variable that
affects how the network learns from its errors. The default is .1.
Change it as you wish.
- stp: the stopping criterion for the network. This is a
numerical variable that specifies the value of the mean squared error
at which the network can stop. It is set as a default to .01. In other
words, if the network's average squared error is less than .01, it
will stop training. You can change this to whatever value you like.
- H_space: the number of hidden units in the multi-layer
perceptron. The default is 2.
- T: an array specifying the training data. There are two such
arrays in the code. One specifies the training data for the inclusive
or function, the other specifies the training data for the exclusive
or function. Comment out with a "%" the training array you do not
want to use.
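As a rough guide to what to look for when you pop the hood, the parameter section might look something like the sketch below. The layout is an assumption, not a copy of the actual scripts: in particular, the format of T (one row per pattern, with the target in the last column) and the names w, grad and w_new are hypothetical. The last three lines show what "the proportion of the error derivative" means for a single weight.

```matlab
% Hypothetical layout of the parameter section; the actual scripts may differ.
lrate = 0.1;      % learning rate
stp = 0.01;       % stop when the mean squared error falls below this value
H_space = 2;      % number of hidden units (used by mlp_xor only)

% Assumed format: each row is [input1 input2 target].
% Inclusive or (active):
T = [0 0 0; 0 1 1; 1 0 1; 1 1 1];
% Exclusive or (commented out with "%"):
% T = [0 0 0; 0 1 1; 1 0 1; 1 1 0];

% Role of lrate in one weight update (w and grad are made-up values):
w = 0.5;                    % current value of some weight
grad = -0.2;                % error derivative with respect to that weight
w_new = w - lrate * grad;   % the weight moves by lrate times the derivative
```

To switch problems, move the "%" from the exclusive or array to the inclusive or array and rerun the program.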
- Train a perceptron to learn the inclusive or function:
  - Train the perceptron at least five times (i.e., with five different
  initial weight configurations). On average, how many iterations are
  necessary for the network to reach the stopping criterion?
  - Describe the general shape of the learning curve over the different
  training sessions.
  - Lower the stopping criterion to successively stricter values (e.g.,
  .01, .001, .0001). Describe the general shape of the learning curve
  for each case.
- Train a perceptron to learn the exclusive or function:
  - Train the perceptron to learn this function multiple times. Does it
  ever reach the stopping criterion? (The network times out after
  10,000 iterations.)
  - What is the general shape of the learning curve over the different
  training sessions?
  - Put on your thinking caps and ask why the perceptron acts as it
  does with the exclusive or function. If you bash your head
  against this problem enough times you should find the answer. It is
  quite simple.
- Train a multi-layer perceptron to learn the exclusive or function:
  - Train a multi-layer perceptron to learn the exclusive or
  function about ten times. On average, how many iterations are
  necessary for the network to reach the stopping criterion?
  - What is the general shape of the learning curve over the different
  training sessions?
  - Adjust the number of hidden units in the multi-layer perceptron
  (e.g., 1 unit, 3 units, 5 units), and retrain multiple times. Describe
  the behavior of the network (i.e., the average number of iterations to
  reach the criterion, and the general shape of the learning curve)
  across the different conditions.
  - Adjust the learning rate to higher and lower values (e.g., 1, .5, .1,
  .01). Describe the behavior of the network (i.e., the average number
  of iterations to reach the criterion, and the general shape of the
  learning curve) across the different conditions.
  - Put on your thinking caps again and ask why the multi-layer perceptron
  acts as it does with the exclusive or function. That is to say,
  explain why the multi-layer perceptron differs in performance from the
  single-layer perceptron. The answer to this question is not at all
  simple, but there really is an answer. If you get your thrills
  from attempting to solve hard mathematical questions, then this
  problem is for you. However, if you've got better things to do, you
  can just make a guess and hope for the best.
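Several of the questions above ask for an average number of iterations over repeated training sessions. One way to collect such numbers is sketched below. It is a self-contained illustration, not the code in mlp.m or mlp_xor.m: it assumes sigmoid units and plain batch back-propagation, and the names n_runs, max_iter, W1, W2, n_iters and reached are hypothetical. Runs that hit the time-out are counted separately, since averaging them in with the others would hide how often the network fails to reach the criterion.

```matlab
% Illustrative sketch only; this is NOT the code in mlp.m or mlp_xor.m.
% n_runs, max_iter, W1, W2, n_iters and reached are hypothetical names.
X = [0 0; 0 1; 1 0; 1 1];          % input patterns, one per row
Y = [0; 1; 1; 0];                  % targets for exclusive or
lrate = 0.1; stp = 0.01; H_space = 2;
max_iter = 10000;                  % time-out mentioned in the exercises
n_runs = 5;                        % number of independent training sessions

sigmoid = @(a) 1 ./ (1 + exp(-a));
Xb = [X, ones(4, 1)];              % inputs plus a constant bias input
n_iters = zeros(n_runs, 1);        % iterations used by each run
reached = false(n_runs, 1);        % did the run reach the criterion?

for r = 1:n_runs
    W1 = rand(3, H_space) - 0.5;       % input-to-hidden weights (with bias)
    W2 = rand(H_space + 1, 1) - 0.5;   % hidden-to-output weights (with bias)
    for iter = 1:max_iter
        H = sigmoid(Xb * W1);              % hidden activations
        Hb = [H, ones(4, 1)];              % hidden activations plus bias
        out = sigmoid(Hb * W2);            % network output
        err = out - Y;
        if mean(err .^ 2) < stp
            reached(r) = true;
            break;
        end
        d_out = err .* out .* (1 - out);                    % output delta
        d_hid = (d_out * W2(1:H_space)') .* H .* (1 - H);   % back-propagated delta
        W2 = W2 - lrate * (Hb' * d_out);
        W1 = W1 - lrate * (Xb' * d_hid);
    end
    n_iters(r) = iter;
end
fprintf('Runs reaching the criterion: %d of %d\n', sum(reached), n_runs);
fprintf('Mean iterations over all runs: %.1f\n', mean(n_iters));
```

Changing H_space or lrate at the top and rerunning gives the comparison across conditions that the later questions ask for.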
Write out your answers for each question in the order they are asked.
With all questions, try to explain your answers as explicitly as
possible. It is important to remember that there is no magic and no
ghosts in a neural network. Everything happens for a reason and can be
explained mathematically. When describing the behavior of the network
it is always possible to be explicit about why it performs as it
does. For example, with the learning curves there is a reason why they
are shaped as they are. Try to explain this and other issues in your
answers.
Mark Andrews
2001-03-13