AI
November 14, 2004
This assignment gives you an opportunity to apply neural network learning to the problem of face recognition. You will experiment with a neural network program to train a sunglasses recognizer and a pose recognizer.
The image data can be found in /staff/sushil/classes/ml/code/book/nn/faces and as a tarred, gzipped file. You can also browse the directory. This directory contains 20 subdirectories, one for each person, named by userid. Each of these directories contains several different face images of the same person.
You will be interested in the images with the following naming convention:
<userid>_<pose>_<expression>_<eyes>_<scale>.pgm
If you've been looking closely in the image directories, you may notice that some images have a .bad suffix rather than the .pgm suffix. As it turns out, 16 of the 640 images taken have glitches due to problems with the camera setup; these are the .bad images. Some people had more glitches than others, but everyone who got "faced" should have at least 28 good face images (out of the 32 variations possible, discounting scale).
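As a rough illustration of the naming convention, the sketch below splits a file name into its five fields and rejects names that do not end in .pgm (such as the .bad images). The FaceName struct and parse_face_name() are hypothetical helpers written for this handout; they are not part of the supplied image package, and the sketch only handles names that carry all five fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the five fields of a face image name. */
typedef struct {
    char userid[32];
    char pose[32];
    char expression[32];
    char eyes[32];
    int  scale;          /* numeric scale field, e.g. 4 in ..._open_4.pgm */
} FaceName;

/* Parse "<userid>_<pose>_<expression>_<eyes>_<scale>.pgm", optionally
 * preceded by a directory path. Returns 1 on success, 0 if the name does
 * not match the convention (for example, a .bad file). */
int parse_face_name(const char *path, FaceName *out)
{
    const char *base;
    char ext[8];
    int matched;

    assert(path != NULL && out != NULL);

    /* Skip any leading directories. */
    base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;

    matched = sscanf(base, "%31[^_]_%31[^_]_%31[^_]_%31[^_]_%d.%7s",
                     out->userid, out->pose, out->expression, out->eyes,
                     &out->scale, ext);
    return matched == 6 && strcmp(ext, "pgm") == 0;
}
```

For glickman_straight_happy_open_4.pgm this fills in the userid glickman, the pose straight, the expression happy, the eyes field open, and the scale 4.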
To view the images, you can use the program xv or gimp. These are available as /usr/local/bin/xv and /usr/local/bin/gimp on our department's Unix boxes. xv and gimp handle a variety of image formats, including the PGM format in which our face images are stored. While we won't go into detail about xv or gimp in this document, we will quickly describe the basics you need to know to use them.
To start xv, just specify one or more images on the command line, like this:
xv /staff/sushil/classes/ml/code/book/nn/faces/glickman/glickman_straight_happy_open_4.pgm
This will bring up an X window displaying the face. Clicking the right button in the image window will toggle a control panel with a variety of buttons. The Dbl Size button doubles the displayed size of the image every time you click on it. This will be useful for viewing the quarter-resolution images, as you might imagine.
You can also obtain pixel values by holding down the left button while moving the pointer in the image window. A text bar will be displayed, showing you the image coordinates and brightness value where the pointer is located.
To quit xv, just click on the Quit button or type q in one of the xv windows.
Here are the basics for gimp. To start gimp type:
gimp /staff/sushil/classes/ml/code/book/nn/faces/glickman/glickman_straight_happy_open_4.pgm
You can increase the size of the image by enlarging the window and then clicking the right mouse button. This will pop up a menu; select "View" and then "Zoom in" to increase the size of the image.
To quit gimp, you can right-click on the picture and choose "File" then "Quit", or choose "File" then "Quit" on the gimp toolbar.
We're supplying C code for a three-layer fully-connected feedforward neural network which uses the backpropagation algorithm to tune its weights. To make life as easy as possible, we're also supplying you with an image package for accessing the face images, as well as the top-level program for training and testing, as a skeleton for you to modify. To help explore what the nets actually learn, you'll also find a utility program for visualizing hidden-unit weights as images.
The code is located in /staff/sushil/classes/ml/code/book/nn/src. It is also available as a tarred, gzipped file (unpack it with tar xvzf src.tgz) and for browsing. Copy all of the files in this area to your own directory, and type make. Note: take care to use cp * instead of cp *.* in order to ensure that you get the Makefile. When the compilation is done, you should have one executable program: facetrain. Briefly, facetrain takes lists of image files as input, and uses these as training and test sets for a neural network. facetrain can be used for training and/or recognition, and it also has the capability to save networks to files.
The code has been compiled on the Department's linux boxes. If you wish to use the code on some other platform, feel free, but be aware that the code has only been compiled on these platforms.
Details of the routines, explanations of the source files, and related information can be found in Section 3 of this handout.
Turn in a short write-up of your answers to ALL the questions found in the following sequence of initial experiments.
cp /staff/sushil/classes/ml/code/book/nn/trainset/*.list.UNR .
facetrain -n shades.net -t straightrnd_train.list.UNR -1 straightrnd_test1.list.UNR -2 straightrnd_test2.list.UNR -e 75
facetrain's arguments are described in Section 3.1.1, but a short description is in order here. shades.net is the name of the network file which will be saved when training is finished. straightrnd_train.list.UNR, straightrnd_test1.list.UNR, and straightrnd_test2.list.UNR are text files which specify the training set (70 examples) and two test sets (34 and 52 examples), respectively.
This command creates and trains your net on a randomly chosen sample of 70 of the 156 "straight" images, and tests it on the remaining 34 and 52 randomly chosen images, respectively. One way to think of this test strategy is that roughly 1/3 of the images (straightrnd_test2.list.UNR) have been held over for testing. The remaining 2/3 have been used for a train and cross-validate strategy, in which 2/3 of these are being used as a training set (straightrnd_train.list.UNR) and 1/3 are being used for the validation set to decide when to halt training (straightrnd_test1.list.UNR).
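As a quick check of these proportions, using only the set sizes quoted above (70, 34, and 52 out of 156 images):

```latex
\[
\frac{52}{156} = \frac{1}{3}, \qquad
\frac{70 + 34}{156} = \frac{104}{156} = \frac{2}{3}, \qquad
\frac{70}{104} \approx 0.67, \qquad
\frac{34}{104} \approx 0.33
\]
```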
You might be wondering why you are only training on samples from a limited distribution (the "straight" images). The essential reason is training time. If you have access to a very fast machine, then you are welcome to do these experiments on the entire set (replace straight with all in the command above). Otherwise, stick to the "straight" images.
The difference between the straightrnd_*.list.UNR and the straighteven_*.list.UNR sets is that while the former divides the images purely randomly among the training and test sets, the latter ensures a relatively even distribution of each individual's images over the sets. Because we have only 7 or 8 "straight" images per individual, failure to distribute them evenly would result in testing our network the most on those faces on which it was trained the least.
facetrain -n pose.net -t all_train.list.UNR -1 all_test1.list.UNR -2 all_test2.list.UNR -e 100
Since the pose-recognizing network should have substantially fewer weights to update, even those of you with slow machines can get in on the fun of using all of the images. In this case, 260 examples are in the training set, 140 examples are in test1, and 193 are in test2.
hidtopgm pose.net image-filename 32 30 n
Invoking xv on the image image-filename should then display the range of weights, with the lowest weights mapped to pixel values of zero, and the highest mapped to 255. If the images just look like noise, try retraining using facetrain_init0 (compile with make facetrain_init0), which initializes the hidden unit weights of a new network to zero, rather than random values.
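The mapping from weights to pixel values can be pictured with the following sketch. It assumes the scaling between the lowest and highest weight is linear, which the text above does not state; weight_to_pixel() is a hypothetical helper, not a routine from hidtopgm.

```c
#include <assert.h>

/* Map weight w from the range [w_min, w_max] onto an integer pixel value
 * in [0, 255]. Linear scaling is an assumption made for this sketch. */
int weight_to_pixel(double w, double w_min, double w_max)
{
    double scaled;

    assert(w_max >= w_min);
    if (w_max == w_min)
        return 0;                 /* all weights equal: nothing to spread */

    scaled = (w - w_min) / (w_max - w_min) * 255.0;
    if (scaled < 0.0)   scaled = 0.0;
    if (scaled > 255.0) scaled = 255.0;
    return (int)(scaled + 0.5);   /* round to the nearest integer */
}
```

With weights ranging from -2.0 to 2.0, the lowest weight maps to 0, the highest to 255, and a weight of 0.0 lands in the middle of the gray scale.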
The code for this assignment is broken into several modules:
Although you'll only need to modify code in imagenet.c and facetrain.c, feel free to modify anything you want in any of the files if it makes your life easier or if it allows you to do a nifty experiment.
facetrain has several options which can be specified on the command line. This section briefly describes how each option works. A very short summary of this information can be obtained by running facetrain with no arguments.
When you run facetrain, it will first read in all the data files and print a bunch of lines regarding these operations. Once all the data is loaded, it will begin training. At this point, the network's training and test set performance is outlined in one line per epoch. For each epoch, the following performance measures are output:
<epoch> <delta> <trainperf> <trainerr> <t1perf> <t1err> <t2perf> <t2err>
These values have the following meanings:
Although you do not have to modify the image or network packages, you will need to know a little bit about the routines and data structures in them, so that you can easily implement new output encodings for your networks. The following sections describe each of the packages in a little more detail. You can look at imagenet.c, facetrain.c, and facerec.c to see how the routines are actually used.
In fact, it is probably a good idea to look over facetrain.c first, to see how the training process works. You will notice that load_target() from imagenet.c is called to set up the target vector for training. You will also notice the routines which evaluate performance and compute error statistics, performance_on_imagelist() and evaluate_performance(). The first routine iterates through a set of images, computing the average error on these images, and the second routine computes the error and accuracy on a single image.
You will almost certainly not need to use all of the information in the following sections, so don't feel like you need to know everything the packages do. You should view these sections as reference guides for the packages, should you need information on data structures and routines.
Another fun thing to do, if you didn't already try it in the last question of the assignment, is to use the image package to view the weights on connections in graphical form; you will find routines for creating and writing images, if you want to play around with visualizing your network weights.
Finally, the point of this assignment is for you to obtain first-hand experience in working with neural networks; it is not intended as an exercise in C hacking. An effort has been made to keep the image package and neural network package as simple as possible. If you need clarifications about how the routines work, don't hesitate to ask.
As mentioned earlier, this package implements three-layer fully-connected feedforward neural networks, using a backpropagation weight tuning method. We begin with a brief description of the data structure, a BPNN (BackPropNeuralNet).
All unit values and weight values are stored as doubles in a BPNN.
Given a BPNN *net, you can get the number of input, hidden, and output units with net->input_n, net->hidden_n, and net->output_n, respectively.
Units are all indexed from 1 to n, where n is the number of units in the layer. To get the value of the kth unit in the input, hidden, or output layer, use net->input_units[k], net->hidden_units[k], or net->output_units[k], respectively.
The target vector is assumed to have the same number of values as the number of units in the output layer, and it can be accessed via net->target. The kth target value can be accessed by net->target[k].
To get the value of the weight connecting the ith input unit to the jth hidden unit, use net->input_weights[i][j]. To get the value of the weight connecting the jth hidden unit to the kth output unit, use net->hidden_weights[j][k].
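The sketch below shows how these fields fit together by computing the weighted sum that feeds one hidden unit. SketchNet is a stand-in that only mirrors the field names described above; it is not the real BPNN declaration. The sketch assumes units are indexed from 1 and leaves out any bias term and the squashing function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the BPNN structure, using the field names from the text. */
typedef struct {
    int input_n, hidden_n, output_n;
    double *input_units, *hidden_units, *output_units;
    double *target;
    double **input_weights;    /* input_weights[i][j]: input i -> hidden j  */
    double **hidden_weights;   /* hidden_weights[j][k]: hidden j -> output k */
} SketchNet;

/* Weighted sum of the input units feeding hidden unit j (1-based). */
double hidden_net_input(const SketchNet *net, int j)
{
    double sum = 0.0;
    int i;

    assert(net != NULL && j >= 1 && j <= net->hidden_n);
    for (i = 1; i <= net->input_n; i++)
        sum += net->input_units[i] * net->input_weights[i][j];
    return sum;
}
```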
The routines are as follows:
This routine initializes the neural network package. It should be called before any other routines in the package are used. Currently, its sole purpose in life is to initialize the random number generator with the input seed.
Creates a new network with n_in input units, n_hidden hidden units, and n_output output units. All weights in the network are randomly initialized to values in the range [-1.0, 1.0]. Returns a pointer to the network structure. Returns NULL if the routine fails.
Takes a pointer to a network, and frees all memory associated with the network.
Given a pointer to a network, runs one pass of the backpropagation algorithm.
Assumes that the input units and target layer have been properly set up.
learning_rate and momentum are assumed to be values between 0.0 and 1.0. erro and errh are pointers to doubles, which are set to the sum of the δ error values on the output units and hidden units, respectively.
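For intuition about the error terms, here is a sketch of the standard output-layer rule for sigmoid units, delta_k = o_k (1 - o_k)(t_k - o_k). It is not the package's own code: output_deltas() is a hypothetical name, 1-based indexing is assumed, and summing absolute values is an assumption about how the totals are accumulated.

```c
#include <assert.h>
#include <stddef.h>

/* Compute delta[k] = o(1 - o)(t - o) for k = 1..n and return the sum of
 * the absolute delta values (absolute value is an assumption). */
double output_deltas(const double *output, const double *target,
                     double *delta, int n)
{
    double sum = 0.0;
    int k;

    assert(output != NULL && target != NULL && delta != NULL && n > 0);
    for (k = 1; k <= n; k++) {
        double o = output[k];
        delta[k] = o * (1.0 - o) * (target[k] - o);
        sum += (delta[k] < 0.0) ? -delta[k] : delta[k];
    }
    return sum;
}
```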
Given a pointer to a network, runs the network on its current input values.
Given a filename, allocates space for a network, initializes it with the weights stored in the network file, and returns a pointer to this new BPNN. Returns NULL on failure.
Given a pointer to a network and a filename, saves the network to that file.
The image package provides a set of routines for manipulating PGM images. An image is a rectangular grid of pixels; each pixel has an integer value ranging from 0 to 255. Images are indexed by rows and columns; row 0 is the top row of the image, column 0 is the left column of the image.
Opens the image given by filename, loads it into a new IMAGE data structure, and returns a pointer to this new structure. Returns NULL on failure.
Creates an image in memory, with the given filename, of dimensions nrows × ncols, and returns a pointer to this image. All pixels are initialized to 0. Returns NULL on failure.
Given a pointer to an image, returns the number of rows the image has.
Given a pointer to an image, returns the number of columns the image has.
Given a pointer to an image, returns a pointer to its base filename (i.e., if the full filename is /usr/joe/stuff/foo.pgm, a pointer to the string foo.pgm will be returned).
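The base-filename behaviour can be reproduced with the standard strrchr() call, as in this sketch. base_name() is a hypothetical helper operating on a plain string, not the package routine, which takes an image pointer.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to the part of path after the last '/', or path itself
 * if it contains no '/'. The result points into the original string. */
const char *base_name(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return (slash != NULL) ? slash + 1 : path;
}
```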
Given a pointer to an image and row/column coordinates, this routine returns the value of the pixel at those coordinates in the image.
Given a pointer to an image and row/column coordinates, and an integer value assumed to be in the range [0, 255], this routine sets the pixel at those coordinates in the image to the given value.
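The row/column convention can be sketched with a flat, row-major pixel array. SketchImage and the two accessors below are hypothetical stand-ins, not the IMAGE structure or the package routines; the storage layout and the clamping of out-of-range values are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in image: rows * cols pixels stored row by row, with row 0 at the
 * top and column 0 at the left. */
typedef struct {
    int rows;
    int cols;
    int *data;
} SketchImage;

int sketch_getpixel(const SketchImage *img, int r, int c)
{
    assert(img != NULL && r >= 0 && r < img->rows && c >= 0 && c < img->cols);
    return img->data[r * img->cols + c];
}

void sketch_setpixel(SketchImage *img, int r, int c, int val)
{
    assert(img != NULL && r >= 0 && r < img->rows && c >= 0 && c < img->cols);
    if (val < 0)   val = 0;      /* clamp into [0, 255] (an assumption) */
    if (val > 255) val = 255;
    img->data[r * img->cols + c] = val;
}
```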
Given a pointer to an image and a filename, writes the image to disk with the given filename. Returns 1 on success, 0 on failure.
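For reference, a binary PGM file is a short text header (the magic number P5, then width, height, and maximum gray value) followed by one byte per pixel. The sketch below writes that layout to an already open stream. write_pgm() is a hypothetical function, and whether the supplied package writes the binary (P5) or the plain-text (P2) variant is not stated in this handout.

```c
#include <assert.h>
#include <stdio.h>

/* Write rows * cols pixels (row-major, values 0..255) as a binary PGM.
 * Returns 1 on success, 0 on failure. */
int write_pgm(FILE *fp, const int *pixels, int rows, int cols)
{
    int i;

    assert(fp != NULL && pixels != NULL && rows > 0 && cols > 0);

    /* Header: magic number, width (columns), height (rows), max value. */
    if (fprintf(fp, "P5\n%d %d\n255\n", cols, rows) < 0)
        return 0;

    for (i = 0; i < rows * cols; i++) {
        if (fputc((unsigned char)pixels[i], fp) == EOF)
            return 0;
    }
    return 1;
}
```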
Given a pointer to an image, deallocates all of its associated memory.
Returns a pointer to a new IMAGELIST structure, which is really just an array of pointers to images. Given an IMAGELIST *il, il->n is the number of images in the list. il->list[k] is the pointer to the kth image in the list.
Given a pointer to an imagelist and a pointer to an image, adds the image at the end of the imagelist.
Given a pointer to an imagelist, frees it. Note that this does not free any images to which the list points.
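The ownership rule in the last routine is easy to miss, so here is a sketch of a list with the same shape (a count n and an array list of image pointers). SketchImg, SketchList, and the three functions are hypothetical stand-ins; growing the array by one slot per call is a simplification chosen for brevity.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int rows, cols; } SketchImg;           /* stand-in image */
typedef struct { int n; SketchImg **list; } SketchList; /* stand-in list  */

/* Allocate an empty list. Returns NULL on failure. */
SketchList *list_alloc(void)
{
    SketchList *il = malloc(sizeof *il);
    if (il == NULL)
        return NULL;
    il->n = 0;
    il->list = NULL;
    return il;
}

/* Append img at the end of the list. Returns 1 on success, 0 on failure. */
int list_add(SketchList *il, SketchImg *img)
{
    SketchImg **grown;

    assert(il != NULL);
    grown = realloc(il->list, (size_t)(il->n + 1) * sizeof *grown);
    if (grown == NULL)
        return 0;
    grown[il->n] = img;
    il->list = grown;
    il->n++;
    return 1;
}

/* Free the list itself; the images it points to are left alone. */
void list_free(SketchList *il)
{
    if (il != NULL) {
        free(il->list);
        free(il);
    }
}
```

Because list_free() releases only the pointer array, each image still has to be freed separately once it is no longer needed.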
Takes a pointer to an imagelist and a filename. filename is assumed to specify a file which is a list of pathnames of images, one to a line. Each image file in this list is loaded into memory and added to the imagelist il.
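Reading such a list file amounts to pulling one pathname per line. The sketch below shows only that step; read_next_path() is a hypothetical helper, and skipping blank lines is an assumption rather than documented behaviour of the package.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the next non-empty line from fp into buf with the line ending
 * removed. Returns 1 if a pathname was read, 0 at end of file. */
int read_next_path(FILE *fp, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size > 1);

    while (fgets(buf, size, fp) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';   /* strip the line ending */
        if (buf[0] != '\0')
            return 1;                        /* blank lines are skipped */
    }
    return 0;
}
```

A loader would call this in a loop, opening each returned pathname as an image and appending it to the list.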
hidtopgm takes the following fixed set of arguments:
hidtopgm net-file image-file x y n
outtopgm takes the following fixed set of arguments:
outtopgm net-file image-file x y n
outtopgm pose.net pose-out2.pgm 4 1 2
The translation was initiated by Sushil Louis on 2004-11-14