Zehang Sun - Gender Classification Project, UNR

Genetic Feature Subset Selection For Gender Classification

Zehang Sun
Department of Computer Science, UNR

Advisors:
Dr. George Bebis

The proposed approach consists of two steps. First, facial images are represented in a low-dimensional space spanned by the eigenvectors of the covariance matrix of the data, computed using PCA. Then, for each of the classifiers (Bayes classifier, Neural Network, Linear Discriminant Analysis and Support Vector Machines), a GA is used to select gender-related features automatically in order to reduce the error rate. Several studies have found that different eigenvectors encode different kinds of information. For example, the first few eigenvectors seem to encode lighting, while other eigenvectors seem to encode features such as glasses or moustaches. Fig.1 shows some of the eigenvectors computed from our training data: eigenvectors 1-4 appear to encode lighting variations, while eigenvectors 10 and 20 appear to encode information about glasses.

Fig.1. Eigenvectors (from left to right and from top to bottom): No. 1-6, 8, 10, 12, 14, 19, 20, 150, 200 and 250.
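The first step described above, representing each face by its coefficients along the leading eigenvectors, can be sketched roughly as follows. This is only an illustration and not the project's actual code: it assumes NumPy, uses random vectors in place of real face images, and the function name pca_project and the choice of 5 components are hypothetical.

```python
import numpy as np

def pca_project(images, num_components):
    """Project flattened images onto the leading eigenvectors of their covariance matrix.

    `images` has one flattened image per row. Hypothetical helper for illustration.
    """
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # The right singular vectors of the centered data are the eigenvectors of
    # its covariance matrix, ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvectors = vt[:num_components]
    coefficients = centered @ eigenvectors.T
    return coefficients, eigenvectors, mean

# Toy stand-in for face images: 20 random "images" of 64 pixels each (assumption).
rng = np.random.default_rng(0)
toy_images = rng.normal(size=(20, 64))
coeffs, eigvecs, mean_face = pca_project(toy_images, 5)
```

Each row of coeffs is the low-dimensional representation of one image; the GA described in the next section then decides which of these coefficients a classifier gets to see.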

Genetic Feature Selection

In our encoding scheme, the chromosome is a bit string whose length is determined by the number of eigenvectors. Each eigenvector is associated with one bit in the string. If the ith bit is 1, the ith eigenvector is selected; otherwise, that eigenvector is ignored. Each chromosome thus represents a different eigen-feature subset.
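As a small illustration of this encoding, the sketch below applies a chromosome to the PCA coefficients of one image. The function name select_features and the sample values are made up for the example.

```python
def select_features(chromosome, coefficients):
    """Keep the coefficient of every eigenvector whose bit is 1.

    `chromosome` is a list of 0/1 bits, one per eigenvector;
    `coefficients` holds one PCA coefficient per eigenvector.
    """
    if len(chromosome) != len(coefficients):
        raise ValueError("chromosome needs exactly one bit per eigenvector")
    return [c for bit, c in zip(chromosome, coefficients) if bit == 1]

# Six eigenvectors; bits 1, 4 and 5 (1-based) are switched on.
chromosome = [1, 0, 0, 1, 1, 0]
coefficients = [0.8, -1.2, 0.3, 2.5, -0.7, 0.1]  # illustrative values only
selected = select_features(chromosome, coefficients)
print(selected)  # [0.8, 2.5, -0.7]
```

The classifier would then be trained and evaluated on the selected coefficients only.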

The goal of feature subset selection is to use fewer features to achieve the same or better performance. Therefore, the fitness evaluation contains two terms: (i) accuracy from the validation data and (ii) number of features used. Combining these two terms, the fitness function is given as:

fitness = 10^4 * Accuracy + 0.4 * Zeros

where Accuracy is the accuracy rate that an individual achieves on the validation data, and Zeros is the number of zeros in the chromosome, i.e. the number of eigenvectors left unselected.
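The fitness function can be written down directly. The sketch below assumes that Accuracy is expressed as a fraction between 0 and 1 (the page does not state the scale), and the chromosomes and accuracy values are invented for illustration.

```python
def fitness(chromosome, accuracy):
    """fitness = 10^4 * Accuracy + 0.4 * Zeros.

    `accuracy` is assumed to be a fraction in [0, 1];
    Zeros is the number of 0 bits, i.e. unselected eigenvectors.
    """
    zeros = chromosome.count(0)
    return 1e4 * accuracy + 0.4 * zeros

# Same accuracy, different subset sizes: the smaller subset scores higher.
large_subset = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # 4 zeros
small_subset = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # 7 zeros
f_large = fitness(large_subset, 0.95)
f_small = fitness(small_subset, 0.95)

# A one-point gain in accuracy outweighs any saving in features here.
all_features = [1] * 10                          # 0 zeros
f_accurate = fitness(all_features, 0.96)
```

Under this assumption, the accuracy term dominates for short chromosomes like these, and the Zeros term mainly breaks ties between subsets that classify about equally well.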

Dataset

The dataset used contained 400 frontal images from 400 distinct people, representing different races, with different facial expressions, and under different lighting conditions. The 400 images were equally divided between males and females.