Javier Martinez - Face Tracking

Proposed approach

Birchfield's paper proposes a two-module function that is evaluated over a given window to find the ellipse that best fits the head. As mentioned in the introduction, one module deals with the color inside the ellipse while the other deals with the intensity gradient across the ellipse boundary. The idea is to maximize this function over the window to locate the optimum. The search space has three parameters: x, y and σ. The first two correspond to the pixel coordinates x and y. The last parameter, σ, measures the length of the minor axis of the ellipse. The ratio between the major and minor axes of the ellipse is fixed at 1.2. The equation to maximize is:

(1)

Birchfield uses a search space or window (S) of 4x4x1, that is: ±4 pixels in the x direction, ±4 pixels in the y direction and ±1 pixel in σ. The paper also mentions a window of size 8x8x1. The latter was found to give better results.
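To make the size of the search space concrete, the window can be enumerated as in the sketch below. The function name `search_window` and its parameters are hypothetical, and the sketch assumes the window is centered on the current state and sampled at every integer offset.

```python
from itertools import product

def search_window(x, y, sigma, dx=8, dy=8, dsigma=1):
    """Yield every candidate state (x, y, sigma) in the window around the current state.

    The defaults give the 8x8x1 window; dx=4, dy=4 gives the 4x4x1 window.
    """
    offsets = product(range(-dx, dx + 1),
                      range(-dy, dy + 1),
                      range(-dsigma, dsigma + 1))
    for ox, oy, os_ in offsets:
        yield (x + ox, y + oy, sigma + os_)

candidates = list(search_window(100, 80, 20))
print(len(candidates))  # 17 * 17 * 3 = 867 candidate ellipses
```

With the smaller 4x4x1 window the same enumeration yields 9 * 9 * 3 = 243 candidates per frame.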

The functions Φ_g(s) and Φ_c(s) are the gradient and color modules, respectively, and are defined below.

(2)

(3)
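As a rough illustration of the gradient module, the sketch below averages the dot product between the ellipse normal and the image gradient over points on the contour. Several things here are assumptions rather than details taken from the paper: the contour is sampled at equally spaced angles instead of rasterized pixels, σ is treated as the horizontal semi-axis, the absolute value of each dot product is used, and the image gradient is supplied by a hypothetical callable `gradient_at(px, py)` (for example, a lookup into precomputed Sobel responses).

```python
import math

def gradient_score(gradient_at, x, y, sigma, ratio=1.2, samples=64):
    """Average |normal . gradient| over sampled points of the ellipse contour."""
    a = sigma           # horizontal semi-axis (assumed to be the minor one)
    b = ratio * sigma   # vertical semi-axis (assumed to be the major one)
    total = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        px = x + a * math.cos(t)
        py = y + b * math.sin(t)
        # Outward normal of the ellipse at parameter t, scaled to unit length.
        nx, ny = math.cos(t) / a, math.sin(t) / b
        norm = math.hypot(nx, ny)
        nx, ny = nx / norm, ny / norm
        gx, gy = gradient_at(px, py)
        total += abs(nx * gx + ny * gy)
    # Dividing by the number of contour samples keeps large ellipses from winning by size alone.
    return total / samples

def radial_field(px, py, cx=50.0, cy=50.0):
    """Synthetic gradient: unit vectors pointing away from (cx, cy)."""
    d = math.hypot(px - cx, py - cy)
    return ((px - cx) / d, (py - cy) / d)

small = gradient_score(radial_field, 50, 50, 5, ratio=1.0)
large = gradient_score(radial_field, 50, 50, 20, ratio=1.0)
print(small, large)  # both close to 1.0: the score does not grow with contour length
```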

The bar over the functions in (1) designates the normalized version of each module, which makes both modules equally important (same weight). To normalize, we subtract the minimum value found in the window and divide by the range, as shown in equations (4) and (5) below.

(4)

(5)
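The normalization and the final maximization can be sketched as follows. The helper names `normalize` and `best_state` are hypothetical, and the sketch assumes both module scores have already been computed for every candidate state in the window.

```python
def normalize(scores):
    """Min-max normalize a list of scores to [0, 1] over the search window."""
    lo, hi = min(scores), max(scores)
    span = hi - lo
    if span == 0:
        # All candidates score the same, so this module carries no information.
        return [0.0] * len(scores)
    return [(v - lo) / span for v in scores]

def best_state(states, grad_scores, color_scores):
    """Return the state that maximizes the sum of the two normalized modules."""
    g = normalize(grad_scores)
    c = normalize(color_scores)
    totals = [gi + ci for gi, ci in zip(g, c)]
    best = max(range(len(states)), key=totals.__getitem__)
    return states[best]

states = ["a", "b", "c"]
print(normalize([2.0, 4.0, 6.0]))                           # [0.0, 0.5, 1.0]
print(best_state(states, [2.0, 4.0, 6.0], [0.9, 0.8, 0.1]))  # b
```

Because each module is rescaled to the range [0, 1] before the sum, neither one dominates just because its raw values happen to be larger.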

The gradient module sums, over all contour pixels, the dot product of the normal vector with the image gradient at that pixel (calculated using the Sobel edge detector). To prevent larger ellipses from scoring higher, we divide the sum by the number of pixels in the contour, N_θ.

The color module compares the histogram at ellipse location s with a model histogram. This is called histogram intersection and is calculated by summing, over all bins, the lesser of the model histogram value and the image histogram value. The paper fails to mention that this only works if the histograms are normalized, that is, each one is first divided by the total number of pixels in its ellipse (the sum of all its bins). Without histogram normalization, the module is biased towards ellipses smaller than the one used to obtain the model histogram. This can be seen in two ways. First, a smaller ellipse gives the image histogram bins lower values on average, so the min function tends to return I(i) and the numerator approaches the sum of the image bins regardless of how well the colors match. Second, the denominator decreases for smaller ellipses, making the ratio higher. The modified color module equation is then:

(6)

(7)

Analogously for the model histogram:

(8)
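The bias can be reproduced with a small numeric example. The sketch below assumes the unnormalized score is the sum of bin-wise minima divided by the sum of the image histogram, and that the modified score is the same intersection taken on histograms that each sum to one. The function names and the bin counts are made up for illustration.

```python
def normalize_hist(hist):
    """Divide every bin by the total count so the histogram sums to one."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    return [h / total for h in hist]

def intersection_raw(image_hist, model_hist):
    """Intersection on raw counts, divided by the image histogram total."""
    overlap = sum(min(i, m) for i, m in zip(image_hist, model_hist))
    return overlap / sum(image_hist)

def intersection_normalized(image_hist, model_hist):
    """Intersection after normalizing both histograms."""
    i_n = normalize_hist(image_hist)
    m_n = normalize_hist(model_hist)
    return sum(min(i, m) for i, m in zip(i_n, m_n))

model = [40, 40, 20]       # model ellipse: 100 pixels
small_wrong = [0, 5, 20]   # smaller ellipse (25 pixels) with a different color mix

raw = intersection_raw(small_wrong, model)
fixed = intersection_normalized(small_wrong, model)
print(raw)    # 1.0, a perfect score despite the mismatched colors
print(fixed)  # about 0.4 once both histograms are normalized
```

In the raw version every bin of the small ellipse is below the corresponding model bin, so the minimum always returns the image count and the ratio reaches 1. After normalization only the shape of the color distribution matters.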

Another detail that was changed from the original approach is the use of Hue-Saturation-Intensity (HSI) as the histogram color space. The reason is that the paper is not very explicit about how to perform the transformation from RGB to its choice of color space. HSI offers the same advantage, which is the separation of chroma information from intensity information.
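For reference, one common textbook formulation of the RGB to HSI conversion is sketched below. Whether this exact variant was used in the project is an assumption, and the function name `rgb_to_hsi` is hypothetical. Inputs are expected in the range [0, 1] and hue is returned in degrees.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert RGB in [0, 1] to (hue in degrees, saturation, intensity)."""
    total = r + g + b
    intensity = total / 3.0
    # Saturation is undefined for black; use 0 by convention.
    saturation = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        # Gray pixels have no defined hue; use 0 by convention.
        hue = 0.0
    else:
        cos_theta = max(-1.0, min(1.0, num / den))  # clamp against rounding error
        theta = math.degrees(math.acos(cos_theta))
        hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity

print(rgb_to_hsi(1.0, 0.0, 0.0))  # hue 0, full saturation
print(rgb_to_hsi(0.5, 0.5, 0.5))  # gray: hue 0 and saturation 0 by convention
```

Building the histogram from the hue and saturation channels, and treating intensity separately, is what keeps the chroma information apart from the intensity information.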