197.facerec
SPEC CPU2000 Benchmark Description File
Benchmark Name
187.facerec
Benchmark Author
Jan C. Vorbrüggen ( [email protected] )
Benchmark Program General Category
Image Processing
Benchmark Description
This is an implementation of the face recognition system described in
M. Lades et al. (1993), IEEE Trans. Comp. 42(3):300-311.
In this application, an object - here, faces photographed frontally - are
represented as
labeled graphs. In the simplest case, used here, the graph is a regular
grid. To each vertex of the grid graph a set of features are attached; they
are computed from the
Gabor wavelet transform of the image and represent it in the
surroundings of a vertex. An edge of the graph is labeled with the vector
connecting its two vertices and represents the topographical relationship
of those vertices.
An object represented in this way can now be compared to a new image in a
process called
elastic graph matching. This is done by first determining the Gabor
wavelet transform for the new image. Then, for a given correspondance
between the graph's vertices and a set of image points, a function
taking into account both the similarity of the feature vectors at every
vertex and its corresponding image point, and the distortion of the graph
generated by the set of image points, measured as the change in the edge
labels, can be computed. This
graph similarity function is then the objective function of an
optimization process that varies the set of corresponding points in the
image. This optimization process is implemented in two steps: The
global move step keeps the graph rigid and moves it systematically
over all of the image, resulting in a placement that has the highest
similarity to the graph. This step can be considered as finding the object
(face) in the image. The local move step then takes this placement
as the starting position, and visits every vertex in random order. At each
vertex, the similarity function is evaluated on a small subgrid surrounding
the current position. (This is a small change from the algorithm
as originally published, where the trial moves at each node were random
as well.) If the similarity function's value is improved at one of
those positions, the change is made permanent; such a move is called a
hop. One round visiting each vertex position is called a
sweep. The local move step terminates when a sweep is completed
without a hop having been performed.
The benchmark consists of the following main phases:
-
Face Learning: The system has no prior knowledge of the class of
object it is supposed to recognize. It "learns" this by
extracting a canonic graph from one so-called canonic
image; that image and the position at which the graph is to be
extracted are specified by the user.
-
Graph Generation: For each of the images in the album gallery
(see Input Description, below), the Gabor wavelet transform is
computed, and the global move step is performed using the canonic graph.
The resultant graph is extracted from the transform and stored.
-
Recognition: For each of the images in the probe gallery (see
Input Description, below), the Gabor wavelet transform is
computed, and the global move step is performed using the canonic graph.
Then, a local move step is performed using each of the stored graphs. The
resultant vector of similarity values is searched for the maximal value;
the associated graph (and image) indicate the person recognized.
The parts that take the most computational time have the following
characteristics:
-
Gabor Wavelet Transform: The transform is performed by computing
the forward fast Fourier transform (FFT) of the image, multiplying it
with a number (here, 40) of kernels, computing the backward FFT for each
of the results, and inserting the absolute value for each pixel into a
two-dimensional array of feature vectors. This last step is similar to
performing a transpose of a large matrix, and stresses a processor's
memory subsystem. Finally, each feature vector is normalized. Run time is
proportional to the sum of the number of entries in the album and probe
galleries.
-
Global Move: This takes only a smallish part of the total run
time. It is dominated by the computation of the feature similarity
function, which basically is the scalar product of the two feature
vectors (one from the graph, one from the image transform). Again, run
time is proportional to the sum of the number of entries in the album and
probe galleries.
-
Local Move: The local move step is dominated by the computation
of the similarity function. In addition to the feature similarity
function described above, now also the distortion of the grid introduced
by the hops has to be taken into account. This is done incrementally,
i.e., the contribution of the vertex currently being considered to the
distortion is computed for both its old and its new position, which
entails handling the nine different positions a vertex can be in the
graph. The run time of this phase is proportional to the product of the
number of entries in the album and probe galleries.
The program allocates its memory on reading the run parameters (see below),
and makes use of it while generating the graphs. During each recognition
step, practically all of the code is exercised.
Input Description
The program first reads a set of text files that contain the parameter
settings governing the current run. It then reads a number of grey-level
images (256 by 256 pixels in size) of as many different persons' faces,
which constitute the album gallery to perform comparisons on, and
computes the labeled graphs representing these faces. It then reads another
set of images, the probe gallery, of the same persons as are
represented in the album gallery, but which differ in head pose, hair
style, addition or removal of glasses, and so on. For each of these, it
computes the Gabor wavelet transform, localizes the face using the global
move step, and performs the local move step for each graph from the album
gallery. This sequence of events is similar to what would be required when
a set of persons, of which a reference image each is given, are searched
for in a larger database, e.g., one extracted from a set of TV clips. This
form of an image database search is one typical use of the original
application.
For the test data set, the album and probe gallery each contain two images.
One of these images is the same as the canonic image mentioned above; the
result of this comparison is known a priori and is used as a consistency
check for the implementation.
For the training data set, the album gallery contains 13 images and
the probe gallery contains 26 images (two per person in the album
gallery).
For the reference data set, the album gallery contains 42 images and
the probe gallery contains 84 images (one to three per person in the
album gallery); these are all distinct from the images in the training data
set.
Output Description
On startup, the program first prints all parameters governing the run to
standard output. It then prints a line for each entry in the album gallery
with the name of the image file, the position of the labeled graph as
determined by the global move step, and its similarity with the canonic
graph. For each of the entries in the probe gallery, the same information
is output; this is followed by a line giving the name of the best match
from the album gallery and the similarity computed with the corresponding
graph. Finally, after all comparisons have been performed, a summary is
printed reporting the total number of comparisons, the number of correct
and incorrect comparisons, and the total number of hops and of sweeps
performed.
During the comparison process, for each entry in the probe gallery the
number of hops and of sweeps, seperately for the best matching entry and
for all entries in the album gallery accumulated, is reported to a
secondary output file named hops.dat. This file must satisfy
somewhat less stringent requirements for comparison to the reference output
than the primary report to standard output described above.
Of the 84 entries in the probe gallery of the reference data set,
80 entries (95%) result in the correct entry from the album gallery
being the best match.
Programming Language
FORTRAN 90
Known portability issues
None
Reference
M. Lades et al. (1993), Distortion Invariant Object Recognition in the
Dynamic Link Architecture, IEEE Trans. Comp. 42(3):300-311.
An extensive
set of pages with many figures that illustrate the components of the
system described above.
Last Updated: 16 August 2000