4 Two-dimensional Face Recognition

4.1 Feature Localization
Before discussing the methods of comparing two facial images, we first take a brief look at some of the preliminary processes of facial feature alignment. This process typically consists of two stages: face detection and eye localization. Depending on the application, if the position of the face within the image is known beforehand (for a cooperative subject in a door access system, for example) then the face detection stage can often be skipped, as the region of interest is already known. Therefore, we discuss eye localization here, with a brief discussion of face detection in the literature review.
The eye localization method is used to align the 2D face images of the various test sets used throughout this section. However, to ensure that all results presented are representative of face recognition accuracy and not a product of the performance of the eye localization routine, all image alignments are manually checked and any errors corrected prior to testing and evaluation.
We detect the position of the eyes within an image using a simple template-based method. A training set of manually pre-aligned face images is taken, and each image is cropped to an area around both eyes. The average image is calculated and used as a template.
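As a rough sketch of this step (the file names, crop box and use of numpy/PIL are illustrative assumptions, not details from the text), the template can be built by averaging the cropped eye regions of the pre-aligned training images:

import numpy as np
from PIL import Image

def build_eye_template(image_paths, crop_box):
    # crop_box = (left, upper, right, lower) in pixels; the exact region
    # used for the eyes is an assumption here.
    crops = []
    for path in image_paths:
        img = Image.open(path).convert("L")              # load as grayscale
        eyes = np.asarray(img.crop(crop_box), dtype=np.float64)
        crops.append(eyes)
    # The pixel-wise mean of all cropped eye regions is the template.
    return np.mean(crops, axis=0)

# Hypothetical usage:
# template = build_eye_template(["face01.bmp", "face02.bmp"], (10, 20, 55, 45))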
Figure 4-1 - The average eyes, used as a template for eye detection.
Both eyes are included in a single template, rather than searching for each eye individually, because the characteristic symmetry of the eyes on either side of the nose provides a useful feature that helps distinguish the eyes from other false positives that may be picked up in the background. However, this method is highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that the eyes in the image appear near-horizontal.
Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason is that in some cases the eyebrows can closely match the template, particularly if there are shadows in the eye sockets, but the area of skin below the eyes helps to distinguish the eyes from the eyebrows (the area just below the eyebrows contains eyes, whereas the area below the eyes contains only plain skin).
A window is passed over the test images and the absolute difference is taken between each window and the average eye image shown above. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Applying the same procedure using a smaller template of the individual left and right eyes then refines each eye position.
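A minimal sketch of this exhaustive search, assuming the image and template are numpy arrays (the function name and structure are illustrative, not taken from the text):

import numpy as np

def locate_eyes(image, template):
    # Slide the template over the image; the window with the lowest sum of
    # absolute differences is taken as the eye region.
    ih, iw = image.shape
    th, tw = template.shape
    best_pos, best_err = None, np.inf
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = image[r:r + th, c:c + tw].astype(np.float64)
            err = np.abs(window - template).sum()
            if err < best_err:
                best_err, best_pos = err, (r, c)
    return best_pos, best_err

The same routine, run with the smaller single-eye templates within the detected region, would then refine the individual eye positions.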
This basic template-based method of eye localization, although providing fairly precise localizations, often fails to locate the eyes completely. However, we are able to improve performance by including a weighting scheme.
Eye localization is performed on the set of training images, which is then separated into two sets: those in which eye detection was successful and those in which eye detection failed. Taking the set of successful localizations, we compute the average distance from the eye template (Figure 4-2, top). Note that the image is quite dark, indicating that the detected eyes correlate closely with the eye template, as we would expect. However, bright points do occur near the whites of the eyes, suggesting that this area is often inconsistent, varying greatly from the average eye template.
Figure 4-2 - Distance to the eye template for successful detections (top), indicating variance due to noise, and for failed detections (bottom), showing credible variance due to mis-detected features.
In the lower image (Figure 4-2, bottom), we have taken the set of failed localizations (images of the forehead, nose, cheeks, background etc. falsely detected by the localization routine) and once again computed the average distance from the eye template. The bright pupils surrounded by darker areas indicate that a failed match is often due to the high correlation of the nose and cheekbone regions overwhelming the poorly correlated pupils.
Wanting to emphasize the difference of the pupil regions for these failed matches and to minimize the variance of the whites of the eyes for successful matches, we divide the lower image values by the upper image to produce a weights vector, as shown in Figure 4-3. When applied to the difference image before summing a total error, this weighting scheme provides a much improved detection rate.
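The scheme can be sketched as follows, assuming the per-pixel difference images of the two training subsets are available as numpy arrays; the small epsilon guarding against division by zero is an implementation assumption, not a detail from the text:

import numpy as np

def build_weights(success_diffs, failed_diffs, eps=1e-6):
    # Average distance-to-template image for each subset.
    success_avg = np.mean(success_diffs, axis=0)
    failed_avg = np.mean(failed_diffs, axis=0)
    # Dividing the failed-set average by the successful-set average gives
    # high weights where failures differ from the template (e.g. the pupils)
    # and low weights where successes vary anyway (e.g. whites of the eyes).
    return failed_avg / (success_avg + eps)

def weighted_error(window, template, weights):
    # Weighted sum of absolute differences used when scanning for the eyes.
    return (weights * np.abs(window - template)).sum()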
Figure 4-3 - Eye template weights used to give higher priority to those pixels that best represent the eyes.
4.2 The Direct Correlation Approach
We begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred to as template matching by Brunelli and Poggio), involving the direct comparison of pixel intensity values taken from facial images. We use the term 'Direct Correlation' to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting schemes or feature extraction, regardless of the distance metric used.
Therefore, we do not imply that Pearson's correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (it is inversely related to Pearson's correlation and can be considered a scale- and translation-sensitive form of image correlation), as this is consistent with the contrast drawn between image space and subspace approaches in later sections.
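For intuition (this relationship is implied rather than stated above): if the query and gallery vectors q and g are first normalized to zero mean and unit norm, then ||q - g||^2 = 2(1 - r), where r is Pearson's correlation coefficient, so ranking by Euclidean distance on normalized images is equivalent to ranking by correlation; without that normalization, the distance remains sensitive to the scale and translation of the intensity values, as noted above.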
Firstly, all facial images must be aligned such that the eye centers are located at two specified pixel coordinates and the image cropped to remove any background information. These images are stored as grayscale bitmaps of 65 by 82 pixels and, prior to recognition, converted into a vector of 5330 elements (each element containing the corresponding pixel intensity value).
Each corresponding vector can be thought of as describing a point within a 5330-dimensional image space. This simple principle can easily be extended to much larger images: a 256 by 256 pixel image occupies a single point in 65,536-dimensional image space and, again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart. Calculating the Euclidean distance d between two facial image vectors (often referred to as the query image q and the gallery image g), we get an indication of the similarity between the two faces.
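A minimal sketch of this comparison (the function names are illustrative; the alignment and cropping described above are assumed to have already been applied):

import numpy as np

def face_vector(image):
    # Flatten a 65 x 82 aligned, cropped grayscale face image into a
    # 5330-element vector of pixel intensities.
    return np.asarray(image, dtype=np.float64).reshape(-1)

def euclidean_distance(query_img, gallery_img):
    # Direct correlation score: Euclidean distance between the query (q)
    # and gallery (g) image vectors; a lower value means greater similarity.
    q = face_vector(query_img)
    g = face_vector(gallery_img)
    return np.sqrt(np.sum((q - g) ** 2))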