South China University of Technology
"Pattern Recognition" Review Notes
CH1.
【Pattern Recognition Systems】
Data Acquisition & Sensing: measurements of physical variables
Pre-processing: removal of noise in data; isolation of patterns of interest from the background (segmentation)
Feature extraction: finding a new representation in terms of features
Model learning / estimation: learning a mapping between features and pattern groups and categories
Classification: using features and learned models to assign a pattern to a category
Post-processing: evaluation of confidence in decisions; exploitation of context to improve performance; combination of experts
【Design Cycle】
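In the usual textbook presentation (only the heading survives here, so this is an assumption), the design cycle runs: collect data → choose features → choose model → train classifier → evaluate classifier, iterating as needed.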
【Learning strategies】
Supervised learning
A teacher provides a category label or cost for each pattern in the training set.
Unsupervised learning
The system forms clusters or natural groupings of the input patterns.
Reinforcement learning
No desired category is given, but the teacher provides feedback to the system, such as whether the decision is right or wrong.
【Evaluation methods】
Independent Run
A statistical method, also called Bootstrap. Repeat the experiment several times and average the results.
Cross-validation
Dataset D is randomly divided into m disjoint sets D_i of equal size n/m, where n is the number of samples in D. The classifier is trained m times, each time with a different set D_i held out as the testing set.
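A minimal sketch of m-fold cross-validation in Python (train_fn, test_fn, and the scoring are illustrative placeholders, not from the notes):

import random

def cross_validate(data, train_fn, test_fn, m=5):
    """Estimate performance by m-fold cross-validation."""
    data = list(data)
    random.shuffle(data)                    # random division of D
    folds = [data[i::m] for i in range(m)]  # m disjoint sets D_i of roughly equal size n/m
    scores = []
    for i in range(m):
        test_set = folds[i]                 # hold out D_i as the testing set
        train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(train_set)         # train on the remaining m-1 folds
        scores.append(test_fn(model, test_set))
    return sum(scores) / m                  # average over the m runs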
CH2.
【Bayes formula】
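In the usual notation (classes ω_j, feature vector x), the formula is:

P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}, \qquad p(x) = \sum_{j=1}^{c} p(x \mid \omega_j)\, P(\omega_j)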
【Bayes Decision Rule】
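For two classes the rule takes its standard form: decide ω_1 if P(\omega_1 \mid x) > P(\omega_2 \mid x), otherwise decide ω_2.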
Multivariate Normal Density in d dimensions:
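Its standard form, with mean vector μ and covariance matrix Σ:

p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left[ -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right]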
【Maximum Likelihood (ML) Rule】
When P(ω_1) = P(ω_2), the decision is based entirely on the likelihood p(x|ω_j): P(ω_j|x) ∝ p(x|ω_j).
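Equivalently, under equal priors: decide ω_1 if p(x \mid \omega_1) > p(x \mid \omega_2), otherwise decide ω_2.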
【ML Parameter Estimation】
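For the Gaussian case (assuming, as is standard, a normal class-conditional density), the ML estimates from samples x_1, ..., x_n are:

\hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^{T}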
【Error analysis】
Probability of error for multi-class problems:
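In the standard form, with R_i the region where the classifier decides ω_i:

P(\text{error}) = 1 - P(\text{correct}) = 1 - \sum_{i=1}^{c} \int_{\mathcal{R}_i} p(x \mid \omega_i)\, P(\omega_i)\, dx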
Error = Bayes Error + Added Error
【Discriminant function】
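A common choice for minimum-error-rate classification (the usual textbook form) is one function per class,

g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i),

with x assigned to the class whose g_i(x) is largest.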
【Decision boundary】
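The boundary between classes ω_i and ω_j is the set of points where the discriminants tie: g_i(x) = g_j(x).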
【Loss function】
Conditional risk (expected loss of taking action α_i):
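In the usual notation, with λ(α_i | ω_j) the loss for taking action α_i when the true class is ω_j:

R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)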
Overall risk (expected loss):
The zero-one loss function is used to minimize the error rate.
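In standard notation the overall risk is

R = \int R(\alpha(x) \mid x)\, p(x)\, dx,

and the zero-one loss, with the conditional risk it induces, is

\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases}, \qquad R(\alpha_i \mid x) = 1 - P(\omega_i \mid x).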
【Minimum Risk Decision Rule】
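For two classes the rule reads: take action α_1 (decide ω_1) if R(\alpha_1 \mid x) < R(\alpha_2 \mid x); under the zero-one loss this reduces to the Bayes decision rule.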
【Normal Distribution】
CH3.
【Normalized distance from origin to surface】
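For a linear discriminant g(x) = w^{T} x + w_0 with decision surface g(x) = 0 (the usual setup), the distance from the origin to the surface is

\frac{w_0}{\lVert w \rVert}.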
【Distance of arbitrary point to surface】
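For the same surface, the signed distance from an arbitrary point x is

r = \frac{g(x)}{\lVert w \rVert}.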
【Perceptron Criterion】
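In the usual notation, with a the augmented weight vector, y the normalized augmented samples, and \mathcal{Y} the set of samples misclassified by a:

J_p(a) = \sum_{y \in \mathcal{Y}} (-a^{T} y)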
【Pseudoinverse Method】
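With sample matrix Y and margin vector b (the standard least-squares treatment of Ya = b), the solution is

a = Y^{\dagger} b = (Y^{T} Y)^{-1} Y^{T} b,

where the second equality assumes Y^{T} Y is nonsingular.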
Problem: [[Exercise for Pseudoinverse Method]] (2)
【Least-Mean-Squared (Gradient Descent)】
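The standard single-sample update, with learning rate η(k), is

a(k+1) = a(k) + \eta(k)\, \bigl( b_k - a(k)^{T} y_k \bigr)\, y_k.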
【Linear classifier for multiple Classes】
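A linear machine (the usual multi-class construction) uses one discriminant g_i(x) = w_i^{T} x + w_{i0} per class and assigns x to ω_i if g_i(x) > g_j(x) for all j ≠ i.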
【linearly separable problem】
A problem whose data from different classes can be separated exactly by a linear decision surface.
CH4.
【Perceptron update rule】
(reward and punishment schemes)
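In its standard reward-and-punishment form, with normalized augmented samples y_k and learning rate η: if y_k is misclassified (a(k)^{T} y_k \le 0), then

a(k+1) = a(k) + \eta\, y_k;

otherwise the weights are left unchanged.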
[[Exercise for perception ]]
【Error of Back-Propagation Algorithm】
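The training error on one pattern, in the usual notation (targets t_k, network outputs z_k, c output units), is

J(w) = \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2.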
【Regularization】
Update rule for weight:
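One common form is weight decay (an assumption here, since only the heading survives): after every gradient update each weight is shrunk,

w^{\text{new}} = w^{\text{old}} (1 - \epsilon), \qquad 0 < \epsilon < 1.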
【Weight of Back-Propagation Algorithm】
The learning rule for the hidden-to-output units:
The learning rule for the input-to-hidden units:
Summary:
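In the standard notation (learning rate η, activation function f, net activations net_k and net_j, hidden outputs y_j, inputs x_i), the two rules read:

\Delta w_{kj} = \eta\, \delta_k\, y_j, \qquad \delta_k = (t_k - z_k)\, f'(net_k)

\Delta w_{ji} = \eta\, \delta_j\, x_i, \qquad \delta_j = f'(net_j) \sum_{k=1}^{c} w_{kj}\, \delta_k

Each hidden unit's sensitivity δ_j is a weighted sum of the output sensitivities propagated backwards through the weights w_{kj}.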
【Training of Back-Propagation】
Weights can be updated differently by presenting the training samples in different sequences. Two popular training methods:
Stochastic Training
Patterns are chosen randomly from the training set (network weights are updated after each presentation).
Batch training
All patterns are presented to the network before learning takes place.
【Problems of training a NN】
Scaling input
Target values
Number of hidden layers: a 3-layer network is recommended; use more than 3 only for special problems
Number of hidden units: roughly n/10
Initializing weights
Weight decay
Stochastic and batch training
Stopped training: stop when the error on a separate validation set reaches a minimum
[[Exercise for ANN ]]
Forward pass: g=0.8385
Reverse pass (learning rate=0.5):
CH5.
【Structure of RBF】
3 layers:
Input layer: f(x)=x
Hidden layer: Gaussian function
Output layer: linear weighted sum
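Concretely, with Gaussian centers μ_j and widths σ_j (the usual parameterization), the network computes

\varphi_j(x) = \exp\!\left( -\frac{\lVert x - \mu_j \rVert^2}{2 \sigma_j^2} \right), \qquad y(x) = \sum_{j=1}^{M} w_j\, \varphi_j(x).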
【Characteristics of RBF】
Advantages:
An RBF network trains faster than an MLP.
The hidden layer is easier to interpret than an MLP's.
Disadvantage:
During testing, a neuron in an RBF network computes more slowly than one in an MLP.
[[Exercise for RBF ]]
Solution:
CH6.
【Margin】
* Margin is defined as the width by which the boundary could be increased before hitting a data point.
* The linear discriminant function (classifier) with the maximum margin is the best.
* Data closest to the hyperplane are support vectors.
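For a hyperplane w^{T} x + b = 0 scaled so that the closest points satisfy |w^{T} x + b| = 1 (the canonical form), the margin is

\rho = \frac{2}{\lVert w \rVert}.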
【Maximum Margin Classification】
* Maximizing the margin is good according to intuition and theory.
* It implies that only support vectors are important; other training examples are ignorable.
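In its standard primal form, with labels y_i ∈ {-1, +1}, maximum margin classification solves

\min_{w, b}\; \frac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad y_i (w^{T} x_i + b) \ge 1 \;\; \text{for all } i.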
Advantage (compared with LMS and the perceptron):
Better generalization ability & less over-fitting.