ronald-joyful
Appendix: Chinese-English Translation
15 Speech Signal Processing
15.3 Analysis and Synthesis
Jesse W. Fussell
After an acoustic speech signal is converted to an electrical signal by a microphone, it may be desirable to analyze the electrical signal to estimate some time-varying parameters which provide information about a model of the speech production mechanism. Speech analysis is the process of estimating such parameters. Similarly, given some parametric model of speech production and a sequence of parameters for that model, speech synthesis is the process of creating an electrical signal which approximates speech. While analysis and synthesis techniques may be done either on the continuous signal or on a sampled version of the signal, most modern analysis and synthesis methods are based on digital signal processing.
A typical speech production model is shown in Fig. 15.6. In this model the output of the excitation function is scaled by the gain parameter and then filtered to produce speech. All of these functions are time-varying.
FIGURE 15.6 A general speech production model.
FIGURE 15.7 Waveform of a spoken phoneme /i/ as in beet.
For many models, the parameters are varied at a periodic rate, typically 50 to 100 times per second. Most speech information is contained in the portion of the signal below about 4 kHz.
The excitation is usually modeled as either a mixture or a choice of random noise and periodic waveform. For human speech, voiced excitation occurs when the vocal folds in the larynx vibrate; unvoiced excitation occurs at constrictions in the vocal tract which create turbulent air flow [Flanagan, 1965]. The relative mix of these two types of excitation is termed "voicing." In addition, the periodic excitation is characterized by a fundamental frequency, termed pitch or F0. The excitation is scaled by a factor designed to produce the proper amplitude or level of the speech signal. The scaled excitation function is then filtered to produce the proper spectral characteristics. While the filter may be nonlinear, it is usually modeled as a linear function.
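
The excitation, gain, and filter chain described above can be sketched in a few lines. This is only an illustration under stated assumptions: the impulse-train excitation, the Gaussian noise source, the single two-pole resonator standing in for the filter, and every name and default value (8 kHz sampling, 141 Hz pitch, 300 Hz resonance) are choices made for this sketch and are not taken from the text.

```python
import math
import random

def excitation(n_samples, fs, f0, voicing, seed=0):
    """Mix of a periodic impulse train and random noise.

    voicing = 1.0 gives purely periodic excitation, 0.0 purely random.
    """
    rng = random.Random(seed)
    period = max(1, round(fs / f0))  # pitch period in samples
    out = []
    for n in range(n_samples):
        pulse = 1.0 if n % period == 0 else 0.0
        noise = rng.gauss(0.0, 1.0)
        out.append(voicing * pulse + (1.0 - voicing) * noise)
    return out

def resonator(x, fs, freq, bandwidth):
    """Two-pole linear filter with one resonance (assumed stand-in for the vocal tract)."""
    r = math.exp(-math.pi * bandwidth / fs)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    a2 = -r * r
    y1 = y2 = 0.0
    y = []
    for v in x:
        out = v + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

def synthesize(n_samples, fs=8000, f0=141.0, voicing=1.0, gain=0.5,
               freq=300.0, bandwidth=100.0):
    """Excitation -> gain -> filter, with parameters held fixed for one frame."""
    e = excitation(n_samples, fs, f0, voicing)
    scaled = [gain * v for v in e]
    return resonator(scaled, fs, freq, bandwidth)
```

A call such as `synthesize(512)` produces one 64-ms frame at the assumed 8 kHz rate. In a time-varying model the parameters would be updated from frame to frame, which this sketch does not attempt.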
Analysis of Excitation
In a simplified form, the excitation function may be considered to be purely periodic, for voiced speech, or purely random, for unvoiced. These two states correspond to voiced phonetic classes such as vowels and nasals and unvoiced sounds such as unvoiced fricatives. This binary voicing model is an oversimplification for sounds such as voiced fricatives, which consist of a mixture of periodic and random components. Figure 15.7 is an example of a time waveform of a spoken /i/ phoneme, which is well modeled by only periodic excitation.
Both time domain and frequency domain analysis techniques have been used to estimate the degree of voicing for a short segment or frame of speech. One time domain feature, termed the zero crossing rate, is the number of times the signal changes sign in a short interval. As shown in Fig. 15.7, the zero crossing rate for voiced sounds is relatively low. Since unvoiced speech typically has a larger proportion of high-frequency energy than voiced speech, the ratio of high-frequency to low-frequency energy is a frequency domain technique that provides information on voicing.
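
Both voicing features can be sketched for a single frame as follows. The function names, the treatment of zero-valued samples as non-negative, the direct (slow) DFT, and the idea of a single split frequency between the "low" and "high" bands are assumptions of this sketch; the text does not specify them.

```python
import cmath
import math

def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples of the frame."""
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0.0) != (cur >= 0.0):
            count += 1
    return count

def high_low_energy_ratio(frame, fs, split_hz):
    """Ratio of spectral energy above split_hz to energy at or below it."""
    n = len(frame)
    low = high = 0.0
    for k in range(1, n // 2 + 1):  # skip DC, stop at the Nyquist bin
        coeff = sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n)
                    for m in range(n))
        energy = abs(coeff) ** 2
        if k * fs / n > split_hz:
            high += energy
        else:
            low += energy
    return high / low if low > 0.0 else float("inf")
```

A low count and a small ratio both point toward voiced speech; any decision threshold would have to be chosen empirically.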
Another measure used to estimate the degree of voicing is the autocorrelation function, which is defined for a sampled speech segment, S, as
$$\mathrm{ACF}(\tau) = \sum_{n=0}^{N-1-\tau} s(n)\, s(n+\tau)$$
where s(n) is the value of the nth sample within the segment of length N.
Since the autocorrelation function of a periodic function is itself periodic, voicing can be estimated from the degree of periodicity of the autocorrelation function. Figure 15.8 is a graph of the nonnegative terms of the autocorrelation function for a 64-ms frame of the waveform of Fig. 15.7. Except for the decrease in amplitude with increasing lag, which results from the rectangular window function which delimits the segment, the autocorrelation function is seen to be quite periodic for this voiced utterance.
FIGURE 15.8 Autocorrelation function of one frame of /i/.
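
A minimal sketch of the autocorrelation computation for one frame, assuming the common finite-frame form in which the product s(n)s(n + lag) is summed only over samples that stay inside the frame (which is what produces the decrease in amplitude with increasing lag under a rectangular window). The `periodicity` measure, the largest value in a lag range relative to lag 0, is one possible way to quantify "degree of periodicity" and is an assumption of this sketch, not a definition from the text.

```python
def autocorrelation(frame, max_lag):
    """ACF(tau) = sum over n of s(n) * s(n + tau), for tau = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau))
            for tau in range(max_lag + 1)]

def periodicity(frame, min_lag, max_lag):
    """Largest autocorrelation value in [min_lag, max_lag], relative to lag 0."""
    acf = autocorrelation(frame, max_lag)
    if acf[0] == 0.0:
        return 0.0
    return max(acf[min_lag:max_lag + 1]) / acf[0]
```

Values of `periodicity` near 1 suggest a strongly periodic (voiced) frame, while values near 0 suggest noise-like (unvoiced) excitation.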
If an analysis of the voicing of the speech signal indicates a voiced or periodic component is present, another step in the analysis process may be to estimate the frequency (or period) of the voiced component. There are a number of ways in which this may be done. One is to measure the time lapse between peaks in the time domain signal. For example, in Fig. 15.7 the major peaks are separated by about 0.0071 s, for a fundamental frequency of about 141 Hz. Note that it would be quite possible to err in the estimate of fundamental frequency by mistaking the smaller peaks that occur between the major peaks for the major peaks. These smaller peaks are produced by resonance in the vocal tract which, in this example, happens to be at about twice the excitation frequency. This type of error would result in an estimate of pitch approximately twice the correct frequency.
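
As a worked check of these figures, the fundamental frequency is the reciprocal of the peak spacing T. The halved spacing in the second expression is an illustrative assumption that follows from a resonance at about twice the excitation frequency:

$$F_0 = \frac{1}{T} = \frac{1}{0.0071\ \text{s}} \approx 141\ \text{Hz}, \qquad F_{\text{err}} = \frac{1}{T/2} = \frac{2}{0.0071\ \text{s}} \approx 282\ \text{Hz}$$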
The distance between major peaks of the autocorrelation function is a closely related feature that is frequently used to estimate the pitch period. In Fig. 15.8, the distance between the major peaks in the autocorrelation function is about 0.0071 s. Estimates of pitch from the autocorrelation function are also susceptible to mistaking the first vocal tract resonance for the glottal excitation frequency.
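
One simple way to turn the autocorrelation peak spacing into a pitch estimate is to pick the lag with the largest autocorrelation value inside a plausible pitch range. The sketch below assumes that approach; the function name and the 50 to 400 Hz search range are illustrative choices, and nothing here guards against the resonance-related error just described.

```python
def pitch_from_autocorrelation(frame, fs, f0_min=50.0, f0_max=400.0):
    """Estimate F0 in Hz from the lag of the largest autocorrelation value."""
    min_lag = int(fs / f0_max)                       # shortest period searched
    max_lag = min(int(fs / f0_min), len(frame) - 1)  # longest period searched
    best_lag, best_val = min_lag, float("-inf")
    for tau in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + tau] for i in range(len(frame) - tau))
        if val > best_val:
            best_lag, best_val = tau, val
    return fs / best_lag
```

At an assumed sampling rate of 8 kHz, a peak spacing of 0.0071 s corresponds to a lag of about 57 samples.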
The absolute magnitude difference function (AMDF), defined as
$$\mathrm{AMDF}(\tau) = \sum_{n=0}^{N-1-\tau} \left| s(n) - s(n+\tau) \right|$$
is another function which is often used in estimating the pitch of voiced speech. An example of the AMDF is shown in Fig. 15.9 for the same 64-ms frame of the /i/ phoneme. For the AMDF, however, it is the minima rather than the peaks that indicate the pitch period. The AMDF has been shown to be a good pitch period indicator [Ross et al., 1974] and does not require multiplications.
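
A minimal sketch of the AMDF and of a pitch-period search over its minima. It assumes the differences are summed only over sample pairs that lie inside the frame, and the function names and the explicit lag search range are choices made for this illustration.

```python
def amdf(frame, max_lag):
    """AMDF(tau) = sum over n of |s(n) - s(n + tau)|, for tau = 0..max_lag."""
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + tau]) for i in range(n - tau))
            for tau in range(max_lag + 1)]

def pitch_period_from_amdf(frame, min_lag, max_lag):
    """Lag in samples of the smallest AMDF value within [min_lag, max_lag]."""
    values = amdf(frame, max_lag)
    return min(range(min_lag, max_lag + 1), key=lambda tau: values[tau])
```

Only subtractions, absolute values, and additions appear in `amdf`, which reflects the remark that no multiplications are required. The search range should exclude lag 0, where the AMDF is trivially zero.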
Fourier Analysis
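One of the more common processes for estimating the spectrum of a segment of speech is the Fourier transform [Oppenheim and Schafer, 1975]. The Fourier transform of a sequence is mathematically defined as
$$S(e^{j\omega}) = \sum_{n=-\infty}^{\infty} s(n)\, e^{-j\omega n}$$
where s(n) represents the terms of the sequence. The short-time Fourier transform of a sequence is a time-dependent function, defined as
$$S_m(e^{j\omega}) = \sum_{n=-\infty}^{\infty} s(n)\, w(m-n)\, e^{-j\omega n}$$
where w(m - n) is a window sequence that selects the portion of s(n) near time index m.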
FIGURE 15.9 Absolute magnitude difference function of one frame of /i/.
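
The short-time Fourier transform described in the Fourier Analysis section can be sketched by windowing successive frames and taking a discrete Fourier transform of each. The Hamming window, the frame length and hop parameters, the direct (slow) DFT, and all function names are assumptions of this sketch; only magnitudes are returned, so the choice of phase reference does not matter here.

```python
import cmath
import math

def hamming(length):
    """Hamming window; the choice of window is an assumption of this sketch."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]

def short_time_spectrum(signal, start, length):
    """Magnitude of the DFT of the windowed frame signal[start:start+length]."""
    w = hamming(length)
    frame = [signal[start + i] * w[i] for i in range(length)]
    mags = []
    for k in range(length // 2 + 1):  # bins from DC up to Nyquist
        coeff = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / length)
                    for i in range(length))
        mags.append(abs(coeff))
    return mags

def stft(signal, length, hop):
    """List of magnitude spectra, one per frame, advancing by hop samples."""
    return [short_time_spectrum(signal, start, length)
            for start in range(0, len(signal) - length + 1, hop)]
```

Bin k of each spectrum corresponds to the frequency k * fs / length for a sampling rate fs. A practical implementation would use a fast Fourier transform instead of the direct sum.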
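This article was updated on 2021-01-20 22:29 and was provided by the author; it does not represent the position of this website. When reposting, please cite the source: https://www.bjmy2z.cn/gaokao/540637.html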