Developer's Guide--Part III
Speech Synthesis
There are three methods of implementing PSOLA (Pitch Synchronous Overlap-Add): TD-PSOLA, LPC-PSOLA, and FD-PSOLA. We use the TD-PSOLA method.
Speech synthesis by PSOLA is implemented in three main steps.
(1).Pitch synchronous analysis:
Taking each pitch mark of the synthesis unit as a center, choose a suitable window length and window the unit's speech signal $x(n)$. This gives a group of short-time signals $x_m(n)$:
$$x_m(n) = h_m(t_m - n)\,x(n)$$
where $t_m$ is the pitch mark and $h_m(n)$ is the window, a Hamming window by default. The window length is usually chosen to be 2 to 4 times the pitch period.
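The analysis step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `hamming` and `analyze`, the fixed `half_width` parameter, and the zero-padding at the unit's edges are hypothetical choices, not part of the guide's implementation.

```python
import math


def hamming(length):
    """Return a symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def analyze(x, marks, half_width):
    """Cut one windowed short-time signal per pitch mark.

    x          -- list of samples of the synthesis unit
    marks      -- sample indices of the pitch marks
    half_width -- samples kept on each side of a mark (assumed fixed here)
    Returns a list of (start_index, samples) pairs.
    """
    window = hamming(2 * half_width + 1)
    segments = []
    for t in marks:
        seg = []
        for k in range(-half_width, half_width + 1):
            n = t + k
            # Assumption: samples outside the unit are treated as zero.
            sample = x[n] if 0 <= n < len(x) else 0.0
            seg.append(window[k + half_width] * sample)
        segments.append((t - half_width, seg))
    return segments
```

Setting `half_width` to one pitch period gives a window about two pitch periods long, the lower end of the 2 to 4 period range mentioned above.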
(2).Pitch synchronous modification:
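According to the TD-PSOLA method, each short-time synthesis signal is a copy of its corresponding short-time analysis signal. If the short-time analysis signal is $x_m(n)$, then the short-time synthesis signal is
$$\tilde{x}_q(n) = x_m(n - \tilde{t}_q + t_m)$$
where $t_m$ is the analysis pitch mark and $\tilde{t}_q$ is the synthesis pitch mark to which the copy is moved.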
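One way to decide which analysis signal each synthesis signal copies is a nearest-mark rule. The sketch below assumes that rule, which the text does not specify; `map_marks` and its `time_scale` parameter are hypothetical names used only for illustration.

```python
def map_marks(analysis_marks, synthesis_marks, time_scale=1.0):
    """For each synthesis mark, pick the analysis mark to copy from.

    Returns a list of (m, delta) pairs, where m indexes analysis_marks and
    delta = synthesis mark - analysis mark is the shift applied to the copy.
    """
    mapping = []
    for tq in synthesis_marks:
        # Position of this synthesis mark on the original time axis.
        target = tq / time_scale
        # Assumed rule: copy from the nearest analysis mark
        # (the earlier mark wins a tie).
        m = min(range(len(analysis_marks)),
                key=lambda i: abs(analysis_marks[i] - target))
        mapping.append((m, tq - analysis_marks[m]))
    return mapping
```

With `time_scale` greater than 1, several synthesis marks map to the same analysis mark, so some short-time signals are copied more than once; with a value below 1, some are skipped.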
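(3).Pitch synchronous synthesis:
Pitch synchronous synthesis is implemented by superposing the short-time synthesis signals. There are many ways to superpose them; we use the least-squares overlap-add scheme. The final synthesized signal is:
$$\tilde{x}(n) = \frac{\sum_q \alpha_q\,\tilde{x}_q(n)\,\tilde{h}_q(\tilde{t}_q - n)}{\sum_q \tilde{h}_q^2(\tilde{t}_q - n)}$$
where $\tilde{x}_q(n)$ is the short-time synthesis signal, $\tilde{t}_q$ is the synthesis pitch mark, $\tilde{h}_q(n)$ is the synthesis window, and $\alpha_q$ is a gain factor. It can also be expressed as:
$$\tilde{x}(n) = \frac{\sum_q \alpha_q\,\tilde{x}_q(n)}{\sum_q \tilde{h}_q(\tilde{t}_q - n)} \qquad (3.1)$$
If we set $\alpha_q = 1$ and treat the normalization factor in the denominator as a constant equal to 1, then we have:
$$\tilde{x}(n) = \sum_q \tilde{x}_q(n) \qquad (3.2)$$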
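The superposition in Equation (3.2) amounts to adding the shifted short-time signals sample by sample. The following sketch shows this with a hypothetical `overlap_add` function. The optional `windows` argument assumes that (3.1) normalizes by the summed synthesis windows, as in the usual least-squares overlap-add formulation; that detail is an assumption rather than something confirmed by this guide.

```python
def overlap_add(segments, out_len, windows=None):
    """Overlap-add short-time signals placed on the synthesis time axis.

    segments -- list of (start_index, samples) pairs
    out_len  -- number of output samples
    windows  -- optional list of (start_index, window_samples); when given,
                the sum is divided by the summed windows (assumed form
                of the normalization in Equation (3.1))
    """
    y = [0.0] * out_len
    for start, seg in segments:
        for k, v in enumerate(seg):
            n = start + k
            if 0 <= n < out_len:
                y[n] += v  # plain sum, as in Equation (3.2)
    if windows is not None:
        norm = [0.0] * out_len
        for start, win in windows:
            for k, w in enumerate(win):
                n = start + k
                if 0 <= n < out_len:
                    norm[n] += w
        # Skip the division where no window contributes.
        y = [v / d if d > 1e-8 else v for v, d in zip(y, norm)]
    return y
```

For example, two three-sample segments of ones starting at samples 0 and 2 overlap at sample 2, so the plain sum doubles there while the normalized sum stays flat.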
Using Equations (3.1) and (3.2), we can stretch or compress the relative distance between the pitch marks ($t_m$) of the original speech, and so obtain a new set of pitch marks ($\tilde{t}_q$) for the synthesized speech. The PSOLA method is illustrated in Figure 3.3.
Figure 3.3: The PSOLA method
(a) The pitch frequency is decreased
(b) The speech is extended while the pitch frequency remains the same
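The two cases in the figure correspond to two ways of placing the new pitch marks. The sketch below is a simplified illustration that assumes a constant pitch period, estimated from the mean spacing of the original marks; `synthesis_marks`, `pitch_scale`, and `time_scale` are hypothetical names.

```python
def synthesis_marks(analysis_marks, pitch_scale=1.0, time_scale=1.0):
    """Place new pitch marks for the synthesized speech.

    pitch_scale -- below 1 lowers the pitch (marks further apart),
                   above 1 raises it
    time_scale  -- above 1 lengthens the unit, below 1 shortens it
    Assumes a constant pitch period (mean spacing of the original marks).
    """
    if len(analysis_marks) < 2:
        return list(analysis_marks)
    first, last = analysis_marks[0], analysis_marks[-1]
    period = (last - first) / (len(analysis_marks) - 1)
    new_period = period / pitch_scale
    end = first + (last - first) * time_scale
    marks = []
    t = float(first)
    while t <= end + 1e-9:
        marks.append(int(round(t)))
        t += new_period
    return marks


original = [0, 100, 200, 300, 400]
# Case (a): lower pitch, same duration (mark spacing doubles).
lower_pitch = synthesis_marks(original, pitch_scale=0.5)
# Case (b): same pitch, doubled duration (same spacing, more marks).
longer = synthesis_marks(original, time_scale=2.0)
```

In case (a) the marks are spread further apart over the same span, and in case (b) the spacing is kept while the span is extended, so more marks are needed.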