CHAPTER 5 AUDIO DEMONSTRATIONS:
ANALYSIS AND SYNTHESIS OF POLE-ZERO SPEECH MODELS

-------------------------------------------------------------------
Audio Demo 5.1: Demonstration of glottal flow derivative (GFD) estimation
  - Using closed-phase estimation based on formant estimation, and
    inverse filtering via the pitch-synchronous covariance method
    (Section 5.7.2)

Female Speaker
--------------
ln.roy.10k              Original
ln.roy.GPTimes.10k      Glottal onset times (large impulses) /
                        closed-phase region (within small impulses)
ln.roy.GFD.10k          Synthesized glottal flow derivative estimate
ln.roy.LF_MGFD.10k      Synthesized LF-modeled glottal flow derivative
                        estimate

figure_ln.roy.GFD       Spectrogram comparison of (top to bottom)
                          a: ln.roy.10k
                          b: ln.roy.GFD.10k
                          c: ln.roy.LF_MGFD.10k

Notes:
  - The GFDs have roughly flat spectra except for 1st-formant residual
    and spectral tilt, as predicted

Southern vs Northern Speaker [*]
--------------------------------
sa1.mbdg0.10k           Original southern
sa1.mjes0.10k           Original northern
sa1.LF8_GFDE.mbdg0.10k  GFD of southern
sa1.LF8_GFDE.mjes0.10k  GFD of northern

figure_SouthNorth       Spectrogram comparison of
                        Left (top to bottom):
                          a: sa1.mjes0.10k
                          b: sa1.LF8_GFDE.mjes0.10k
                        Right (top to bottom):
                          c: sa1.mbdg0.10k
                          d: sa1.LF8_GFDE.mbdg0.10k

Notes:
  - The southern speaker is characterized by greater aspiration, as
    revealed by comparing the GFDs of the two utterances

[*] Passages from the TIMIT database.
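
Illustration (not part of the original demo material): the inverse
filtering step named in Audio Demo 5.1 can be sketched on a toy signal.
The sketch below fits a predictor with the covariance method (least
squares over the frame, no window) to an excitation-free stretch that
stands in for a closed phase, then inverse filters the whole signal to
recover the excitation. The function names, the two-pole "vocal tract",
and the impulse-train excitation are all assumptions made for this
example; the closed-phase boundaries are taken as known here, whereas
the demo estimates them from formant behavior. This is a minimal sketch
under those assumptions, not the method used to produce the files above.

```python
import numpy as np

def covariance_lpc(frame, order):
    """Covariance-method predictor: least squares over frame[order:], no window."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Column k-1 holds the samples delayed by k, aligned with frame[order:].
    past = np.column_stack([frame[order - k : n - k] for k in range(1, order + 1)])
    coeffs, *_ = np.linalg.lstsq(past, frame[order:], rcond=None)
    return coeffs  # a_1..a_p in s[n] ~ sum_k a_k * s[n-k]

def inverse_filter(signal, coeffs):
    """Residual e[n] = s[n] - sum_k a_k * s[n-k], assuming zero history before n = 0."""
    inverse = np.concatenate(([1.0], -np.asarray(coeffs, dtype=float)))
    return np.convolve(np.asarray(signal, dtype=float), inverse)[: len(signal)]

def toy_voiced(coeffs, length=240, period=80):
    """Hypothetical test signal: impulse train through an all-pole filter."""
    excitation = np.zeros(length)
    excitation[::period] = 1.0
    speech = np.zeros(length)
    for n in range(length):
        history = sum(a * speech[n - k] for k, a in enumerate(coeffs, start=1) if n >= k)
        speech[n] = excitation[n] + history
    return excitation, speech

# Assumed two-pole resonance (pole radius 0.9); not taken from the demos.
true_coeffs = np.array([1.6, -0.81])
excitation, speech = toy_voiced(true_coeffs)

# Samples 1..79 contain no excitation, so they play the role of a closed phase.
estimated = covariance_lpc(speech[1:80], order=2)
residual = inverse_filter(speech, estimated)

print(np.round(estimated, 4))
print(np.flatnonzero(np.abs(residual) > 0.5))
```

Because the analysis interval is free of excitation, the least-squares
fit is not biased by the source, which is the motivation for
closed-phase analysis. On real speech the interval is short and noisy,
so the fit is far less clean than in this toy case.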
-------------------------------------------------------------------
Audio Demo 5.2: Residual examples
  - Sounds used in Figure 5.11

tfq.tea.voiced.10k:     "i" in "which"
tfq.tea.plosive.10k:    "t" in "tea"
tfq.tea.fricative.10k:  "ch" in "which"

-------------------------------------------------------------------
Audio Demo 5.3: Autocorrelation examples
  - Sounds used in Figure 5.6

o_phon.10k:             "o" as in "pop"
k_phon.10k:             "k" as in "baker"
f_phon.10k:             "f" as in "father"
g_phon.10k:             "g" as in "go"

-------------------------------------------------------------------
Audio Demo 5.4: Effect of all-pole order in linear prediction
                analysis/synthesis [*]
  - Additional example in Figure 5.13

lpc_order.16k           File contains: Analysis/synthesis with
                          a: Order 2/6/10/14/18
                          b: Original

figure_order2VS18       Spectrogram comparison of order 2 and 18

[*] Reprinted with permission from:
    B.S. ATAL AND S.L. HANAUER, "SPEECH ANALYSIS AND SYNTHESIS BY
    LINEAR PREDICTION OF THE SPEECH WAVEFORM," J. ACOUSTICAL SOCIETY
    OF AMERICA, VOL. 50, PP. 637-655, AUGUST 1971.
    Copyright 1971, Acoustical Society of America. These
    demonstrations may be downloaded for personal use only. Any other
    use requires prior permission of the author and the Acoustical
    Society of America (http://asa.aip.org).

-------------------------------------------------------------------
Audio Demo 5.5: Linear prediction analysis/synthesis

synthesis_12.16k        File contains [*]:
                          a: Synthesis (5 utterance pairs)
                          b: Original

[*] Reprinted with permission from:
    B.S. ATAL AND S.L. HANAUER, "SPEECH ANALYSIS AND SYNTHESIS BY
    LINEAR PREDICTION OF THE SPEECH WAVEFORM," J. ACOUSTICAL SOCIETY
    OF AMERICA, VOL. 50, PP. 637-655, AUGUST 1971.
    Copyright 1971, Acoustical Society of America. These
    demonstrations may be downloaded for personal use only. Any other
    use requires prior permission of the author and the Acoustical
    Society of America (http://asa.aip.org).
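
Illustration (not part of the original demo material): the effect of
all-pole order heard in Audio Demos 5.4 and 5.5 can be seen numerically
in the prediction-error energy, which cannot increase as the order
grows. The sketch below uses the autocorrelation method with the
Levinson-Durbin recursion on a synthetic frame. The test frame (a
four-pole filter driven by an impulse train plus noise), the Hamming
window, and all function names are assumptions chosen for this example
and are not the analysis settings behind lpc_order.16k.

```python
import numpy as np

def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of an already-windowed frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[: n - lag], frame[lag:]) for lag in range(max_lag + 1)])

def levinson_durbin(r, order):
    """Return predictor a_1..a_p and the error energy for every order 0..p."""
    a = np.zeros(order + 1)          # a[0] is unused
    err = r[0]
    errors = [err]
    for i in range(1, order + 1):
        # Reflection coefficient from the current order-(i-1) predictor.
        k = (r[i] - np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        updated = a.copy()
        updated[i] = k
        updated[1:i] = a[1:i] - k * a[i - 1 : 0 : -1]
        a = updated
        err *= 1.0 - k * k
        errors.append(err)
    return a[1:], np.array(errors)

def toy_frame(length=400, period=80, seed=0):
    """Hypothetical voiced-like frame: impulses plus noise through four poles."""
    rng = np.random.default_rng(seed)
    poles = [0.95 * np.exp(0.3j), 0.90 * np.exp(1.2j)]
    poles = poles + [np.conj(p) for p in poles]
    denominator = np.poly(poles).real      # 1 + d_1 z^-1 + ... + d_4 z^-4
    excitation = 0.1 * rng.standard_normal(length)
    excitation[::period] += 1.0
    out = np.zeros(length)
    for n in range(length):
        feedback = sum(denominator[k] * out[n - k] for k in range(1, 5) if n >= k)
        out[n] = excitation[n] - feedback
    return out * np.hamming(length)

frame = toy_frame()
r = autocorrelation(frame, 18)
coeffs, errors = levinson_durbin(r, 18)

# Normalized prediction-error energy at the orders used in Audio Demo 5.4.
for order in (2, 6, 10, 14, 18):
    print(order, errors[order] / errors[0])
```

A low order such as 2 can capture only the overall spectral slope and
at most one resonance, which is consistent with the muffled quality of
the order-2 synthesis; higher orders resolve more formant structure,
with diminishing returns once the order exceeds what the spectrum needs.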
-------------------------------------------------------------------
Audio Demo 5.6: Basic linear prediction analysis/synthesis
  - Example with fast-talking, high-pitched male from real-time
    implementation [*]; this example is at 2400 bps, but illustrates
    issues with nonstationarity and high pitch.

lpc_fast_talker.16k     File contains:
                          a: Original
                          b: Analysis/synthesis of high (pitched) and
                             fast talker

[*] Lincoln Laboratory LPC real-time implementation by
    E.M. Hofstetter and J. Feldman.
-------------------------------------------------------------------