CHAPTER 9 AUDIO DEMONSTRATIONS: SINUSOIDAL ANALYSIS/SYNTHESIS

-------------------------------------------------------------------
Audio Demo 9.1: Analysis/synthesis with different phase functions
    - Mixed-phase (basic) reconstruction
    - Minimum-phase reconstruction
    - Zero-phase reconstruction

Male [*]
--------
ch.4a1.6.org.10k                    Original
ch.4a1.6.mix.10k                    Mixed-Phase
ch.4a1.6.min.10k                    Minimum-Phase
ch.4a1.6.zero.10k                   Zero-Phase

Female
------
ks.8a2.10.org.10k                   Original
ks.8a2.10.mix.10k                   Mixed-Phase
ks.8a2.10.min.10k                   Minimum-Phase
ks.8a2.10.zero.10k                  Zero-Phase

[*] Passages are from the Diagnostic Acceptability Measure (DAM) database

-------------------------------------------------------------------
Audio Demo 9.2: Dispersion via initiating each sinewave track with
                zero phase (Section 9.4.3)

Male Speaker (Figures 9.16-17)
------------
jazz_hour.org.10k                   Original
jazz_hour.syn60_10.10k              Reconstruction
jazz_hour.syn60_10_disperse.10k     Dispersed

Female Speaker
--------------
mlm.tea.org.10k                     Original
mlm.tea.syn15_10.10k                Reconstruction
mlm.tea.syn15_10_disperse.10k       Dispersed

Notes:
    - There is a difference in aural sensitivity between low- and
      high-pitched speakers

-------------------------------------------------------------------
Audio Demo 9.3: Time-scale modification of speech and quasi-periodic audio
    - Sinewave-based modification with voicing-dependent rate factor
      (Section 9.5.2)

Male Speaker
------------
tfq.tea.org.10k                     Original
tfq.tea.tsmtv0p8.10k                Fast
tfq.tea.tsmtv0p5.10k                Faster
tfq.tea.tsmtv1p2.10k                Slow
tfq.tea.tsmtv1p5.10k                Slower

Female Speaker
--------------
ln.swm.org.10k                      Original
ln.swm.tsmtv0p8.10k                 Fast
ln.swm.tsmtv0p5.10k                 Faster
ln.swm.tsmtv1p2.10k                 Slow
ln.swm.tsmtv1p5.10k                 Slower

Trumpet
-------
trumpet.org.10k                     Original
trumpet.tsm0p75.10k                 Fast
trumpet.tsm1p25.10k                 Slow

-------------------------------------------------------------------
Audio Demo 9.4: Pitch change (Section 9.5.2/Exercise 9.11)

up_down_pitch.16k
    File contains:
        Female
            a: Pitch raised
            b: Original
            c: Pitch lowered
        Male
            a: Pitch raised
            b: Original
            c: Pitch lowered

mono_pitch.16k
    File contains:
        a: Original (three utterance pairs)
        b: Monotone pitch

figure_MonoPitch
    File contains:
        a: Spectrogram of original
        b: Spectrogram of monotone pitch

-------------------------------------------------------------------
Audio Demo 9.5: Pitch and spectral change (Section 9.5.2/Exercise 9.11)

Males
-----
cp.seg.org.8k                       Original
cp.seg.PitSpec_low.8k               Low pitch/long vocal tract
cp.seg.PitSpec_high.8k              High pitch/short vocal tract

mjsw0_si1010.org.8k                 Original
mjsw0_si1010.PitSpec_low.8k         Low pitch/long vocal tract
mjsw0_si1010.PitSpec_high.8k        High pitch/short vocal tract

Females
-------
glo.org.8k                          Original
glo.PitSpec_low.8k                  Low pitch/long vocal tract
glo.PitSpec_high.8k                 High pitch/short vocal tract

sc_seg.org.8k                       Original
sc_seg.PitSpec_low.8k               Low pitch/long vocal tract
sc_seg.PitSpec_high.8k              High pitch/short vocal tract

-------------------------------------------------------------------
Audio Demo 9.6: Peak-to-rms reduction (Section 9.5.2)
    - Re-digitized from analog tape, so the original peak-to-rms
      is somewhat altered

Each file contains:
    a: Original
    b: Mild reduction in peak-to-rms (~1.5 dB)
    c: Large reduction in peak-to-rms (~3.0 dB)

post_NoNoise.16k                    Low-noise case
post_WithNoise.16k                  High-noise case

-------------------------------------------------------------------
Audio Demo 9.7: Speaker Separation (Exercise 9.17)
    - Re-digitized from analog tape

speech_separation_9dB_d3.16k
    File contains: (assumed known frequencies; without and with
                    interpolation to help remove ambiguities)
        a: Speaker A
        b: Speaker B
        c: Speakers A+B (9 dB)
        d: Separated A without interpolation
        e: Separated A with interpolation
        f: Separated B without interpolation
        g: Separated B with interpolation

speech_separation_4dB_d4.16k
    File contains: (assumed known frequencies; without and with
                    interpolation to help remove ambiguities)
        a: Speaker A
        b: Speaker B
        c: Speakers A+B (-4 dB)
        d: Separated A without interpolation
        e: Separated A with interpolation
        f: Separated B without interpolation
        g: Separated B with interpolation

speech_separation_6dB_d5.16k
    File contains: (assumed known frequencies; without and with
                    interpolation to help remove ambiguities)
        a: Speaker A
        b: Speaker B
        c: Speakers A+B (6 dB)
        d: Separated A without interpolation
        e: Separated A with interpolation
        f: Separated B without interpolation
        g: Separated B with interpolation

speech_separation_0dB_d7.16k
    File contains: (used estimated dual pitch in place of interpolation)
        a: Speaker A
        b: Speaker B
        c: Speakers A+B (9 dB)
        d: Separated A
        e: Separated B

speech_separation_16dB_d8.16k
    File contains: (frequency sampling only)
        a: Speaker A
        b: Speaker B
        c: Speakers A+B (-16 dB)
        d: Separated A by sampling A+B at peaks of A
        e: Separated B by sampling A+B at peaks of B

-------------------------------------------------------------------
Audio Demo 9.8: Frequency compression/expansion (Exercise 9.10)
    - Using sinusoidal analysis/synthesis

ks.8a2.10.org.10k [*]               Original
ks.8a2.10.fexpa.10k                 Frequency-compressed
ks.8a2.10.fexcp.10k                 Frequency-compressed followed by
                                    frequency-expanded

figure_ExpComp
    Spectrograms of (from top to bottom):
        a: ks.8a2.10.org.10k
        b: ks.8a2.10.fexpa.10k
        c: ks.8a2.10.fexcp.10k

[*] Passages are from the Diagnostic Acceptability Measure (DAM) database

-------------------------------------------------------------------
Audio Demo 9.9: Deterministic + stochastic analysis/synthesis
    - Re-digitized from analog tape [*]

Each file contains:
    a: Original
    b: Deterministic synthesis
    c: Residual
    d: Deterministic synthesis + residual

flute_residual.16k
piano_residual.16k
human_residual.16k

[*] X. Serra, A System for Sound Analysis/Transformation/Synthesis
    Based on a Deterministic Plus Stochastic Decomposition, PhD Thesis,
    CCRMA, Department of Music, Stanford University, 1989.
    Copyright 1989, X. Serra. Used by permission.

-------------------------------------------------------------------
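
Notes:
    - Audio Demo 9.6 quotes peak-to-rms reductions in dB. As a rough aid
      for checking such figures, the sketch below computes the standard
      peak-to-rms ratio, 20*log10(peak/rms), for a list of samples. The
      function name peak_to_rms_db and the sine/square test signals are
      hypothetical and are not part of the demo material. This listing
      does not state the on-disk sample format of the demo files, so
      reading them is left out here.

```python
import math


def peak_to_rms_db(samples):
    """Return the peak-to-rms ratio of a sample sequence, in dB."""
    if not samples:
        raise ValueError("samples must be non-empty")
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("signal is all zeros; ratio is undefined")
    return 20.0 * math.log10(peak / rms)


# Hypothetical test signals: 10 full cycles of a unit-amplitude sinewave,
# and a square wave with the same zero crossings.
n = 1000
sine = [math.sin(2.0 * math.pi * 10 * k / n) for k in range(n)]
square = [1.0 if x >= 0.0 else -1.0 for x in sine]

sine_db = peak_to_rms_db(sine)      # about 3.01 dB for a pure sinewave
square_db = peak_to_rms_db(square)  # 0 dB: peak equals rms
print(f"sine: {sine_db:.2f} dB, square: {square_db:.2f} dB")
```

      Applied to segments a, b, and c of a Demo 9.6 file, the difference
      between the returned values would give the reduction in dB, subject
      to the alteration from analog tape noted in that demo.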