Understanding Formant Filters: Creating Human Vowel Sounds in Synthesis


The human voice is an incredibly complex instrument, capable of shifting timbre to produce distinct vowel sounds (“ah,” “ee,” “oo”) while maintaining a consistent pitch. In electronic music production and synthesis, replicating this vocal quality is essential for creating everything from vocal-like leads to artificial talking textures.

The secret to this vocal emulation lies not in the source oscillator but in how we filter it. This technique is known as Formant Synthesis, and it uses Formant Filters to shape sound just as our throat and mouth shape air.

What is a Formant?

Formants are the resonant frequencies, the acoustic “peaks,” of the human vocal tract. When we speak, our vocal cords produce a buzz whose rate sets the fundamental pitch. This buzz passes through the throat, mouth, and nasal cavity, which together act as a resonator, emphasizing certain overtones and attenuating others.
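Because the buzz is a harmonic series (whole-number multiples of the fundamental) while the resonances stay put, the pitch decides which overtone lands under a given resonance. The short Python sketch below is illustrative only: `nearest_harmonic` is a hypothetical helper, and 730 Hz is the “ah” F1 value from the table later in this article.

```python
def nearest_harmonic(f0_hz: float, formant_hz: float) -> tuple[int, float]:
    """Return (k, k * f0) for the harmonic of f0 that lies closest to a formant."""
    k = max(1, round(formant_hz / f0_hz))  # harmonics are whole-number multiples of f0
    return k, k * f0_hz


# The resonance stays at 730 Hz ("ah" F1) while the sung pitch changes.
for f0 in (110.0, 220.0):
    k, freq = nearest_harmonic(f0, 730.0)
    print(f"f0 = {f0:.0f} Hz: harmonic {k} at {freq:.0f} Hz sits closest to the resonance")
# f0 = 110 Hz: harmonic 7 at 770 Hz sits closest to the resonance
# f0 = 220 Hz: harmonic 3 at 660 Hz sits closest to the resonance
```

The resonance itself never moves; only the harmonic it falls on changes, which is why the vowel survives a change of pitch.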

These amplified frequency regions, which stay relatively fixed for a given vowel, are the formants. When a singer changes pitch, the formants of a vowel stay in roughly the same frequency ranges, because they are set by the shape of the throat, mouth, and tongue rather than by the pitch itself.

Formant Filters in Synthesis

To create a “talking” synth, you need to emulate this process. A formant filter typically does so with several band-pass filters placed in parallel rather than in series, each tuned to one formant.

Vocal Fold Source: Usually a raw waveform like a sawtooth or pulse wave (similar to the buzzing produced by vocal cords).

The Filter Structure: The band-pass filters create several resonant peaks at the same time, and the combination of peak frequencies defines a specific vowel sound.
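The parallel structure can be written compactly. The equations below are one common digital realization, not the only one: they assume each formant is a band-pass biquad from Robert Bristow-Johnson's Audio EQ Cookbook (the constant 0 dB peak gain variant), with per-formant gains $g_k$ and resonance values $Q_k$ left as free parameters. Here $f_s$ is the sample rate and $F_k$ is the k-th formant frequency.

```latex
H(z) = \sum_{k=1}^{K} g_k \, H_k(z), \qquad
H_k(z) = \frac{\alpha_k \left(1 - z^{-2}\right)}
              {(1 + \alpha_k) - 2\cos\omega_k \, z^{-1} + (1 - \alpha_k)\, z^{-2}}

\omega_k = \frac{2\pi F_k}{f_s}, \qquad \alpha_k = \frac{\sin\omega_k}{2 Q_k}

% Evaluating one section at its own formant frequency, z = e^{j\omega_k}:
H_k\!\left(e^{j\omega_k}\right)
  = \frac{2j\,\alpha_k \sin\omega_k \, e^{-j\omega_k}}
         {2j\,\alpha_k \sin\omega_k \, e^{-j\omega_k}} = 1
```

Each section passes its own formant at unity gain, so when the formants are well separated the summed response shows one peak per formant with a height of roughly $g_k$.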

When you shift the filter’s formant frequencies, you change the perceived vowel sound.

Typical Vowel Frequencies

Each vowel has its own characteristic set of formant peaks. The first three formants (F1, F2, and F3) are the most significant for defining a vowel. For example, “ee” pairs a low first formant with a high second formant, while “oo” pairs a low first formant with a low second formant.
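To make the role of the F1/F2 pair concrete, here is a crude nearest-match sketch over the adult-male averages given in the list that follows. `closest_vowel` is a hypothetical helper, and plain Euclidean distance in Hz is an assumption chosen for simplicity; it weights F2 more heavily because F2 spans a wider range.

```python
import math

# Adult-male averages (Hz) from this article: vowel -> (F1, F2).
VOWELS = {
    "ee": (240, 2400),
    "oo": (300, 870),
    "ah": (730, 1090),
    "eh": (530, 1840),
}


def closest_vowel(f1: float, f2: float) -> str:
    """Return the vowel whose (F1, F2) pair is nearest in straight-line Hz distance."""
    return min(VOWELS, key=lambda v: math.dist((f1, f2), VOWELS[v]))


print(closest_vowel(260, 2250))  # ee (low F1, high F2)
print(closest_vowel(320, 900))   # oo (low F1, low F2)
```

F1 alone cannot separate “ee” (240 Hz) from “oo” (300 Hz); it is the large gap in F2 that tells them apart.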

Below are average formant frequencies (in Hz) for an adult male voice:

“ee” (beet): F1 = 240, F2 = 2400

“oo” (boot): F1 = 300, F2 = 870

“ah” (father): F1 = 730, F2 = 1090

“eh” (bet): F1 = 530, F2 = 1840

How to Create Vocal Sounds in a Synth

You can create formant-style vocal sounds in most modern synthesizers (such as Vital or Serum) using these techniques:

Select Source: Start with a sawtooth or another bright, harmonically rich waveform, since the filters can only emphasize harmonics that are already present.

Filter Application: Use a formant filter type if your synthesizer includes one. These are designed to behave like several band-pass filters working together, so you don’t have to build that structure yourself.

Modulation: Leave the vowel setting fixed for a static vocal tone, or automate the “vowel” parameter (often labeled A-E-I-O-U) over time to create a talking effect.

Resonance: Increase the filter resonance to make the vowels more pronounced.
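A vowel parameter like the one in the modulation step usually glides between stored formant sets. Below is a minimal sketch of that idea, assuming linear interpolation in Hz between neighboring vowels (some designs interpolate on a logarithmic frequency scale instead). `vowel_at` and the vowel order are hypothetical, and the (F1, F2) values are the averages from the table above.

```python
# Adult-male averages (Hz) from the table above: vowel -> (F1, F2).
VOWELS = {
    "ee": (240.0, 2400.0),
    "oo": (300.0, 870.0),
    "ah": (730.0, 1090.0),
    "eh": (530.0, 1840.0),
}


def vowel_at(position: float, order: list[str]) -> tuple[float, float]:
    """Map a continuous vowel-knob position to (F1, F2).

    position runs from 0 to len(order) - 1. Whole numbers land exactly on a
    vowel; fractional values blend the two neighbouring vowels linearly in Hz.
    """
    if len(order) < 2:
        raise ValueError("need at least two vowels to morph between")
    position = min(max(position, 0.0), len(order) - 1.0)  # clamp to the knob range
    i = min(int(position), len(order) - 2)  # index of the lower neighbour
    t = position - i                        # blend amount toward the upper neighbour
    lo, hi = VOWELS[order[i]], VOWELS[order[i + 1]]
    return tuple(a + (b - a) * t for a, b in zip(lo, hi))


# Sweep "ah" -> "eh" -> "ee" -> "oo" as an automation lane might.
order = ["ah", "eh", "ee", "oo"]
for step in range(7):
    f1, f2 = vowel_at(step * 0.5, order)
    print(f"knob {step * 0.5:.1f}: F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```

The order here is arbitrary; each synthesizer's vowel control uses its own sequence and its own formant data.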

If you don’t have a dedicated formant filter, you can create one by placing two or three band-pass filters in parallel with high resonance to emphasize the frequencies mentioned above.
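A do-it-yourself version of that patch might look like the plain-Python sketch below, with no audio libraries. Assumptions to note: the band-pass sections use the RBJ Audio EQ Cookbook coefficients (constant 0 dB peak gain); `BandPass`, `sawtooth`, and `formant_filter` are hypothetical names; the Q of 10 is an arbitrary starting value; and the naive sawtooth aliases at higher pitches, where a real synth would use a band-limited oscillator.

```python
import math


class BandPass:
    """Resonant band-pass biquad (RBJ cookbook, 0 dB peak gain), Direct Form I."""

    def __init__(self, center_hz: float, q: float, sample_rate: float):
        w0 = 2.0 * math.pi * center_hz / sample_rate
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        # Coefficients normalized by a0; b1 is zero for this band-pass form.
        self.b0 = alpha / a0
        self.b2 = -alpha / a0
        self.a1 = -2.0 * math.cos(w0) / a0
        self.a2 = (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def process(self, x: float) -> float:
        y = self.b0 * x + self.b2 * self.x2 - self.a1 * self.y1 - self.a2 * self.y2
        self.x2, self.x1 = self.x1, x  # shift input history
        self.y2, self.y1 = self.y1, y  # shift output history
        return y


def sawtooth(freq: float, sample_rate: float, n_samples: int) -> list[float]:
    """Naive (non-band-limited) sawtooth rising from -1 toward 1."""
    return [2.0 * ((n * freq / sample_rate) % 1.0) - 1.0 for n in range(n_samples)]


def formant_filter(signal, formants, q, sample_rate):
    """Feed the signal to one band-pass per formant in parallel and sum the outputs."""
    bank = [BandPass(f, q, sample_rate) for f in formants]
    return [sum(bp.process(x) for bp in bank) for x in signal]


SR = 44100
source = sawtooth(110.0, SR, SR // 2)  # half a second of "buzz" at 110 Hz
ah = formant_filter(source, [730.0, 1090.0], q=10.0, sample_rate=SR)  # "ah": F1, F2
```

In this sketch each peak's width is roughly the formant frequency divided by Q, so one shared Q makes the F2 peak wider in hertz than the F1 peak. Raising Q narrows both peaks and makes the vowel more pronounced, matching the resonance step above.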
