PSOLA modification of speech

This is a step-by-step demonstration of how to modify the melodic and temporal patterns of speech in the Praat software. It accompanies a paper written by Alice Henderson and Radek Skarnitzl, soon to be published in Language Learning & Technology (“A Better Me”: Using Acoustically Modified Learner Voices As Models).

Pronunciation instructors can use this technique to create auditory feedback for their students and help them realize what they need to change in their pronunciation.

You can use the videos below, or you can refer to a cheatsheet describing the main steps.

The basics

  1. Open Praat. You may close the Picture window (which opens on the right); we will work in the Praat Objects window.
  2. Click Open – Read from file… (or simply press Ctrl+O) to open the sound file you wish to manipulate.
  3. The sound should be a mono file. If it is stereo, click View & Edit on the right to see the waveform, and check whether the two channels are more or less identical or whether one has stronger amplitudes (its waveform moves further from the midline). Then click Convert – Extract one channel… and choose 1 or 2: if the channels are identical, it doesn’t matter which; otherwise, choose the stronger channel.
  4. Click on Manipulate in the menu on the right, and then on To Manipulation… In the next window, there is usually no need to change anything: click on OK. A new object, Manipulation, appears in the window. Click View & Edit to start working.
  5. How to work in the Manipulation window is shown in the first video.
  6. Remember that Praat does not automatically save your progress. It is therefore a good idea to save the Manipulation file regularly: select the file in the Objects window and click Save – Save as binary file.
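
Step 3 asks you to pick the stronger channel by eye. As a rough illustration of what "stronger amplitudes" means, the sketch below compares the root-mean-square (RMS) amplitude of two channels. It is plain Python and not part of Praat; the function names, the tolerance and the sample values are all hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def stronger_channel(left, right, tolerance=0.05):
    """Return 1 or 2: the channel one might extract.

    If the two RMS values differ by less than `tolerance` (relative to
    the larger one), the channels are treated as equivalent and
    channel 1 is returned.
    """
    rms_left, rms_right = rms(left), rms(right)
    loudest = max(rms_left, rms_right)
    if loudest == 0 or abs(rms_left - rms_right) / loudest < tolerance:
        return 1
    return 1 if rms_left > rms_right else 2

# Made-up samples: the second channel swings further from the midline.
left = [0.01, -0.02, 0.015, -0.01]
right = [0.2, -0.25, 0.22, -0.18]
chosen = stronger_channel(left, right)  # the second channel is stronger here
```

In practice, looking at the waveform in View & Edit is usually enough; the calculation only makes the criterion explicit.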

Adjusting the melodic patterns

As a first step, it is easiest to simplify the contour of the fundamental frequency (f0). This is shown in the following video.

In the second step, we can create the desired melodic contour, as described in the following video.
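
When planning the new contour, it can help to think in semitones rather than hertz, since a semitone is a frequency ratio and so roughly corresponds to the same perceived step in a low voice and in a high one. The sketch below shows the standard conversion in plain Python, independent of Praat; the function names and the pitch points are illustrative assumptions, not values from the videos.

```python
import math

def shift_semitones(f0_hz, semitones):
    """Return the frequency `semitones` above (or below, if negative) f0_hz."""
    return f0_hz * 2.0 ** (semitones / 12.0)

def interval_semitones(f_from_hz, f_to_hz):
    """Size of the interval between two frequencies, in semitones."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)

# Hypothetical pitch points as (time in s, f0 in Hz).
contour = [(0.10, 120.0), (0.35, 150.0), (0.60, 110.0)]

# Raise the middle point (say, an accented syllable) by 3 semitones.
t, f = contour[1]
contour[1] = (t, shift_semitones(f, 3))

# Twelve semitones are one octave, i.e. a doubling of the frequency.
octave_up = shift_semitones(100.0, 12)
```

For example, raising 150 Hz by 3 semitones gives roughly 178 Hz, whereas the same 3 semitones starting from 250 Hz would require a rise of about 47 Hz.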

Adjusting the temporal patterns

If you only want to change the duration of a few sounds, you can skip most of this first preparatory step. If you plan to change a lot of sounds, however, it will speed things up later if you prepare the file in the way shown in the following video. What we need to do is “fix” the original duration around the sounds (mostly vowels) that we will subsequently be changing. In the next stage, we will create duration “dents”, lengthening or shortening, within the limits we have fixed.

The lengthening and shortening of individual sounds is shown in the last video. The preparatory stage lets us create “dents” without affecting the duration of the sounds we want to keep at a relative duration of 1. At the end, this video also shows how to change the global speech rate.
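
To see why the “fixed” points matter, it helps to picture the duration curve as a set of points connected by straight lines, with the new duration of any stretch of speech equal to the area under that curve. The following is a minimal sketch of this idea in plain Python, assuming linear interpolation between points and constant values outside the first and last point; the function names, times and values are hypothetical and do not come from Praat or from the videos.

```python
def relative_duration(points, t):
    """Linearly interpolate the relative duration at time t.

    `points` is a list of (time, relative_duration) pairs with strictly
    increasing times. Outside the first and last point the nearest
    value is held constant; with no points the value is 1.
    """
    if not points:
        return 1.0
    if t <= points[0][0]:
        return points[0][1]
    if t >= points[-1][0]:
        return points[-1][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def target_duration(points, start, end):
    """Area under the relative-duration curve between start and end.

    The curve is piecewise linear, so the trapezoid rule is exact when
    every point time inside (start, end) is used as a breakpoint.
    """
    times = [start] + [t for t, _ in points if start < t < end] + [end]
    total = 0.0
    for a, b in zip(times, times[1:]):
        total += (b - a) * (relative_duration(points, a) + relative_duration(points, b)) / 2.0
    return total

# A 1-second sound with a vowel from 0.40 s to 0.50 s (made-up times).
fixed = [(0.40, 1.0), (0.50, 1.0)]    # "fix" the duration at the vowel edges
dent = [(0.401, 1.5), (0.499, 1.5)]   # lengthen the vowel to about 150 %
tier = sorted(fixed + dent)

new_vowel = target_duration(tier, 0.40, 0.50)
new_total = target_duration(tier, 0.0, 1.0)
```

With these numbers the vowel grows from 0.1 s to about 0.15 s, while everything outside the fixed points keeps its original duration. Without the fixed points, the value 1.5 would be held constant before and after the dent and would stretch the whole sound.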

The final steps and some limitations

When you have created the manipulation, you can create a sound object from it in two ways: in the Manipulation window, by clicking File – Publish resynthesis, or in the Objects window, by clicking Get synthesis (overlap-add). Then save the sound by clicking Save – Save as WAV file. As mentioned above, you can also save the Manipulation file, so that you can return to it later, by clicking Save – Save as binary file.

When performing manipulations like this, we should be aware of their limitations. These apply especially to recordings with poorer signal quality, for example those with relatively strong background noise. Praat needs to be able to estimate the fundamental frequency correctly in order to manipulate it, and noise may prevent this. Correct estimation of f0 may also be a problem in voices which have aperiodic portions. Apart from pathological voices, this happens in speakers with creaky phonation (also called vocal fry).
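
To give a feeling for why periodicity matters, here is a deliberately simplified sketch of f0 estimation by autocorrelation: the signal is compared with delayed copies of itself, and the delay at which it matches best is taken as the period. This is not Praat's pitch algorithm, which is considerably more refined; the function name, the search range and the synthetic test tone are assumptions made for illustration.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=600.0):
    """Estimate f0 (in Hz) as the lag with the highest autocorrelation.

    Only lags corresponding to frequencies between f0_min and f0_max
    are searched.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself delayed by `lag`.
        value = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

sample_rate = 8000
# A clean, perfectly periodic 100 Hz tone, 0.1 s long (synthetic test signal).
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]
f0 = estimate_f0(tone, sample_rate)
```

For this clean tone the best-matching delay is exactly one period (80 samples), which gives 100 Hz. In noisy recordings or creaky stretches the signal no longer repeats regularly, the best match becomes weaker or ambiguous, and the estimate may be wrong or missing.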

Good luck with your manipulations!
