I am using parselmouth (a Python wrapper around Praat) to extract intensity and pitch features as follows:
import parselmouth

snd = parselmouth.Sound(path)  # path to the audio file
intensity = snd.to_intensity()
pitch = snd.to_pitch()
However, the audio file contains long stretches of silence, which I would like to remove before I calculate these audio metrics. I am able to remove the silences by processing the numpy array obtained by reading the audio through the wave module (and applying some logic), but I am not able to pass the new array to parselmouth.
I am even open to providing startTime and endTime parameters to parselmouth, but cannot find documentation that supports that either.
There are two options that might be useful for this situation:
First, you can create a parselmouth.Sound from samples rather than reading from file. There's a constructor taking a NumPy array (or a list/iterable convertible to a NumPy array) and a sampling frequency.

Second, Sound also has a method Sound.extract_part (equivalent to Praat's "Extract part..." button in the UI) that allows you to extract fragments (optionally even windowed with a different window shape than a rectangular window).

Do note that you will likely want to leave a bit of margin when removing silences, because 1) both the intensity and pitch analyses use a sliding window of a certain size (so if you don't leave a margin, some of the windows will span 'discontinuous speech'), and 2) the pitch analysis uses a heuristic to keep a more or less continuous pitch contour (so if you don't leave a margin where silence/the absence of voicing is detected, neighboring fragments' pitch estimates will influence each other).
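As a rough sketch of how the two options and the margin advice could fit together (not a definitive recipe): the helper names pad_and_merge, analyse_fragments and sound_from_samples are made up for illustration, the speech intervals are assumed to come from your own silence-detection logic, and the 0.05 s margin is an arbitrary placeholder that you would want to tune to the analysis window sizes you use.

```python
def pad_and_merge(intervals, margin, t_min, t_max):
    """Widen each (start, end) interval by `margin` seconds on both sides,
    clip to [t_min, t_max], and merge intervals that overlap afterwards."""
    padded = sorted(
        (max(t_min, start - margin), min(t_max, end + margin))
        for start, end in intervals
    )
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous fragment: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def analyse_fragments(path, speech_intervals, margin=0.05):
    """Option 2: cut fragments with Sound.extract_part and analyse each one.

    `speech_intervals` is a list of (start, end) times in seconds,
    assumed to come from your own silence detection.
    """
    import parselmouth  # imported lazily so pad_and_merge has no dependencies

    snd = parselmouth.Sound(path)
    results = []
    for start, end in pad_and_merge(speech_intervals, margin, snd.xmin, snd.xmax):
        # preserve_times=True keeps the time axis of the original recording.
        part = snd.extract_part(from_time=start, to_time=end, preserve_times=True)
        results.append((part.to_intensity(), part.to_pitch()))
    return results


def sound_from_samples(samples, sampling_frequency):
    """Option 1: build a Sound directly from an array of samples."""
    import parselmouth

    return parselmouth.Sound(samples, sampling_frequency=sampling_frequency)
```

Analysing each padded fragment separately, rather than concatenating the non-silent samples into one array, is one way to keep the analysis windows and the pitch tracker from running across the cut points.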