Search code examples
audiovoice-recognitionaubio

Determine fundamental frequency of voice recordings


I am using the command line tool aubiopitch to analyze voice recordings. My goal is to determine the fundamental frequency of the voice recorded. I know, of course, that the frequency varies – that's why I want to calculate an "average" in Hz over a 30-second recording.

My question: aubio uses different methods to determine the pitch of a recording: Schmitt trigger, harmonic comb, yin, yinfft etc. Which one of those would be my preferred choice when dealing with pure human voice recordings (no background music, atmo etc.).


Solution

  • I would recommend using yinfast or yinfft (default). For a discussion of the algorithms, their parameters, and their performance, see Chapter 3 of this document.

    Note that the median is better suited than the average in this case.