python audio signal-processing librosa sample-rate

Incorrect signal returned by librosa.clicks


I'm trying to write a simple metronome with librosa.clicks:

import librosa
import numpy as np

bpm = 200
exercise_duration = 10
sr = 22050

seconds_per_beat = 60/bpm
beats_number = round(bpm/ (60/exercise_duration))

metronome_clicks = [0]

for i in range(beats_number - 1):
    metronome_clicks.append(seconds_per_beat + metronome_clicks[i])

click = librosa.clicks(times=metronome_clicks, sr=sr, length=sr * exercise_duration, click_duration=0.1)

hop_length = 512
onset_samples = librosa.onset.onset_detect(y=click, sr=sr, units='samples', hop_length=hop_length)

onset_diff = np.diff(onset_samples)

print(onset_samples)
print(onset_diff)

While testing the output of librosa.clicks I noticed a significant inaccuracy. On my machine, the printed result of np.diff looks like this:

[6144 6656 6656 6656 6656 6656 6656 6656 6656 6656 6656 6656 6656 6144
 6656 6656 6656 6656 6656 6656 6656 6656 6656 6656 6656 6144 6656 6656
 6656 6656 6656]

This basically means up to 23 ms of latency: the intervals differ by 512 samples, and 512 / 22050 is about 23 ms. If I change hop_length to 256, the difference becomes 256 samples, so it seems to be tied to the hop length, but to be honest I don't know how, or how to fix it. Also, when I set bpm to 60, the results look good:

[22016 22016 22016 22016 22016 22016 22016 22016]

They are still not perfect, though: at 60 bpm a beat lasts 1 second, which should give 22050 samples. At 120 bpm it looks like this:

[11264 10752 11264 10752 11264 10752 11264 10752 11264 11264 10752 11264
 10752 11264 10752 11264 10752 11264]

This is yet another kind of behavior that I don't really understand. If I'm correct, increasing the sample rate or decreasing the hop length should reduce the problem, but I'd like to get perfect metronome timing, as it's crucial for the app that I'm building.
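The numbers above can be checked with a little arithmetic. As far as I understand it, onset_detect works frame by frame and, with units='samples', reports the frame index multiplied by hop_length, so every reported onset (and every difference between onsets) is a multiple of hop_length. The sketch below compares the exact beat interval with the two nearest multiples of the hop length; hop_grid_neighbours is a hypothetical helper name, not a librosa function.

```python
import math

def hop_grid_neighbours(bpm, sr, hop_length):
    """Exact samples per beat, plus the nearest hop multiples below and above it."""
    true_interval = sr * 60 / bpm               # exact beat length in samples
    frames = true_interval / hop_length         # the same length measured in frames
    lower = math.floor(frames) * hop_length     # largest hop multiple <= true interval
    upper = math.ceil(frames) * hop_length      # smallest hop multiple >= true interval
    return true_interval, lower, upper

for bpm in (60, 120, 200):
    print(bpm, hop_grid_neighbours(bpm, sr=22050, hop_length=512))
```

For 200 bpm the exact interval is 6615 samples, which lies between 6144 and 6656; for 120 bpm, 11025 lies between 10752 and 11264; for 60 bpm, 22050 sits just above 22016. Those are the values in the printed diffs, which suggests the jitter comes from measuring on the frame grid rather than from the click track itself.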


Solution

  • I imported the click track as a WAV file into an audio editor and found that it is accurate to within 1-2 samples.

    Onset detection algorithms are not perfect and are very dependent on the transient of the signal.

    I plotted the waveform against the onset_samples, and even from a bird's-eye view you can tell that the onset times are not that consistent.

    [Figure: onset vs waveform]

    Bottom line: the click is fine; it's the onset detection that is off.
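The audio-editor check can also be done in code by looking at the waveform directly instead of going through onset detection. The following is a minimal sketch that uses only NumPy: synth_clicks is a hypothetical stand-in for a click track (a decaying burst at each beat, not librosa's actual click waveform), and click_starts is a hypothetical helper that finds the first active sample after a stretch of silence.

```python
import numpy as np

def synth_clicks(start_samples, sr, length, click_duration=0.1, freq=1000.0):
    """Stand-in click track: a decaying cosine burst at each start sample."""
    n = int(sr * click_duration)
    t = np.arange(n) / sr
    # Cosine, so the very first sample of each burst is non-zero.
    burst = np.cos(2 * np.pi * freq * t) * np.logspace(0, -10, num=n, base=2.0)
    y = np.zeros(length)
    for s in start_samples:
        end = min(s + n, length)
        y[s:end] += burst[: end - s]
    return y

def click_starts(y, min_gap, threshold=1e-6):
    """Indices where the signal becomes active after more than min_gap quiet samples."""
    active = np.flatnonzero(np.abs(y) > threshold)
    if active.size == 0:
        return active
    # A new click begins wherever the gap to the previous active sample is large.
    is_start = np.concatenate(([True], np.diff(active) > min_gap))
    return active[is_start]

sr, bpm, duration = 22050, 200, 10
beats = round(bpm * duration / 60)
starts = np.round(np.arange(beats) * sr * 60 / bpm).astype(int)

y = synth_clicks(starts, sr, length=sr * duration)
found = click_starts(y, min_gap=sr // 20)
print(np.diff(found))  # sample-accurate intervals, not tied to any hop length
```

On a real click track the same idea should apply, with the caveat that a burst which starts on a zero-valued sample would be reported one sample late, and min_gap has to be shorter than the silence between clicks.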