Tags: python, audio, frequency, volume, pyaudio

How can I get the start and end indices of a note in a volume graph?


I am trying to make a program that tells me when a note has been pressed.

I have the following notes exported as a .wav file (the C major scale four times, with different rhythms, dynamics, and octaves): [image]

I can get the volumes of my sound file using the following code:

from scipy.io import wavfile

def get_volume(file):
    sr, data = wavfile.read(file)

    if data.ndim > 1:
        data = data[:, 0]

    return data

volumes = get_volume("FILE")

Here is some information about the output:

Max: 27851
Min: -25664
Mean: -0.7569383391943734
A Sample from the array: [ -7987  -8615  -8983  -9107  -9019  -8750  -8324  -7752  -7033  -6156
  -5115  -3920  -2610  -1245    106   1377   2520   3515   4364   5077
   5659   6113   6441   6639   6708   6662   6518   6288   5962   5525
   4963   4265   3420   2418   1264    -27  -1429  -2901  -4388  -5814
  -7101  -8186  -9028  -9614  -9955 -10077 -10012  -9785  -9401  -8846]

And here is what I get when I plot the volumes array (x is the index, y is the volume): [image]

I want to get the indices of the start and end of each note, like the ones marked in this image (done by hand, so not accurate): [image]

When I looked at the data, I realized that it is a 1D array, and I also noticed that when a note gets louder or quieter, the change is not smooth. It zigzags, but there is still a trend. So I can't just take the gradient (slope) at each point. Instead, I thought about grouping the samples into batches, computing the average gradient of each batch, and doing the calculations with that, like so:

def get_average_gradient(arr):
    # Calculates average gradient
    return sum([i - (sum(arr) / len(arr)) for i in arr]) / len(arr)


def get_note_start_end(arr_size, batch_size, arr):
    # Finds start and end indices
    ranges = []
    curr_range = [0]

    prev_slope = curr_slope = "NO SLOPE"
    has_ended = False

    for i, j in enumerate(arr):
        if j > 0:
            curr_slope = "INCREASING"
        elif j < 0:
            curr_slope = "DECREASING"
        else:
            curr_slope = "NO SLOPE"

        if prev_slope == "DECREASING" and not has_ended:
            if i == len(arr) - 1 or arr[i + 1] < 0:
                if curr_slope != "DECREASING":
                    curr_range.append((i + 1) * batch_size + batch_size)
                    ranges.append(curr_range)
                    curr_range = [(i + 1) * batch_size + batch_size + 1]
                    has_ended = True

        if has_ended and curr_slope == "INCREASING":
            has_ended = False

        prev_slope = curr_slope

    ranges[-1][-1] = arr_size - 1

    return ranges


def get_notes(batch_size, arr):
    # Gets the gradients of the batches
    out = []

    for i in range(0, len(arr), batch_size):
        if i + batch_size > len(arr):
            gradient = get_average_gradient(arr[i:])
        else:
            gradient = get_average_gradient(arr[i: i+batch_size])

        # print(gradient, i)
        out.append(gradient)

    return get_note_start_end(len(arr), batch_size, out)

notes = get_notes(128, volumes)

The problem with this is that if the batch size is too small, it returns the indices of small peaks that aren't notes on their own. If the batch size is too big, the program misses the start and end indices.

I also tried to find the notes by detecting the silence between them. Here is the code I used:

from pydub import AudioSegment, silence

audio = intro = AudioSegment.from_wav("C - Major - Test.wav")
dBFS = audio.dBFS

notes = silence.detect_nonsilent(audio, min_silence_len=50, silence_thresh=dBFS-10)

This worked the best, but it still wasn't good enough. Here is what I got: [image]

It identified some notes pretty well, but it couldn't identify notes accurately when they didn't become very quiet before the next one was played (as in the second and fourth scales).

I have been thinking about this problem for days and have tried most, if not all, of the good(?) ideas I had. I am new to analysing audio files. Maybe I am using the wrong data for what I want to do. Maybe I need to use the frequency data (I tried getting it, but couldn't make sense of it). Here is the frequency code:

from scipy.fft import rfft, rfftfreq
from scipy.io import wavfile
import matplotlib.pyplot as plt


def get_freq(file, start_time, end_time):
    # start_time and end_time are accepted but not used yet
    sr, data = wavfile.read(file)

    # Keep only the first channel of multi-channel audio
    if data.ndim > 1:
        data = data[:, 0]

    # Fourier Transform
    N = len(data)
    yf = rfft(data)
    xf = rfftfreq(N, 1 / sr)

    return xf, yf


FILE = "C - Major - Test.wav"

plt.plot(*get_freq(FILE, 0, 10))
plt.show() 

And the frequency graph: [image]

And here is the .wav file: https://drive.google.com/file/d/1CERH-eovu20uhGoV1_O3B2Ph-4-uXpiP/view?usp=sharing

Any help is appreciated :)


Solution

  • I think this is what you need: first convert the negative samples to positive ones (take the absolute value) and smooth the curve to eliminate noise. To find the low peaks, run the same peak search on the negated curve.

    from scipy.io import wavfile
    import matplotlib.pyplot as plt
    from scipy.signal import find_peaks, savgol_filter
    import numpy as np

    def get_volume(file):
        sr, data = wavfile.read(file)
        # Keep only the first channel of multi-channel audio
        if data.ndim > 1:
            data = data[:, 0]
        return data

    # Rectify in floating point so that abs() cannot overflow int16
    v1 = np.abs(get_volume("test.wav").astype(np.float64))
    # Smooth the curve (odd window length; older SciPy versions reject even ones)
    volumes = savgol_filter(v1, 10001, 3)
    lv = -volumes
    # Find the high peaks, then the low peaks (peaks of the negated curve)
    peaks, _ = find_peaks(volumes, distance=8000, prominence=300)
    lpeaks, _ = find_peaks(lv, distance=8000, prominence=300)
    # Plot them
    plt.plot(volumes)
    plt.plot(peaks, volumes[peaks], "x")
    plt.plot(lpeaks, volumes[lpeaks], "o")
    plt.plot(np.zeros_like(volumes), "--", color="gray")
    plt.show()

    Plot with your test file; x marks the high peaks and o the low peaks: [image]
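
To turn those peaks into the start and end indices the question asks for, one option is to treat the low peak on either side of each high peak as that note's boundaries. The following is only a sketch under assumptions: `note_ranges` is a hypothetical helper, its default `window`, `distance`, and `prominence` values are illustrative guesses, and the signal is a synthetic stand-in (three decaying 440 Hz tones after a short silence) rather than the real recording.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter


def note_ranges(samples, window=1001, distance=800, prominence=0.02):
    """Pair each loudness peak with its neighbouring low peaks.

    Hypothetical helper; every default here is an illustrative guess
    that would need tuning for a real recording.
    """
    # Rectify in floating point, then smooth with an odd window length.
    envelope = savgol_filter(np.abs(samples.astype(np.float64)), window, 3)

    # High peaks mark the notes; peaks of the negated curve mark the dips.
    peaks, _ = find_peaks(envelope, distance=distance, prominence=prominence)
    troughs, _ = find_peaks(-envelope, distance=distance, prominence=prominence)

    ranges = []
    for peak in peaks:
        before = troughs[troughs < peak]
        after = troughs[troughs > peak]
        # Fall back to the array bounds when no dip exists on one side.
        start = int(before[-1]) if before.size else 0
        end = int(after[0]) if after.size else len(envelope) - 1
        ranges.append((start, end))
    return ranges


# Synthetic stand-in for the recording: silence, then three decaying
# 440 Hz notes played back to back with no gap between them.
sr = 8000
t = np.arange(sr // 2) / sr
note = np.sin(2 * np.pi * 440 * t) * np.exp(-6 * t)
signal = np.concatenate([np.zeros(2000), note, note, note])
onsets = [2000, 6000, 10000]  # true start index of each synthetic note

ranges = note_ranges(signal)
for (start, end), onset in zip(ranges, onsets):
    print(f"note starting near {onset}: indices {start}..{end}")
```

Because the smoothing blurs each attack, the reported start can sit up to roughly half a window before the true onset. A smaller `window` tightens that at the cost of a noisier curve, so the values would need tuning against the real file.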