python matplotlib scipy valueerror peak-detection

ValueError with scipy.signal.find_peaks


I'm struggling to plot the peaks of my dataset. It looks like the find_peaks function drops every data point that is not a peak, so its result no longer matches my time axis. Does anybody know how I can still plot the peaks, maybe by replacing the non-peak data points with zeros, or is there another way?

I am getting the following error message:

ValueError: x and y must have same first dimension, but have shapes (800,) and (105,)

import matplotlib.pyplot as plt
import pandas as pd
import scipy.signal


def plot():
    i = 1
    d_time, d_x, d_y, d_z = [], [], [], []
    columns = ["Time", "y", "x", "z"]
    df = pd.read_csv("mydata.csv", usecols=columns)
    for zeile in df.Time:
        if i % 30 == 0:
            d_time.append(df.Time[i])
            d_x.append(df.x[i])
            d_y.append(df.y[i])
            d_z.append(df.z[i])
            i += 1
        elif i > 24000:
            break
        else:
            i += 1

    fig = plt.figure(dpi=64, figsize=(100, 60))

    p_z, _ = scipy.signal.find_peaks(d_z, 0, distance=5)

    plt.plot(d_time, d_z, c='red', label="Z-Achse")
    plt.plot(d_time, p_z, "x", c='blue', label="Peaks Z-Achse")

    plt.title("Peak Detection", fontsize=16)
    plt.xlabel('t(s)', fontsize=16)

    fig.autofmt_xdate()
    plt.ylabel("a(m/s²)", fontsize=16)
    plt.tick_params(axis='both', which='major')

    plt.legend()
    plt.show()


plot()

Link to Mydata.csv: https://cdn.discordapp.com/attachments/635516210473336844/945630182415405106/mydata.csv


Solution

  • The error occurs because, as you noticed, find_peaks keeps only the peaks: p_z holds the indices of the 105 detected peaks, while d_time holds all 800 samples, so the two cannot be plotted against each other. One way around this is to build a time vector with np.linspace that has as many points as p_z and plot the peak heights against it. Here is my solution:

    import matplotlib.pyplot as plt
    import pandas as pd
    from scipy import signal
    import numpy as np
    
    def plot():
        i = 1
        d_time, d_x, d_y, d_z = [], [], [], []
        columns = ["Time", "y", "x", "z"]
        df = pd.read_csv("mydata.csv", usecols=columns)
        for zeile in df.Time:
            if i % 30 == 0:
                d_time.append(df.Time[i])
                d_x.append(df.x[i])
                d_y.append(df.y[i])
                d_z.append(df.z[i])
                i += 1
            elif i > 24000:
                break
            else:
                i += 1


        fig = plt.figure(dpi=64, figsize=(100, 60))
        # With a height argument, find_peaks also returns the peak heights
        p_z, properties = signal.find_peaks(d_z, 0, distance=5)
        # One evenly spaced time value per detected peak
        new_time = np.linspace(d_time[0], d_time[-1], len(p_z))
        plt.plot(d_time, d_z, c='red', label="Z-Achse")
        # Both arguments below have len(p_z) entries, so the shapes match
        plt.plot(new_time, properties['peak_heights'], "x", c='blue', label="Peaks Z-Achse")

        plt.title("Peak Detection", fontsize=16)
        plt.xlabel('t(s)', fontsize=16)
        plt.yscale("log")

        fig.autofmt_xdate()
        plt.ylabel("a(m/s²)", fontsize=16)
        plt.tick_params(axis='both', which='major')

        plt.legend()
        plt.show()


    plot()
    

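    The key change is the np.linspace call, which creates a new time vector with the same length as p_z, so the x and y arguments of the second plt.plot call match and the error goes away. I also switched the y-axis to a log scale with plt.yscale("log") to make the results easier to see. Note that new_time spaces the markers evenly between the first and last timestamp; to place each marker at the time its peak actually occurred, index the time data with p_z instead, since p_z holds positions into d_z.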
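
    Because find_peaks returns the indices of the peaks, another option is to use p_z to pick the matching entries out of the time and signal data, so that each marker sits at the time its peak occurred. The following is only a minimal sketch: it uses a synthetic sine wave in place of mydata.csv, and the helper name peak_coordinates is made up for illustration.

```python
import numpy as np
from scipy import signal


def peak_coordinates(time, values, height=0, distance=5):
    """Return (peak_times, peak_values) for the peaks found in `values`.

    Hypothetical helper for illustration; height and distance mirror the
    arguments used in the answer's find_peaks call.
    """
    time = np.asarray(time)
    values = np.asarray(values)
    # find_peaks returns positions into `values`, not the values themselves
    idx, _ = signal.find_peaks(values, height=height, distance=distance)
    # Index both arrays with the same positions so x and y stay aligned
    return time[idx], values[idx]


# Synthetic stand-in for the 800 downsampled samples (assumption: the real
# d_time and d_z would come from mydata.csv as in the answer)
d_time = np.linspace(0.0, 8.0, 800)
d_z = np.sin(2 * np.pi * 1.5 * d_time)

peak_times, peak_values = peak_coordinates(d_time, d_z)
```

    With these two arrays, plt.plot(peak_times, peak_values, "x") should draw the markers on top of the curve from plt.plot(d_time, d_z). If d_time and d_z are plain lists, as in the code above, the np.asarray calls are what make the index lookup work.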

    Plot
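
    The question also asks whether the non-peak samples could be replaced with zeros so that the array keeps the length of d_time. That is possible, but zeros would be drawn as markers along y = 0; filling with NaN instead lets matplotlib skip those points. This is a sketch under the same assumptions as before (synthetic data, and peaks_as_full_length is a hypothetical helper name):

```python
import numpy as np
from scipy import signal


def peaks_as_full_length(values, fill=np.nan, height=0, distance=5):
    """Return an array shaped like `values` that holds the peak values at
    the peak positions and `fill` everywhere else (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    idx, _ = signal.find_peaks(values, height=height, distance=distance)
    padded = np.full(values.shape, fill, dtype=float)
    padded[idx] = values[idx]
    return padded


# Synthetic stand-in for the downsampled data from mydata.csv (assumption)
d_time = np.linspace(0.0, 8.0, 800)
d_z = np.sin(2 * np.pi * 1.5 * d_time)

peaks_nan = peaks_as_full_length(d_z)             # NaN where there is no peak
peaks_zero = peaks_as_full_length(d_z, fill=0.0)  # zeros where there is no peak
```

    Either result has the same shape as d_time, so plt.plot(d_time, peaks_nan, "x") no longer raises the ValueError.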