python pandas matplotlib scipy datetimeindex

Missing Data and Graphing with Pandas and Matplotlib


I want my matplotlib plot to display my df's DatetimeIndex as consecutive seconds on the x-axis and my df's Load data on the y-axis. Then I want to overlay a scipy.signal find_peaks result (whose x-axis is consecutive seconds). My data has gaps (it is real-world data), though it is sampled at a frequency of one second.

Code

import pandas as pd
import matplotlib.pyplot as plt
from scipy import signal
import numpy as np

# Create Sample Dataset
df = pd.DataFrame([['2020-07-25 09:26:28',2],['2020-07-25 09:26:29',10],['2020-07-25 09:26:32',203],['2020-07-25 09:26:33',30]], 
                      columns = ['Time','Load'])

df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index("Time")
print(df)

# Try to solve the problem
rng = pd.date_range(df.index[0], df.index[-1], freq='s')
print(rng)

peaks, _ = signal.find_peaks(df["Load"])
plt.plot(rng, df["Load"])
plt.plot(peaks, df["Load"][peaks], "x")
plt.plot(np.zeros_like(df["Load"]), "--", color="gray")
plt.show()

This code does not work because rng has a length of 6 while df has a length of 4, so plt.plot(rng, df["Load"]) fails on the length mismatch. I think I might be going about this the wrong way entirely. Thoughts?
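
A minimal sketch to make the mismatch concrete, using the same sample data: it compares the two lengths and lists the seconds in rng that have no row in df. The variable name missing is only illustrative.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame(
    [['2020-07-25 09:26:28', 2], ['2020-07-25 09:26:29', 10],
     ['2020-07-25 09:26:32', 203], ['2020-07-25 09:26:33', 30]],
    columns=['Time', 'Load'],
)
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time')

# Every second between the first and last sample, gaps included.
rng = pd.date_range(df.index[0], df.index[-1], freq='s')

# Timestamps in the full range that have no row in df.
missing = rng.difference(df.index)

print(len(rng), len(df))
print(missing)
```

With this data the gap is the two seconds 09:26:30 and 09:26:31.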


Solution

  • You are really close. I think you can get what you want by reindexing your df with your range: reindex adds a NaN row for each missing second, and fillna(0) replaces those NaNs with 0. For instance:

    df = df.reindex(rng).fillna(0)
    peaks, _ = signal.find_peaks(df["Load"])
    ...
    

    Does that do what you expect?
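
    A fuller sketch of the reindexing idea, under a few assumptions that are not in the answer above: the x-axis is built as elapsed seconds since the first sample, peaks are selected by position from NumPy arrays, and the helper names regularize and plot_load_with_peaks are made up for illustration. One caveat worth checking on real data: filling gaps with 0 can itself create peaks (here the 10 at second 1 becomes a peak because a 0 follows it).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import signal


def regularize(df):
    """Reindex df to one row per second; return (seconds, load, peaks).

    Hypothetical helper: missing seconds are filled with 0.
    """
    rng = pd.date_range(df.index[0], df.index[-1], freq='s')
    full = df.reindex(rng).fillna(0)
    # Elapsed seconds since the first sample: 0, 1, 2, ...
    seconds = (full.index - full.index[0]).total_seconds().to_numpy()
    load = full['Load'].to_numpy()
    # find_peaks returns integer positions into load.
    peaks, _ = signal.find_peaks(load)
    return seconds, load, peaks


def plot_load_with_peaks(seconds, load, peaks):
    """Plot load against elapsed seconds and mark each peak with an x."""
    fig, ax = plt.subplots()
    ax.plot(seconds, load)
    ax.plot(seconds[peaks], load[peaks], 'x')
    ax.axhline(0, linestyle='--', color='gray')  # zero baseline
    ax.set_xlabel('Seconds since first sample')
    ax.set_ylabel('Load')
    return ax


# Sample data from the question.
df = pd.DataFrame(
    [['2020-07-25 09:26:28', 2], ['2020-07-25 09:26:29', 10],
     ['2020-07-25 09:26:32', 203], ['2020-07-25 09:26:33', 30]],
    columns=['Time', 'Load'],
)
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time')

seconds, load, peaks = regularize(df)
print(peaks)
```

    Calling plot_load_with_peaks(seconds, load, peaks) followed by plt.show() should draw the figure. Because seconds and load have the same length after reindexing, the positions returned by find_peaks index both arrays directly.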