I have a DataFrame that I am trying to graph using hvPlot.
So far, I have something like this:
new_df = new_df.dropna(subset=['Reflectance'])
new_df = new_df.sort_values(by='Wavelength')
reflectance_plot = new_df.hvplot.line(x = "Wavelength",y = "Reflectance", by="UniqueID", legend=False).opts(fontsize={'title': 16, 'labels': 14, 'yticks': 12},xrotation=45, xticks=15)
reflectance_plot
Which gives me something like this:
As you can see, between the smooth areas with data, there are lots of straight lines where there are no values. I am trying to remove these straight lines so that only the data is plotted. I tried to do that with this code:
new_df['Reflectance'] = new_df['Reflectance'].fillna(np.nan).replace([np.nan], [None])
new_df = new_df.sort_values(by='Wavelength')
reflectance_plot = new_df.hvplot.line(x = "Wavelength",y = "Reflectance", by="UniqueID", legend=False).opts(fontsize={'title': 16, 'labels': 14, 'yticks': 12},xrotation=45, xticks=15)
reflectance_plot
So obviously this is what I am trying to accomplish, except now the vast majority of the data is completely gone. I would appreciate any advice or insight into why this is happening and how to fix it.
I came across a similar issue, and what I came up with was the following:
Generate & plot some problematic data:
import pandas as pd
import numpy as np
import hvplot.pandas
df = pd.DataFrame({'data1': np.random.randn(22),
                   'data2': np.random.randn(22) + 3})
df['time'] = pd.to_datetime('2022-12-25T09:00') + \
    np.cumsum(([pd.Timedelta(1, unit='h')]*5 +
               [pd.Timedelta(30, unit='h')] +  # <-- big 'Ol gap in the data
               [pd.Timedelta(1, unit='h')]*5)*2)
df.set_index('time', inplace=True)
df.hvplot()
Which plots something like the following, where the gaps in the data are hopefully obvious (but won't always be):
So the approach is to find gaps in your data which are unacceptably long. This will be context-specific. In the data above, good data points are 1h apart and each gap is 30h, so I use a max acceptable gap of 2h:
# Insert NA just after any gaps which are unacceptably long:
dt_max_acceptable = pd.Timedelta(2, unit='h')
df['dt'] = df.index.to_series().diff()
t_at_end_of_gaps = df[df.dt > dt_max_acceptable].index.values
t_before_end_of_gaps = [i - pd.Timedelta(1) for i in t_at_end_of_gaps]
for t in t_before_end_of_gaps:
    df.loc[t] = pd.NA
df.sort_index(inplace=True)
df.hvplot()
Which should plot something like this - showing that the line no longer spans the gaps which are 'too long':
The approach is quite easy to apply, and it works for my purposes. The downside is that it adds artificial rows containing NaN data, which might not always be acceptable.
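If the artificial rows are a concern, one option is to add them only to a copy used for plotting and leave the original DataFrame alone. Below is a minimal sketch of the same idea for data shaped like the question's (a numeric x column and one line per ID). The helper name `add_gap_breaks`, the sample values, and the threshold of 2.0 are made up for illustration and are not part of hvPlot or pandas:

```python
import numpy as np
import pandas as pd

def add_gap_breaks(df, x, y, max_gap, by=None):
    """Return a new DataFrame with one NaN row placed inside every gap in
    `x` that is wider than `max_gap`. The input DataFrame is not modified."""
    groups = [df] if by is None else [g for _, g in df.groupby(by)]
    pieces = []
    for g in groups:
        g = g.sort_values(x).reset_index(drop=True)
        step = g[x].diff()                # spacing between consecutive x values
        long_gap = step > max_gap         # True on the first row after a long gap
        filler = g.loc[long_gap].copy()   # keeps the `by` value of that row
        filler[x] = g[x].shift()[long_gap] + step[long_gap] / 2  # middle of the gap
        filler[y] = np.nan                # NaN is what interrupts the line
        pieces.append(pd.concat([g, filler]).sort_values(x))
    return pd.concat(pieces, ignore_index=True)

# Hypothetical sample data: ID "a" has a jump from 402 to 450.
df = pd.DataFrame({
    "Wavelength": [400.0, 401.0, 402.0, 450.0, 451.0, 400.0, 401.0],
    "Reflectance": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    "UniqueID": ["a"] * 5 + ["b"] * 2,
})
out = add_gap_breaks(df, "Wavelength", "Reflectance", max_gap=2.0, by="UniqueID")
```

The result can then be plotted the same way as before, for example `out.hvplot.line(x="Wavelength", y="Reflectance", by="UniqueID")`, while `df` itself keeps only real measurements. The threshold still has to be chosen for your own data, just as with the 2h value above.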