python pandas google-colaboratory hvplot

hvPlot: Plotting multiple lines with null values

I have a DataFrame that I am trying to plot with hvPlot.

So far, I have something like this:

new_df = new_df.dropna(subset=['Reflectance'])
new_df = new_df.sort_values(by='Wavelength')

reflectance_plot = new_df.hvplot.line(
    x="Wavelength", y="Reflectance", by="UniqueID", legend=False
).opts(fontsize={'title': 16, 'labels': 14, 'yticks': 12}, xrotation=45, xticks=15)
reflectance_plot

Which gives me something like this:

As you can see, between the smooth areas with data, there are lots of straight lines where there are no values. I am trying to remove these straight lines so that only the data is plotted. I tried to do that with this code:

new_df['Reflectance'] = new_df['Reflectance'].fillna(np.nan).replace([np.nan], [None])
new_df = new_df.sort_values(by='Wavelength')
    
reflectance_plot = new_df.hvplot.line(
    x="Wavelength", y="Reflectance", by="UniqueID", legend=False
).opts(fontsize={'title': 16, 'labels': 14, 'yticks': 12}, xrotation=45, xticks=15)
reflectance_plot

Which leaves me with:

This is the effect I am trying to achieve, except that now the vast majority of the data is gone. I would appreciate any advice or insight into why this is happening and how to fix it.

Solution

I came across a similar issue, and what I came up with was the following:

Generate & plot some problematic data:

import pandas as pd
import numpy as np
import hvplot.pandas

df = pd.DataFrame({'data1':np.random.randn(22),
                   'data2':np.random.randn(22)+3})

df['time'] = pd.to_datetime('2022-12-25T09:00') + \
             np.cumsum(([pd.Timedelta(1, unit='h')]*5 +
                       [pd.Timedelta(30, unit='h')] +  # <-- big ol' gap in the data
                       [pd.Timedelta(1, unit='h')]*5)*2)

df.set_index('time', inplace=True)
df.hvplot()

This plots something like the following, where the gaps in the data are hopefully obvious (but won't always be):

So the approach is to find gaps in your data that are unacceptably long. What counts as too long is context-specific. In the data above, good samples are 1h apart and the gaps are 30h, so I use a maximum acceptable gap of 2h:

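# Insert a NaN row just before the end of any gap that is unacceptably long:
dt_max_acceptable = pd.Timedelta(2, unit='h')

# Time elapsed since the previous sample (kept separate so it is not plotted)
dt = df.index.to_series().diff()
t_at_end_of_gaps = dt[dt > dt_max_acceptable].index
# One nanosecond before the first sample that follows each long gap
t_before_end_of_gaps = t_at_end_of_gaps - pd.Timedelta(1, unit='ns')

for t in t_before_end_of_gaps:
    df.loc[t] = np.nan  # new row; the NaN values break the line here

df.sort_index(inplace=True)
df.hvplot()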

Which should plot something like this - showing that the line no longer spans the gaps which are 'too long':

The approach is quite easy to apply and works for my purposes. The downside is that it adds artificial rows of NaN data, which might not always be acceptable.
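For the DataFrame in the question, the same idea could be applied per UniqueID along the Wavelength axis instead of along a time index. The following is only a rough sketch under some assumptions: the helper name insert_gap_breaks, the max_gap threshold of 20 and the demo values are all made up for illustration, and the helper keeps only the three columns it is given. It works on a new DataFrame, so the original rows are left untouched.

```python
import numpy as np
import pandas as pd


def insert_gap_breaks(df, x, y, by, max_gap):
    """Hypothetical helper: return a new frame holding only the `by`, `x`
    and `y` columns, with one NaN point placed inside every gap in `x`
    that is wider than `max_gap`, handled separately for each `by` group."""
    pieces = []
    for key, group in df.dropna(subset=[y]).groupby(by):
        group = group.sort_values(x)
        xs = group[x].to_numpy(dtype=float)
        ys = group[y].to_numpy(dtype=float)
        # Positions of the points that directly follow a too-long gap
        after_gap = np.flatnonzero(np.diff(xs) > max_gap) + 1
        # Put a NaN point halfway across each of those gaps
        mid_x = (xs[after_gap - 1] + xs[after_gap]) / 2
        xs = np.insert(xs, after_gap, mid_x)
        ys = np.insert(ys, after_gap, np.nan)
        pieces.append(pd.DataFrame({by: key, x: xs, y: ys}))
    return pd.concat(pieces, ignore_index=True)


# Made-up example: one spectrum with a hole between 420 and 900
demo = pd.DataFrame({
    "UniqueID": ["a"] * 6,
    "Wavelength": [400, 410, 420, 900, 910, 920],
    "Reflectance": [0.10, 0.12, 0.11, 0.30, 0.31, 0.29],
})
with_breaks = insert_gap_breaks(
    demo, "Wavelength", "Reflectance", "UniqueID", max_gap=20
)
```

Passing with_breaks to the same hvplot.line(x="Wavelength", y="Reflectance", by="UniqueID") call used in the question should then leave the wide gaps empty, provided max_gap is chosen somewhat larger than the normal spacing between wavelengths.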