
Overlay of two plots from two different data sources using Python / hvplot


I would like to overlay a line plot (source: pandas DataFrame) on an hvplot plot (source: xarray / NetCDF).

The xarray dataset looks like this:

dataDIR = 'ceilodata.nc'
DS = xr.open_dataset(dataDIR)
DS = DS.transpose()
print(DS)

<xarray.Dataset>
Dimensions:         (range_hr: 32, range: 1024, layer: 3, time: 5760)
Coordinates:
  * range_hr        (range_hr) float32 0.001 4.995 9.99 ... 144.9 149.9 154.8
  * range           (range) float32 14.98 29.97 44.96 ... 1.533e+04 1.534e+04
  * layer           (layer) int32 1 2 3
  * time            (time) datetime64[ns] 2022-03-18 ... 2022-03-18T23:59:46
Data variables: (12/41)
    zenith          float32 ...
    wavelength      float32 ...
    scaling         float32 ...
    range_gate_hr   float32 ...
    range_gate      float32 ...
    longitude       float32 ...
    ...              ...
    cbe             (layer, time) int16 ...
    beta_raw_hr     (range_hr, time) float32 ...
    beta_raw        (range, time) float32 ...
    bcc             (time) int8 ...
    base            (time) float32 ...
    average_time    (time) int32 ...
Attributes: (12/13)
    comment:           
    software_version:  15.06.1 2.13 1.040 1
    title:             CHM15k Nimbus
    wmo_id:            10865
    month:             3
    source:            CHM160138
    ...                ...
    serlom:            TUB160038
    location:          muenchen
    year:              2022
    device_name:       CHM160138
    institution:       DWD
    day:               18

The pandas dataframe source looks like this:

df = pd.read_csv('PTU.csv')
print(df)

               Unnamed: 0                PTU
0     2022-03-18 07:38:56            451.839
1     2022-03-18 07:38:57            468.826
2     2022-03-18 07:38:58            469.093
3     2022-03-18 07:38:59            469.356
4     2022-03-18 07:39:00            469.623
...                   ...                ...
6140  2022-03-18 09:21:16          31690.600
6141  2022-03-18 09:21:17          31694.700
6142  2022-03-18 09:21:18          31692.900
6143  2022-03-18 09:21:19          31712.000
6144  2022-03-18 09:21:20          31711.500

[6145 rows x 2 columns]

Both are time-dependent datasets, but they have different timestamps and sampling frequencies. Time is the index in each dataset.
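The printout above shows a default integer index with the timestamps in an `Unnamed: 0` column, so time may not yet be the index of the DataFrame. A minimal sketch of making it one at read time, assuming the CSV layout seen in the printout (the three rows are copied from it; the index name `Datetime` is a hypothetical choice):

```python
import io
import pandas as pd

# Three rows copied from the question's printout, standing in for PTU.csv
csv_text = """,PTU
2022-03-18 07:38:56,451.839
2022-03-18 07:38:57,468.826
2022-03-18 07:38:58,469.093
"""

# index_col=0 uses the unnamed first column as the index;
# parse_dates=True asks pandas to parse that index as datetimes
ptu = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
ptu.index.name = "Datetime"  # hypothetical name, not from the original file

# Inspect the index type and the typical sampling step
print(type(ptu.index).__name__)
print(ptu.index.to_series().diff().median())
```

With a real `DatetimeIndex` in place, the sampling step of the sonde data can be compared directly against the `time` coordinate of the xarray dataset.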

I tried to plot them together with additional imports of holoviews. While each single plot is no problem, plotting them together seems not to work the way I tried it:

import hvplot.pandas
import hvplot.xarray  # registers the .hvplot accessor on xarray objects
import holoviews as hv

# cmap of the xarray:
ceilo = DS.b_r.hvplot(cmap="viridis_r", width = 850, height = 600, title = 'title', clim = (5, 80))

# line plot of the data frame
p = df.hvplot.line()

# add pressure line plot to pcolormeshplot using * which overlays the line on the plot
ceilo * p

but this ended in an error message with the following complete traceback:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-10-2b1c6baca339> in <module>
     24 p = df.hvplot.line()
     25 # add pressure line plot to pcolormeshplot using * which overlays the line on the plot
---> 26 ceilo * df

c:\python38\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     68         other = item_from_zerodim(other)
     69 
---> 70         return method(self, other)
     71 
     72     return new_method

c:\python38\lib\site-packages\pandas\core\arraylike.py in __rmul__(self, other)
    118     @unpack_zerodim_and_defer("__rmul__")
    119     def __rmul__(self, other):
--> 120         return self._arith_method(other, roperator.rmul)
    121 
    122     @unpack_zerodim_and_defer("__truediv__")

c:\python38\lib\site-packages\pandas\core\frame.py in _arith_method(self, other, op)
   6936         other = ops.maybe_prepare_scalar_for_op(other, (self.shape[axis],))
   6937 
-> 6938         self, other = ops.align_method_FRAME(self, other, axis, flex=True, level=None)
   6939 
   6940         new_data = self._dispatch_frame_op(other, op, axis=axis)

c:\python38\lib\site-packages\pandas\core\ops\__init__.py in align_method_FRAME(left, right, axis, flex, level)
    275     elif is_list_like(right) and not isinstance(right, (ABCSeries, ABCDataFrame)):
    276         # GH 36702. Raise when attempting arithmetic with list of array-like.
--> 277         if any(is_array_like(el) for el in right):
    278             raise ValueError(
    279                 f"Unable to coerce list of {type(right[0])} to Series/DataFrame"

c:\python38\lib\site-packages\holoviews\core\element.py in __iter__(self)
     94     def __iter__(self):
     95         "Disable iterator interface."
---> 96         raise NotImplementedError('Iteration on Elements is not supported.')
     97 
     98 

NotImplementedError: Iteration on Elements is not supported.

Is the different time frequency the problem here? The line should be aligned with the x- and y-axes, i.e. placed at the correct timestamp and altitude of the underlying cmap (matplotlib) plot.

To illustrate what I am aiming for, here is a picture of my goal:

[Image: illustration of the intended result]

Thanks for reading / helping.


Solution

  • I found a solution for this case:

    The time columns of both datasets must have the same dtype. In my case that is datetime64[ns] (to match the xarray dataset read from NetCDF), so I converted the DataFrame's time column to datetime64[ns]:

    df.Datetime = df.Datetime.astype('datetime64[ns]')  # give the unit explicitly; a unit-less 'datetime64' is rejected by newer pandas
    

    I also found the PTU column to be of dtype "object", so I converted it to float:

    df.PTU = df.PTU.astype(float) # convert to correct data type
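
    The two conversions above can be sketched end to end on a few rows from the question's printout. This is only an illustration under assumptions: the rename of `Unnamed: 0` to `Datetime` is inferred from the column name used in this answer, and `dtype=str` is used here just to mimic a column that was read as text.

```python
import io
import pandas as pd

# Three rows copied from the question's printout, standing in for PTU.csv
csv_text = """,PTU
2022-03-18 07:38:56,451.839
2022-03-18 07:38:57,468.826
2022-03-18 07:38:58,469.093
"""

# dtype=str forces text columns, mimicking the non-numeric dtype described above
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Assumption: the unnamed first column holds the timestamps and is renamed to "Datetime"
df = df.rename(columns={"Unnamed: 0": "Datetime"})

# Parse the timestamps, then pin the resolution to nanoseconds
df["Datetime"] = pd.to_datetime(df["Datetime"]).astype("datetime64[ns]")

# Convert the measurement column from text to float
df["PTU"] = df["PTU"].astype(float)

print(df.dtypes)
```

    If the real file contains entries that cannot be parsed as numbers, `pd.to_numeric(df["PTU"], errors="coerce")` is an alternative that turns them into NaN instead of raising.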
    

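    The last step was to import hvplot.xarray, which adds the .hvplot accessor (including .hvplot.quadmesh) to xarray objects:

    import hvplot.xarray
    # quadmesh is then available as DS.<variable>.hvplot.quadmesh(...)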
    

    And here is my final solution:

    title = ('Ceilo data' + '\ndate: ' + str(DS.year) + '-' + str(DS.month) + '-' + str(DS.day))
    
    ceilo = (DS.br.hvplot.quadmesh(cmap="viridis_r", width = 850, height = 600, title = title, 
                                   clim = (1000, 10000),  # set colorbar limits
                                   cnorm = ('log'), # choose log scale
                                   clabel = ('colorbar title'),
                                   rot = 0  # degree rotation of ticks
                                   )
             )
    
    # from: https://justinbois.github.io/bootcamp/2020/lessons/l27_holoviews.html
    # note: this may take 2 to 3 minutes to plot:
    p = hv.Points(data=df,
                  kdims=['Datetime', 'PTU'],
                  ).opts(#alpha=0.7, 
                        color='red',
                        size=1,
                        ylim=(0, 5000))
    
    # overlay the PTU points on the quadmesh plot using *
    ceilo * p