python csv matplotlib plot genfromtxt

Matplotlib/Genfromtxt: Multiple plots against time, skipping missing data points, from .csv


I've been able to import and plot multiple columns of data against the same x axis (time) with legends, from csv files using genfromtxt as shown in this link:

Matplotlib: Import and plot multiple time series with legends direct from .csv

The simple example above works fine if every cell in the csv file contains data. However, some of my cells are empty, and some of the parameters (columns) only have a data point at every second or third time increment, for example.

I want to plot all the parameters on the same time axis as previously; and if one or more data points in a column are missing, I want the plot function to skip the missing data points for that parameter and only draw lines between the points that are available for that parameter.

Further, I'm trying to find a generic solution which will automatically plot in the above style directly from the csv file for any number of columns, time points, missing data points etc., when these are not known in advance.

I've tried using the genfromtxt options missing_values and filling_values, as shown in my non-working example below. However, I want to skip the missing data points rather than assign them the value 0, and in any case with this approach I seem to get "ValueError: could not convert string to float" when missing data points are encountered.

Plotting multiple parameters against time on the same plot, whilst dealing with occasionally or regularly skipped values, must be a pretty common problem for the scientific community.

I'd be very grateful for any suggestions for an elegant solution using genfromtxt.

Non-working code and demo data below. Many thanks in anticipation.

Demo data: 'DemoData.csv':
Time,Parameter_1,Parameter_2,Parameter_3
0,10,12,11
1,20,,
2,25,23,
3,30,,30

import numpy as np
import matplotlib.pyplot as plt

arr = np.genfromtxt('DemoData.csv', delimiter=',', dtype=None, missing_values='', filling_values = 0)
names = (arr[0])
for n in range (1,len(names)):
    plt.plot (arr[1:,0],arr[1:,n],label=names[n])
plt.legend()    
plt.show()

Solution

  • I think setting usemask=True in your genfromtxt call will do what you want. You probably don't want filling_values set either:

    arr = np.genfromtxt('DemoData.csv', delimiter=',', dtype=None, missing_values='', usemask=True)

    You can then plot using something like this, keeping only the time points where the column is not masked:

    for n in range(1, len(names)):
        # .compressed() drops the masked (missing) entries; the same mask is used to pick the matching times
        plt.plot(arr[1:,0][np.logical_not(arr[1:,n].mask)], arr[1:,n].compressed(), label=names[n])
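
For a generic version that handles any number of columns, here is a minimal sketch of the same idea. It is not the exact code from the answer: it assumes the header row is read with names=True instead of being sliced off as row 0, that dtype=float is acceptable so empty cells come back as NaN, and that the first column is the time axis. The helper names load_series and plot_series and the embedded DEMO_CSV string are made up for illustration.

```python
import io

import numpy as np

# Demo data from the question, embedded so the sketch runs without a file.
DEMO_CSV = """Time,Parameter_1,Parameter_2,Parameter_3
0,10,12,11
1,20,,
2,25,23,
3,30,,30
"""


def load_series(source):
    """Return {column name: (times, values)} with missing points dropped.

    Assumes the first column is the time axis and all columns are numeric.
    """
    # names=True takes the field names from the header row; with dtype=float
    # and no filling_values, empty cells are read as NaN.
    data = np.atleast_1d(
        np.genfromtxt(source, delimiter=",", names=True, dtype=float)
    )
    time_name, *param_names = data.dtype.names
    time = data[time_name]
    series = {}
    for name in param_names:
        values = data[name]
        present = ~np.isnan(values)  # True where a data point exists
        series[name] = (time[present], values[present])
    return series


def plot_series(series, ax):
    """Draw one line per parameter, joining only the points that exist."""
    for name, (x, y) in series.items():
        ax.plot(x, y, marker="o", label=name)
    ax.set_xlabel("Time")
    ax.legend()


if __name__ == "__main__":
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Pass a filename such as 'DemoData.csv' instead of the StringIO
    # object to read from disk.
    plot_series(load_series(io.StringIO(DEMO_CSV)), ax)
    plt.show()
```

Dropping the NaN entries before plotting is what makes the line run straight from one available point to the next. If you would rather see a gap where data is missing, plot the unfiltered column instead, since matplotlib breaks the line at NaN values.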