pandas: read_csv combined date-time columns as index into a dataframe

I have a csv file which contains date and time stamps as two of the columns. I am using pandas read_csv to read the contents into a dataframe. My ultimate goal is to plot time series graphs from the data.

!head vmstat.csv

from pandas import read_csv, DataFrame
df = read_csv("vmstat.csv", parse_dates=[['date','time']])
f = DataFrame(df, columns=[ 'date_time',  'user_time', 'sys_time', 'wait_io_time'])

In [3]: f
             date_time  user_time  sys_time  wait_io_time
0  2012-11-01 08:59:27          3         1             0
1  2012-11-01 08:59:32          0         0             0
2  2012-11-01 08:59:37         20         2             1
3  2012-11-01 08:59:42          0         0             0
4  2012-11-01 08:59:47          0         0             0

So far, the data is read correctly and the date and time columns are combined into date_time in the DataFrame. The problem comes when I try to use date_time as the index. Passing it as index= to the DataFrame constructor gives all NaN values:

dindex = f['date_time']
print(dindex)
g = DataFrame(f, columns=[ 'user_time', 'sys_time', 'wait_io_time'], index=dindex)

0    2012-11-01 08:59:27
1    2012-11-01 08:59:32
2    2012-11-01 08:59:37
3    2012-11-01 08:59:42
4    2012-11-01 08:59:47
Name: date_time  <---- dindex

In [7]: g
                     user_time  sys_time  wait_io_time
2012-11-01 08:59:27        NaN       NaN           NaN
2012-11-01 08:59:32        NaN       NaN           NaN
2012-11-01 08:59:37        NaN       NaN           NaN
2012-11-01 08:59:42        NaN       NaN           NaN
2012-11-01 08:59:47        NaN       NaN           NaN

As you can see, every column value comes out as NaN. How do I get the correct values, as in the intermediate frame f?


  • You want to use set_index:

    df1 = df.set_index('date_time')

    This uses the 'date_time' column as the index of the new DataFrame.


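    Note: When you pass an existing DataFrame to the constructor together with index=, pandas reindexes it. Rows are selected by matching the new labels against the existing index, and any label that is not found produces a row of NaN. In your case f is indexed by 0 to 4, so none of the timestamps match. The behaviour can be demonstrated as follows: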

    import pandas as pd
    df = pd.DataFrame([[1,2],[3,4]])
    df1 = pd.DataFrame(df, index=[1,2])
    In [3]: df1
        0   1
    1   3   4
    2 NaN NaN
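
As a self-contained sketch of both points, the following builds a small frame, shows the constructor producing NaN, and then uses set_index. This is not your actual file: the CSV text is made up to resemble the frame in the question, and I am assuming vmstat.csv has separate 'date' and 'time' columns. The names csv_text, g_bad and g_good are only illustrative. The date and time columns are combined with pd.to_datetime here so the sketch does not rely on the nested-list form of parse_dates.

```python
import io

import pandas as pd

# Hypothetical stand-in for vmstat.csv. The real file's layout is not shown
# in the question, so separate 'date' and 'time' columns are an assumption.
csv_text = """date,time,user_time,sys_time,wait_io_time
2012-11-01,08:59:27,3,1,0
2012-11-01,08:59:32,0,0,0
2012-11-01,08:59:37,20,2,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Join the two text columns and parse the result into one datetime column.
df["date_time"] = pd.to_datetime(df["date"] + " " + df["time"])
f = df[["date_time", "user_time", "sys_time", "wait_io_time"]]

# Constructor with index=...: rows are looked up by label in f's existing
# index (0, 1, 2). No timestamp matches a label, so every value is NaN.
g_bad = pd.DataFrame(
    f,
    columns=["user_time", "sys_time", "wait_io_time"],
    index=f["date_time"],
)

# set_index: the column itself becomes the index and the data is kept.
g_good = f.set_index("date_time")

print(g_bad)
print(g_good)
```

Since g_good has a DatetimeIndex, calling g_good.plot() should put time on the x-axis, assuming matplotlib is installed.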