I am trying to fill the missing points in the time series data in the ascii file with pandas. I think that other things are okay, but the first line was filled with nan even though there was data originally. My data sample is:
"2011-08-26 00:00:00",1155179,3.232,23.7,3.281,0.386,25.27,111.5665,28.92,29.83,19.13,0,111.5,13.02,29.77,345.7
"2011-08-26 00:00:30",1155180,3.289,20.44,2.153,0.222,25.25,111.5735,28.94,29.82,19.53,0,111.5,13.02,29.79,342.4
.
.
"2011-08-26 23:59:30",1155297,12.62,28.06,3.162,1.356,24.3,111.4614,28.65,29.84,19.53,0,111.4,13.06,29.50,350.1
And I used the code as below:
t1 = np.genfromtxt(INPUT,dtype=None,delimiter=',',usecols=[0])
start = t1[0].strip('\'"')
end = t1[-1].strip('\'"')
data=pd.read_csv(INPUT,sep=',',index_col=[0],parse_dates=[0])
index = pd.date_range(start,end,freq="30S")
df = data
sk_f = df.reindex(index)
So with this code, I wanted to read the first and the last string of the first column and made them to the index to fill the possible missing points indicated as nan. However, the problem is that first column was also filled in the results as below:
2011-08-26 00:00:00,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan
2011-08-26 00:00:30,1155180,3.289,20.44,2.153,0.222,25.25,111.5735,28.94,29.82,19.53,0,111.5,13.02,29.79,342.4
.
.
2011-08-26 23:59:30,1155297,12.62,28.06,3.162,1.356,24.3,111.4614,28.65,29.84,19.53,0,111.4,13.06,29.50,350.1
It means that the first line was unexpectedly filled eventhough there are data in the original file. From the second line, every thing is okay and filling missing data also seems okay. I am trying to find why it happens. To be honest, I couldn't find the reason yet. Any idea or help would be really appreciated. Thank you, Isaac
I think you can omit read file by genfromtxt
and try only read_csv
, then found min
and max
dates for reindex
method.
Or use resample
:
import pandas as pd
import numpy as np
import io
temp=u""""2011-08-26 00:00:00",1155179,3.232,23.7,3.281,0.386,25.27,111.5665,28.92,29.83,19.13,0,111.5,13.02,29.77,345.7
"2011-08-26 00:00:30",1155180,3.289,20.44,2.153,0.222,25.25,111.5735,28.94,29.82,19.53,0,111.5,13.02,29.79,342.4
"2011-08-26 23:59:30",1155297,12.62,28.06,3.162,1.356,24.3,111.4614,28.65,29.84,19.53,0,111.4,13.06,29.50,350.1"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), sep=",", index_col=[0], parse_dates=[0], header=None)
print df
1 2 3 4 5 6 7 \
0
2011-08-26 00:00:00 1155179 3.232 23.70 3.281 0.386 25.27 111.5665
2011-08-26 00:00:30 1155180 3.289 20.44 2.153 0.222 25.25 111.5735
2011-08-26 23:59:30 1155297 12.620 28.06 3.162 1.356 24.30 111.4614
8 9 10 11 12 13 14 15
0
2011-08-26 00:00:00 28.92 29.83 19.13 0 111.5 13.02 29.77 345.7
2011-08-26 00:00:30 28.94 29.82 19.53 0 111.5 13.02 29.79 342.4
2011-08-26 23:59:30 28.65 29.84 19.53 0 111.4 13.06 29.50 350.1
start = df.index.min()
end = df.index.max()
print start
2011-08-26 00:00:00
print end
2011-08-26 23:59:30
index = pd.date_range(start,end,freq="30S")
sk_f = df.reindex(index)
print sk_f.head()
1 2 3 4 5 6 7 \
2011-08-26 00:00:00 1155179 3.232 23.70 3.281 0.386 25.27 111.5665
2011-08-26 00:00:30 1155180 3.289 20.44 2.153 0.222 25.25 111.5735
2011-08-26 00:01:00 NaN NaN NaN NaN NaN NaN NaN
2011-08-26 00:01:30 NaN NaN NaN NaN NaN NaN NaN
2011-08-26 00:02:00 NaN NaN NaN NaN NaN NaN NaN
8 9 10 11 12 13 14 15
2011-08-26 00:00:00 28.92 29.83 19.13 0 111.5 13.02 29.77 345.7
2011-08-26 00:00:30 28.94 29.82 19.53 0 111.5 13.02 29.79 342.4
2011-08-26 00:01:00 NaN NaN NaN NaN NaN NaN NaN NaN
2011-08-26 00:01:30 NaN NaN NaN NaN NaN NaN NaN NaN
2011-08-26 00:02:00 NaN NaN NaN NaN NaN NaN NaN NaN
print df.resample('30S', fill_method='ffill').head()
1 2 3 4 5 6 7 \
0
2011-08-26 00:00:00 1155179 3.232 23.70 3.281 0.386 25.27 111.5665
2011-08-26 00:00:30 1155180 3.289 20.44 2.153 0.222 25.25 111.5735
2011-08-26 00:01:00 1155180 3.289 20.44 2.153 0.222 25.25 111.5735
2011-08-26 00:01:30 1155180 3.289 20.44 2.153 0.222 25.25 111.5735
2011-08-26 00:02:00 1155180 3.289 20.44 2.153 0.222 25.25 111.5735
8 9 10 11 12 13 14 15
0
2011-08-26 00:00:00 28.92 29.83 19.13 0 111.5 13.02 29.77 345.7
2011-08-26 00:00:30 28.94 29.82 19.53 0 111.5 13.02 29.79 342.4
2011-08-26 00:01:00 28.94 29.82 19.53 0 111.5 13.02 29.79 342.4
2011-08-26 00:01:30 28.94 29.82 19.53 0 111.5 13.02 29.79 342.4
2011-08-26 00:02:00 28.94 29.82 19.53 0 111.5 13.02 29.79 342.4