I'm generating an empty dataframe with a series of dates as the index. Data will be added to the dataframe at a later point.
cbd=pd.date_range(start=pd.datetime(2017,01,02),end=pd.datetime(2017,01,30),period=1)
df = pd.DataFrame(data=None,columns=['Test1','Test2'],index=cbd)
df.head()
Test1 Test2
2017-01-02 NaN NaN
2017-01-03 NaN NaN
2017-01-04 NaN NaN
2017-01-05 NaN NaN
2017-01-06 NaN NaN
A few slicing methods don't seem to work. The following returns a KeyError:
df['2017-01-02']
However any of the following work:
df['2017-01-02':'2017-01-02']
df.loc['2017-01-02']
What am I missing here? Why doesn't the first slice return a result?
[]
in df[]
:
inside []
, then the value(s) inside it will be considered as column(s).:
inside []
, then the value(s) inside it will be considered as row(s).Because most of the time people want to slice the rows instead of slicing the columns.
So they decided that x
and y
in df[x:y]
should correspond to rows,
and x
in d[x]
or x
, y
in df[[x,y]]
should correspond to column(s).
df = pd.DataFrame(data = [[1,2,3], [1,2,3], [1,2,3]],
index = ['A','B','C'], columns = ['A','B','C'])
print df
Output:
A B C
A 1 2 3
B 1 2 3
C 1 2 3
Now when you do df['B']
, it can mean 2 things:
Take the 2nd index B
and give you the 2nd row 1 2 3
OR
Take the 2nd column B
and give you the 2nd column 2 2 2
.
So in order to resolve this conflict and keep it unambiguous df['B']
will always mean that you want the column 'B'
, if there is no such column then it will throw an Error.
df['2017-01-02']
fails?It will search for a column '2017-01-02'
, Because there is no such column, it throws an error.
df.loc['2017-01-02']
works then?Because .loc[]
has syntax of df.loc[row,column]
and you can leave out the column if you will, as in your case, it simply means df.loc[row]