I try to analyze data with the help of pivot tables using Pandas.
For multiindex everything works fine.
I have this pivot table:
>>>print(pivtab)
count win
pp ps
8000 8200 4 2
8100 8200 2 1
8300 3 1
Now I can select for example the value in the 2nd row in the column "count" with
>>>print(pivtab["count"][1])
2
The unexpected behaviour comes when using just a single index pivot table. The pivot table looks like this:
count win
pp
8000 4 2
8100 5 2
Now when I try to use the same way with print(pivtab["count"][1])
I get an KeyError: 1
. It seems as if Python now doesn't accept the row number but the row name (=index). With
>>>print(pivtab["count“][8100])
5
everything works fine.
The complete code looks like this:
import pandas as pd
columns = ["count","game_Id","season","win","pp","ps"]
matrix = pd.DataFrame(columns=columns)
# Create a test matrix
for i in range(1,10):
win = 0
if int(i/2) == i/2:
win = 1
pprim = 8000
if i > 4:
pprim = 8100
psub = 8200
if i > 6:
psub = 8300
new_row = pd.DataFrame([[1, i, 11, win, pprim, psub]], columns=columns)
matrix = matrix.append(new_row,ignore_index=True)
print(matrix)
pivtab = pd.pivot_table(matrix, index=["pp","ps"], values=["count","win"], aggfunc="sum")
print("\n", pivtab)
print(pivtab["count"][1])
To be completely confused I tried another test matrix... and here everything works fine!
import pandas as pd
matrix = pd.DataFrame({"A":["hey","hey","boo","boo"], "B":[1,2,3,2], "valueA":[123,441,190,123], "valueB":[5,5,6,6]})
piv_matrix = pd.pivot_table(matrix, index=["A"], values=["valueA","valueB"], aggfunc="sum")
print(piv_matrix)
print("\nValue:", piv_matrix["valueA"][1])
Result:
valueA valueB
A
boo 313 12
hey 564 10
Value: 564
I would be happy if you could explain to me the reason for this unexpected (at least for me) behaviour. Thanks!
Discouraged from Documentation
The documentation discourages chained indexing. The syntax df[label_1][label_2]
should not be used.
Alternatives
Most solutions involve loc
/ iloc
for label / position-based slicing, or at
/ iat
for accessing scalars.
In your examples, you are mixing label-based and position-based indexers. Instead, so you can select a series and use iat
or iloc
:
piv_matrix['valueA'].iat[1]
piv_matrix['valueA'].iloc[1]
This syntax will work for both your examples.