Search code examples
pythonpython-3.xpandasindexingmulti-index

Pandas: selecting values of array with [columnname][rownumber] just works for multiindex


I try to analyze data with the help of pivot tables using Pandas.

For multiindex everything works fine.

I have this pivot table:

>>>print(pivtab)
           count  win
pp   ps              
8000 8200      4    2
8100 8200      2    1
     8300      3    1

Now I can select for example the value in the 2nd row in the column "count" with

>>>print(pivtab["count"][1])
2

The unexpected behaviour comes when using just a single index pivot table. The pivot table looks like this:

      count  win
pp              
8000      4    2
8100      5    2 

Now when I try to use the same way with print(pivtab["count"][1]) I get an KeyError: 1. It seems as if Python now doesn't accept the row number but the row name (=index). With

>>>print(pivtab["count“][8100])
5

everything works fine.

The complete code looks like this:

import pandas as pd
columns = ["count","game_Id","season","win","pp","ps"]
matrix = pd.DataFrame(columns=columns)
# Create a test matrix
for i in range(1,10):
    win = 0
    if int(i/2) == i/2:
        win = 1
    pprim = 8000
    if i > 4:
        pprim = 8100
    psub = 8200
    if i > 6:
        psub = 8300
    new_row = pd.DataFrame([[1, i, 11, win, pprim, psub]], columns=columns)
    matrix = matrix.append(new_row,ignore_index=True)
print(matrix)
pivtab = pd.pivot_table(matrix, index=["pp","ps"], values=["count","win"], aggfunc="sum")
print("\n", pivtab)
print(pivtab["count"][1])

To be completely confused I tried another test matrix... and here everything works fine!

import pandas as pd
matrix = pd.DataFrame({"A":["hey","hey","boo","boo"], "B":[1,2,3,2], "valueA":[123,441,190,123], "valueB":[5,5,6,6]})
piv_matrix = pd.pivot_table(matrix, index=["A"], values=["valueA","valueB"], aggfunc="sum")
print(piv_matrix)
print("\nValue:", piv_matrix["valueA"][1])

Result:

     valueA  valueB
A                  
boo     313      12
hey     564      10

Value: 564

I would be happy if you could explain to me the reason for this unexpected (at least for me) behaviour. Thanks!


Solution

  • Discouraged from Documentation

    The documentation discourages chained indexing. The syntax df[label_1][label_2] should not be used.

    Alternatives

    Most solutions involve loc / iloc for label / position-based slicing, or at / iat for accessing scalars.

    In your examples, you are mixing label-based and position-based indexers. Instead, so you can select a series and use iat or iloc:

    piv_matrix['valueA'].iat[1]
    piv_matrix['valueA'].iloc[1]
    

    This syntax will work for both your examples.