pandas, data-science, data-analysis

Functionality of iloc and simply [ ] in a series


I thought that for a pandas Series s, s[0:2] is equivalent to s.iloc[0:2]: in both cases two rows are returned. Recently, though, I ran into trouble. The first picture shows the expected output, but I don't know what went wrong in the second one: there, s[0] raises an error and I don't know why.


Solution

  • I can try to explain this behavior a bit. In Pandas, you can select by position or by label, and it's important to remember that every single row/column always has both a position and a label. In the case of columns, this distinction is often easy to see, because we usually give columns string names. The difference is also obvious when you explicitly use .iloc vs .loc. Finally, s[X] is indexing, while s[X:Y] is slicing, and the two actions behave differently.

    Example:

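    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 3, 4]})
    df.iloc[:, 0]    # first column, selected by position
    df.loc[:, 'a']   # the same column, selected by label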
    

    Both will return

    0    1
    1    2
    2    3
    Name: a, dtype: int64
    

    Now, what happened in your case is that you overwrote the index labels when you declared s.index = [11,12,13,14]. You can see that by inspecting the index before and after this change. Before, if you run s.index, you see that it is a RangeIndex(start=0, stop=4, step=1). After you change the index, it becomes Int64Index([11, 12, 13, 14], dtype='int64') (in pandas 2.0 and later, where Int64Index no longer exists, it is shown as Index([11, 12, 13, 14], dtype='int64')).
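
    To make the before/after inspection concrete, here is a minimal sketch. The values in the Series are made up for illustration (the original s is only visible in the pictures); only the index matters. The class name printed for the new index depends on the pandas version, so the sketch prints it rather than assuming it.

```python
import pandas as pd

# Hypothetical data: the values are invented, only the index matters here.
s = pd.Series([10, 20, 30, 40])
index_before = s.index          # default index: labels 0..3

s.index = [11, 12, 13, 14]      # replace the labels; positions are still 0..3
index_after = s.index

# The type name of index_after varies by pandas version (Int64Index or Index).
print(type(index_before).__name__, list(index_before))
print(type(index_after).__name__, list(index_after))
```

    The labels change from 0..3 to 11..14, but the first element is still at position 0.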

    Why does this matter? Because although you overrode the labels of the index, the position of each one of them remains the same as before. So, when you call

    s[0:2]
    

    you are slicing by position (this section in the documentation explains that it's equivalent to .iloc). However, when you run

    s[0]
    

    Pandas thinks you want to select by label, so it starts looking for the label 0, which doesn't exist anymore because you overrode it, and the lookup fails with a KeyError. Think of square-bracket selection in the context of selecting a dataframe column: you would say df["column"] (so you're asking for the column by label), and the same applies to a Series.
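
    The difference between the two calls can be reproduced with a small sketch. Again, the Series values are invented for illustration; the index [11, 12, 13, 14] is the one from the question.

```python
import pandas as pd

# Hypothetical values; the integer labels 11..14 come from the question.
s = pd.Series([10, 20, 30, 40], index=[11, 12, 13, 14])

by_slice = s[0:2]        # integer slice: taken by position
by_iloc = s.iloc[0:2]    # explicit positional slice, same rows

try:
    s[0]                 # scalar integer key: looked up as a label
    lookup_failed = False
except KeyError:
    lookup_failed = True  # there is no label 0 in the index

first_by_position = s.iloc[0]   # first row, whatever its label is
first_by_label = s.loc[11]      # the row labelled 11

print(by_slice.equals(by_iloc), lookup_failed)
```

    If you want the first element regardless of its label, s.iloc[0] says so explicitly.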

    In summary, what happens with Series indexing is the following:

    • If you use string index labels and you index by a string, Pandas will look up the string label.
    • If you use string index labels and you index by an integer, Pandas will fall back to indexing by position (that's why your example in the comment works). Recent pandas versions deprecate this fallback, so prefer .iloc for positional access.
    • If you use integer index labels and you index by an integer, Pandas will always try to index by the "label" (that's why the first case doesn't work, as you have overridden the label 0).
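
    Because plain s[...] behaves differently depending on the index type, one way to sidestep the rules above is to always state the intent with .loc (label) or .iloc (position). This is a sketch with invented data, not the only way to do it. The plain integer key on a string-labelled Series is left out on purpose, since its behaviour varies across pandas versions.

```python
import pandas as pd

# Hypothetical Series: same values, one with integer labels, one with strings.
s_int = pd.Series([10, 20, 30], index=[11, 12, 13])
s_str = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Label-based access: always goes through the index labels.
int_by_label = s_int.loc[11]
str_by_label = s_str.loc["a"]
str_by_brackets = s_str["a"]    # a string key on string labels is also a label lookup

# Position-based access: ignores the labels entirely.
int_by_position = s_int.iloc[0]
str_by_position = s_str.iloc[0]

print(int_by_label, str_by_label, str_by_brackets, int_by_position, str_by_position)
```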

    Here are two articles explaining this bizarre behavior:

    Indexing Best Practices in Pandas.series

    Retrieving values in a Series by label or position