Why does using df.iloc[[i][0]] in Pandas' iloc() lead to specific behavior?


I am just starting to learn Pandas, and in a piece of code there is a call to df.iloc[[1][0]] (where df is a pd.DataFrame with a shape of (60935, 54)). From the context of the code, df.iloc[[1][0]] seems to represent a row of df. However, how should one interpret [[1][0]]? Why does iloc[] allow two adjacent lists as parameters? How does iloc[] handle these parameters internally? This clearly is not indexing both rows and columns. Additionally, I noticed that when the second number is neither 0 nor -1, an index out-of-range error occurs. Why is this?

Here are some experiments I conducted:

import pandas as pd

mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
          {'a': 100, 'b': 200, 'c': 300, 'd': 400},
          {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
df = pd.DataFrame(mydict)
print(df.iloc[[0][-1]].shape)  # Outputs (4,)
print(df.iloc[[0][0]].shape)   # Outputs (4,)
print(df.iloc[[0]].shape)      # Outputs (1, 4)
print(type(df.iloc[[0]]))      # Outputs <class 'pandas.core.frame.DataFrame'>
print(type(df.iloc[[0][0]]))   # Outputs <class 'pandas.core.series.Series'>
# Kept last because the exception stops the script:
print(df.iloc[[0][1]].shape)   # Raises IndexError: list index out of range

Solution

  • I think this is a somewhat confusing programming style. Let me break it down.

    [1] creates a list with one element (namely the number 1).

    [1][0] then accesses the first (or 0th) element of said list, thus returning 1.

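    Python evaluates this expression before iloc ever sees it, so iloc receives a plain integer rather than two lists. Thus, df.iloc[[1][0]] is equivalent to df.iloc[1].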

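    The same applies to the other indexes. An index of -1 returns the last item of the list; since the list is just one element long, that is the same element again. Any other index, such as 1, is out of range for a one-element list, so Python raises an IndexError before iloc is even called.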

    df.iloc[[0]] requests a list of rows (here just one row, namely the 0th). This results in a pd.DataFrame.

    If you call df.iloc[0] instead, you request exactly one row rather than a list of rows, so a pd.Series is returned.

    Alternatively, you could also request something like df.iloc[0:2], which would return the first two rows (and thus a pd.DataFrame again).
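The points above can be checked with a short sketch. It reuses the three-row frame from the question; the variable names (one_item, inner, row_as_series, and so on) are only illustrative and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
     {'a': 100, 'b': 200, 'c': 300, 'd': 400},
     {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
)

# Plain Python list indexing happens first; pandas is not involved yet.
one_item = [1]
inner = one_item[0]           # 1
also_inner = one_item[-1]     # 1, the last element of a one-element list

# Any other index fails in the list itself, before iloc is called.
try:
    one_item[1]
    list_error = None
except IndexError as exc:
    list_error = str(exc)

row_as_series = df.iloc[[1][0]]   # same as df.iloc[1]: an int selects one row
row_as_frame = df.iloc[[1]]       # a list of positions keeps two dimensions
first_two = df.iloc[0:2]          # a slice also keeps two dimensions

print(inner, also_inner, list_error)
print(type(row_as_series).__name__, row_as_series.shape)
print(type(row_as_frame).__name__, row_as_frame.shape)
print(type(first_two).__name__, first_two.shape)
```

An integer position gives back a Series, while a list or a slice gives back a DataFrame, even when only one row is selected.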