python, pandas, apply

Pandas: too many indexers: using iloc in apply


I am working with the following dataframe and want to generate columns with 'grandchildren'. I wrote the function find_grandchild_list to extract the 'grandchildren' and tried to run it on the last column of every row using iloc inside apply, but got the error 'too many indexers'. When I pass the same column by name inside apply, I get the desired result.

import pandas as pd

data = {'Parent':['plant','plant','plant','cactus','algae','tropical plant','cactus','monstrera','blue_cactus','light_blue_cactus'],
       'Child': ['cactus','algae','tropical_plant','aloe_vera','green_algae','monstrera','blue_cactus','monkey_monstrera','light_blue_cactus','desert_blue_cactus_lightblue']}

df = pd.DataFrame(data)
df

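def find_grandchild_list(row):
    # `row` is a single value; build a mask of rows whose first column equals it
    grandchild_value = df.iloc[:,0] == row
    # return the last column of the matching rows, wrapped in a list
    return [df[grandchild_value].iloc[:,-1]]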

df

I want my final dataframe to look like this:

plant | cactus | aloe_vera
plant | cactus | blue_cactus | light_blue_cactus | desert_blue_cactus_lightblue
plant | algae | green_algae
plant | tropical_plant | monstrera | monkey_monstrera

successful:

df.apply(lambda row : find_grandchild_list(row['Child']), axis=1)

error:

df.apply(lambda row : find_grandchild_list(row.iloc[:,-1]), axis=1)

For my final script, I cannot use the column name, because I want to call apply repeatedly and always operate on the last column. My error probably comes from a poor understanding of iloc, but I couldn't find documentation on using iloc inside apply.


Solution

  • You are applying your lambda function to each row by specifying axis=1. [Ref]

    Therefore, row in your lambda function is a pd.Series: df.iloc[0] on the first call, df.iloc[1] on the second, and so on.

    df.apply(lambda row : print(type(row)), axis=1)
    >>>
    <class 'pandas.core.series.Series'>
    <class 'pandas.core.series.Series'>
    <class 'pandas.core.series.Series'>
    <class 'pandas.core.series.Series'>
    <class 'pandas.core.series.Series'>
    <class 'pandas.core.series.Series'>
    <class 'pandas.core.series.Series'>
    <class 'pandas.core.series.Series'>
    <class 'pandas.core.series.Series'>
    <class 'pandas.core.series.Series'>
    

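    Because a pd.Series has a 1-dimensional index, it accepts only one indexer: row['Child'] or row.iloc[1]. (row[1] also works in older pandas versions, but integer position lookup through [] on a labelled Series is deprecated in recent releases.) row.iloc[:,-1] passes two indexers, one for rows and one for columns, which is why it raises 'too many indexers'. To always pick the last column regardless of its name, use row.iloc[-1].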
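
To make the point above concrete, here is a minimal sketch built on the question's data. It assumes pandas 1.5 or later (where the exception is exposed as pd.errors.IndexingError), and it changes find_grandchild_list to return a plain list of values instead of a list wrapping a Series, purely to keep the output readable; that change is an illustration, not part of the original question.

```python
import pandas as pd

data = {
    'Parent': ['plant', 'plant', 'plant', 'cactus', 'algae', 'tropical plant',
               'cactus', 'monstrera', 'blue_cactus', 'light_blue_cactus'],
    'Child': ['cactus', 'algae', 'tropical_plant', 'aloe_vera', 'green_algae',
              'monstrera', 'blue_cactus', 'monkey_monstrera',
              'light_blue_cactus', 'desert_blue_cactus_lightblue'],
}
df = pd.DataFrame(data)


def find_grandchild_list(value):
    # Rows whose first column (Parent) equals the given value
    mask = df.iloc[:, 0] == value
    # Last-column values of those rows, as a plain list (simplified return type)
    return df[mask].iloc[:, -1].tolist()


# Each row passed by apply(axis=1) is a 1-D Series, so two indexers fail
row = df.iloc[0]
try:
    row.iloc[:, -1]
except pd.errors.IndexingError as exc:
    error_message = str(exc)

# One positional indexer picks the last element of the row, whatever its name
by_position = df.apply(lambda row: find_grandchild_list(row.iloc[-1]), axis=1)

# Same lookup by column name, for comparison
by_name = df.apply(lambda row: find_grandchild_list(row['Child']), axis=1)
```

Both calls return the same Series of lists. Note that the row for 'tropical_plant' gets an empty list with this data, because the Parent column spells it 'tropical plant' with a space.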