
Pandas Groupby and Apply


I am performing a groupby and apply over a dataframe, and it is returning some strange results. I am using pandas 1.3.1.

Here is the code:

import pandas as pd

ddf = pd.DataFrame({
    "id": [1, 1, 1, 1, 2]
})

def do_something(df):
    return "x"

ddf["title"] = ddf.groupby("id").apply(do_something)
ddf

I would expect every row in the title column to be assigned the value "x", but instead I get this:

        id title
0        1   NaN
1        1     x
2        1     x
3        1   NaN
4        2   NaN

Is this expected?


Solution

  • The result is not strange; it is the expected behavior. apply returns one value per group, and the group labels (here 1 and 2) become the index of the result:

    >>> list(ddf.groupby("id"))
    [(1,        # the group name (the future index of the grouped df)
         id     # the subset dataframe of the group 1
      0   1
      1   1
      2   1
      3   1),
     (2,        # the group name (the future index of the grouped df)
         id     # the subset dataframe of the group 2
      4   2)]
    

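    Why do you get any values at all? Because the group labels (1 and 2) also happen to be labels in your dataframe's index, and column assignment aligns on the index, so only rows 1 and 2 receive "x":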

    >>> ddf.groupby("id").apply(do_something)
    id
    1    x
    2    x
    dtype: object
    

    Now change the id values so that the group labels no longer match any index label:

    ddf['id'] += 10
    #    id
    # 0  11
    # 1  11
    # 2  11
    # 3  11
    # 4  12
    
    ddf["title"] = ddf.groupby("id").apply(do_something)
    #    id title
    # 0  11   NaN
    # 1  11   NaN
    # 2  11   NaN
    # 3  11   NaN
    # 4  12   NaN
    

    Or, starting again from the original ddf, change the index instead:

    ddf.index += 10
    #    id
    # 10  1
    # 11  1
    # 12  1
    # 13  1
    # 14  2
    
    ddf["title"] = ddf.groupby("id").apply(do_something)
    #     id title
    # 10   1   NaN
    # 11   1   NaN
    # 12   1   NaN
    # 13   1   NaN
    # 14   2   NaN
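
To get "x" on every row, as the question expected, one option is to look up each row's id in the per-group result instead of assigning that result directly. The following is a minimal sketch, not the only way to do it. It assumes selecting the "id" column before apply is acceptable, which is a choice made here so that the function receives a Series and the example does not depend on how a given pandas version passes the grouping column to DataFrameGroupBy.apply. The names per_group and aligned are illustrative.

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})

def do_something(group):
    # Receives one group's "id" values (a Series in this sketch)
    # and returns a single value for the whole group.
    return "x"

# One value per group, indexed by the group label (1 and 2), not by row.
per_group = ddf.groupby("id")["id"].apply(do_something)

# What a direct assignment effectively does: align per_group on ddf's index.
# Only the rows labelled 1 and 2 find a match; the rest become NaN.
aligned = per_group.reindex(ddf.index)

# Mapping each row's id through per_group broadcasts the value to every row.
ddf["title"] = ddf["id"].map(per_group)
print(ddf)
```

Because map looks values up by id rather than by row label, this keeps working after the id values or the index are shifted as in the examples above.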