I am performing a groupby and apply over a dataframe, and it is returning some strange results. I am using pandas 1.3.1.
Here is the code:
import pandas as pd
ddf = pd.DataFrame({
    "id": [1, 1, 1, 1, 2]
})
def do_something(df):
    return "x"
ddf["title"] = ddf.groupby("id").apply(do_something)
ddf
I would expect every row in the title column to be assigned the value "x", but instead I get this data:
id title
0 1 NaN
1 1 x
2 1 x
3 1 NaN
4 2 NaN
Is this expected?
The result is not strange; it's the correct behavior. apply returns one value per group, and the group labels (here 1 and 2) become the index of the result:
>>> list(ddf.groupby("id"))
[(1,  # the group name (the future index of the grouped df)
  id  # the subset dataframe of group 1
0  1
1  1
2  1
3  1),
 (2,  # the group name (the future index of the grouped df)
  id  # the subset dataframe of group 2
4  2)]
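Why do some rows get a value at all? Because the group labels are also labels in your dataframe's index (0 to 4). Assigning a Series to a column aligns it on the index, so the result of apply, indexed by 1 and 2, fills rows 1 and 2 with "x" and leaves rows 0, 3 and 4 as NaN: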
>>> ddf.groupby("id").apply(do_something)
id
1 x
2 x
dtype: object
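The alignment step can be reproduced without groupby at all. The sketch below builds, by hand, a Series shaped like the apply result (values "x", index 1 and 2; the name per_group is only illustrative) and assigns it to a column. Assignment behaves like reindexing the Series against the dataframe's row index, which gives the same NaN pattern as in the question.

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})

# One value per group, labelled by the group key: same shape as the
# Series returned by ddf.groupby("id").apply(do_something)
per_group = pd.Series(["x", "x"], index=[1, 2])

# Column assignment aligns on the row index labels (0..4), not on "id"
ddf["title"] = per_group

# Explicit version of that alignment: labels 0, 3 and 4 have no match -> NaN
aligned = per_group.reindex(ddf.index)

print(ddf)
print(aligned)
```

The "id" column plays no part in the matching; only the index labels of the two objects do.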
Now change the id column like this, so that the group labels no longer match any index label:
ddf['id'] += 10
# id
# 0 11
# 1 11
# 2 11
# 3 11
# 4 12
ddf["title"] = ddf.groupby("id").apply(do_something)
# id title
# 0 11 NaN
# 1 11 NaN
# 2 11 NaN
# 3 11 NaN
# 4 12 NaN
Or change the index instead:
ddf.index += 10
# id
# 10 1
# 11 1
# 12 1
# 13 1
# 14 2
ddf["title"] = ddf.groupby("id").apply(do_something)
# id title
# 10 1 NaN
# 11 1 NaN
# 12 1 NaN
# 13 1 NaN
# 14 2 NaN
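If what you want is the result the question expected (every row labelled with its group's value), one option is to look the per-group Series up through the id column, so the match uses the group key rather than the row index. The sketch below does this with Series.map and, as an alternative, with groupby(...).transform, which broadcasts a scalar returned for each group back to every row of that group. It reuses the question's do_something; the column name title_t is only illustrative.

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})

def do_something(df):
    return "x"

# One value per group, indexed by the group key (1 and 2)
per_group = ddf.groupby("id").apply(do_something)

# Map each row's id to its group's value instead of aligning on the row index
ddf["title"] = ddf["id"].map(per_group)

# Alternative: transform returns a result aligned to the original rows,
# broadcasting the scalar from each group to all rows of that group
ddf["title_t"] = ddf.groupby("id")["id"].transform(do_something)

print(ddf)
```

Both columns end up with "x" in every row, regardless of what the row index looks like.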