I have a data frame (df) with these columns: user, vector, and group.
import pandas as pd
df = pd.DataFrame({'user': ['user_1', 'user_2', 'user_3', 'user_4', 'user_5', 'user_6'], 'vector': [[1, 0, 2, 0], [1, 8, 0, 2], [6, 2, 0, 0], [5, 0, 2, 2], [3, 8, 0, 0], [6, 0, 0, 2]], 'group': ['A', 'B', 'C', 'B', 'A', 'A']})
I want to calculate the aggregated variance for each group.
I tried this code, but it returns an error:
aggregated_variance = (df.groupby('group', as_index=False)['vector'].agg(["var"]))
ValueError: no results
You can use .explode to flatten the lists so that each element gets its own row, and then perform a .groupby operation:
out = (
    df.explode('vector')
    .groupby('group')['vector'].var(ddof=1)
)
print(out)
group
A    7.060606
B    7.428571
C    8.000000
Name: vector, dtype: float64
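As a sanity check, the same numbers can be reproduced without .explode by concatenating each group's lists and taking the sample variance with NumPy. This is only a sketch for verification: the helper name pooled_variance is made up for illustration, and ddof=1 is assumed so that it matches the call above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'user': ['user_1', 'user_2', 'user_3', 'user_4', 'user_5', 'user_6'],
    'vector': [[1, 0, 2, 0], [1, 8, 0, 2], [6, 2, 0, 0],
               [5, 0, 2, 2], [3, 8, 0, 0], [6, 0, 0, 2]],
    'group': ['A', 'B', 'C', 'B', 'A', 'A'],
})

def pooled_variance(lists):
    # Hypothetical helper: pool every element of the group's lists
    # into one flat array, then take the sample variance (ddof=1).
    values = np.concatenate(lists.to_list())
    return values.var(ddof=1)

check = df.groupby('group')['vector'].apply(pooled_variance)
print(check)
```

Both approaches pool all elements of a group into a single sample, so they should agree. If a per-position variance is wanted instead, neither of them applies as written.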
The trick here lies in the use of .explode, which expands each list element into its own row and repeats the index and the other column values:
>>> df.head()
     user        vector group
0  user_1  [1, 0, 2, 0]     A
1  user_2  [1, 8, 0, 2]     B
2  user_3  [6, 2, 0, 0]     C
3  user_4  [5, 0, 2, 2]     B
4  user_5  [3, 8, 0, 0]     A
>>> df.explode('vector').head()
     user vector group
0  user_1      1     A
0  user_1      0     A
0  user_1      2     A
0  user_1      0     A
1  user_2      1     B
...
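One detail worth knowing: after .explode the vector column still has object dtype, because it used to hold lists. A more cautious variant casts the column to a numeric dtype before aggregating. The sketch below assumes every element is an integer, as in the sample data; the variable names exploded and out_numeric are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['user_1', 'user_2', 'user_3', 'user_4', 'user_5', 'user_6'],
    'vector': [[1, 0, 2, 0], [1, 8, 0, 2], [6, 2, 0, 0],
               [5, 0, 2, 2], [3, 8, 0, 0], [6, 0, 0, 2]],
    'group': ['A', 'B', 'C', 'B', 'A', 'A'],
})

exploded = df.explode('vector')
print(exploded['vector'].dtype)  # object, since the column came from lists

# Assumption: all elements are integers, so an int64 cast is safe here.
exploded = exploded.astype({'vector': 'int64'})

out_numeric = exploded.groupby('group')['vector'].var(ddof=1)
print(out_numeric)
```

With a numeric dtype the aggregation no longer depends on how a given pandas version handles object columns.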