I have a dataframe df
with transactions where the values in the column Col
can be repeated. I use Counter dictionary1
to count the frequency for each Col
value, then I would like to run a for loop on a subset of the data and obtain a value pit
. I want to create a new dictionary dict1
where the key is the key from dictionary1
and the value is the value of pit
. This is the code I have so far:
dictionary1 = Counter(df['Col'])
dict1 = defaultdict(int)
for i in range(len(dictionary1)):
temp = df[df['Col'] == dictionary1.keys()[i]]
b = temp['IsBuy'].sum()
n = temp['IsBuy'].count()
pit = b/n
dict1[dictionary1.keys()[i]] = pit
My question is, how can i assign the key and value for dict1
based on the key of dictionary1
and the value obtained from the calculation of pit
. In other words, what is the correct way to write the last line of code in the above script.
Thank you.
Since you're using pandas
, I should point out that the problem you're facing is common enough that there's a built-in way to do it. We call collecting "similar" data into groups and then performing operations on them a groupby
operation. It's probably wortwhile reading the tutorial section on the groupby split-apply-combine
idiom -- there are lots of neat things you can do!
The pandorable way to compute the pit
values would be something like
df.groupby("Col")["IsBuy"].mean()
For example:
>>> # make dummy data
>>> N = 10**4
>>> df = pd.DataFrame({"Col": np.random.randint(1, 10, N), "IsBuy": np.random.choice([True, False], N)})
>>> df.head()
Col IsBuy
0 3 False
1 6 True
2 6 True
3 1 True
4 5 True
>>> df.groupby("Col")["IsBuy"].mean()
Col
1 0.511709
2 0.495697
3 0.489796
4 0.510658
5 0.507491
6 0.513183
7 0.522936
8 0.488688
9 0.490498
Name: IsBuy, dtype: float64
which you could turn into a dictionary from a Series if you insisted:
>>> df.groupby("Col")["IsBuy"].mean().to_dict()
{1: 0.51170858629661753, 2: 0.49569707401032703, 3: 0.48979591836734693, 4: 0.51065801668211308, 5: 0.50749063670411987, 6: 0.51318267419962338, 7: 0.52293577981651373, 8: 0.48868778280542985, 9: 0.49049773755656106}