I am trying to calculate the frequency of items sold for each client in my dataset. I don't want the frequency relative to the length of the whole dataset, but relative to the total number of items purchased by each client.
My dataframe would look like this:
data = {
    'ClientId': ['1','2','3','4','2','2','1','4'],
    'QuantitySold': ['5','10','6','7','5','10','8','7']
}
Expected output:
ClientId  QuantitySold  FrequencySold
1         5             0.385
2         10            0.4
3         6             1
4         7             0.5
2         5             0.2
2         10            0.4
1         8             0.615
4         7             0.5
Calculation explained: client 1 bought 5 + 8 = 13 items in total, so the row with quantity 5 gets 5/(5+8) = 0.385.
How can I do that using Python?
First, create a dictionary with the total quantity for each client, then divide each row's quantity by its client's total:
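import collections

# First pass: sum the quantities per client (they are stored as strings, hence int())
totals = collections.defaultdict(int)
for c, q in zip(data["ClientId"], data["QuantitySold"]):
    totals[c] += int(q)
# totals now holds {'1': 13, '2': 25, '3': 6, '4': 14}

# Second pass: divide each quantity by its client's total
for c, q in zip(data["ClientId"], data["QuantitySold"]):
    print(c, q, int(q) / totals[c])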
Output:
1 5 0.38461538461538464
2 10 0.4
3 6 1.0
4 7 0.5
2 5 0.2
2 10 0.4
1 8 0.6153846153846154
4 7 0.5
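Since the question mentions a dataframe, the same idea can probably be written with pandas as well. This is only a sketch, assuming pandas is installed and that the data is loaded into a DataFrame with the column names from the question; the `FrequencySold` column name is taken from the expected output and `client_totals` is just an illustrative variable name. `groupby(...).transform("sum")` returns each client's total aligned with the original rows, so the division works row by row:

```python
import pandas as pd

# Same sample data as in the question; quantities are stored as strings.
data = {
    "ClientId": ["1", "2", "3", "4", "2", "2", "1", "4"],
    "QuantitySold": ["5", "10", "6", "7", "5", "10", "8", "7"],
}

df = pd.DataFrame(data)
df["QuantitySold"] = df["QuantitySold"].astype(int)  # convert strings to integers

# Per-client total, repeated on every row that belongs to that client
client_totals = df.groupby("ClientId")["QuantitySold"].transform("sum")

# Share of each row within its client's total, rounded to 3 decimals
df["FrequencySold"] = (df["QuantitySold"] / client_totals).round(3)

print(df)
```

Unlike the loop version, this keeps the result as a column in the dataframe, which may be more convenient if further processing is needed.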