Here is my original data: enter image description here
As you can see.The cust_id column records the consumption record for each ID.And second column means the product name,third is the munber they bought each time.
I want to get this kind of data: enter image description here
The result data shows each customer bought which product and how many.If they never bought,then the data is None.I think this is Sparse matrix.
I have tried many ways and still can't fix it up.....
Maybe pandas?Numpy?
There is problem with duplicates, I add last row with same cust_id
and prd_id
values for demonstrate it.
print (df)
cust_id prd_id prd_number
8 462 40 1
9 462 46 3
10 462 59 1
11 462 63 13
12 462 67 1
13 462 82 12
14 462 88 1
15 462 163 3
16 463 68 1
17 463 90 1
18 463 159 2
16 464 93 11
20 464 94 8
21 464 96 1
22 464 142 4
23 465 50 1
24 465 50 5
Then need groupby
by columns cust_id
and prd_id
with aggreagting some function like mean()
or sum()
. Last unstack
with replacing NaN
to 0
:
print (df.groupby(['cust_id', 'prd_id'])['prd_number'].sum().unstack(fill_value=0))
prd_id 40 46 50 59 63 67 68 82 88 90 93 94 96 142 \
cust_id
462 1 3 0 1 13 1 0 12 1 0 0 0 0 0
463 0 0 0 0 0 0 1 0 0 1 0 0 0 0
464 0 0 0 0 0 0 0 0 0 0 11 8 1 4
465 0 0 6 0 0 0 0 0 0 0 0 0 0 0
prd_id 159 163
cust_id
462 0 3
463 2 0
464 0 0
465 0 0