I'm currently building a recommender system using Goodreads data.
I want to change string user ids into integers.
Current user ids are like this: '0d688fe079530ee1fe6fa85eab10ec5c'
I want to convert them into integers (e.g. 1, 2, 3, ...) so that rows with the same string id get the same integer id. I've considered using df.groupby('user_id'), but I couldn't figure out how to do this.
I would be very thankful if anybody could tell me how to do this conversion.
Use pd.factorize as suggested by @AsishM.
Input data:
user_id book_id ratings
0 831a1e2505e44a2f81e670db82c9a3c0 1942 3
1 58d3869488a648aebef32b6c2ec4fb16 3116 5
2 f05ad4c0978c4d0eb3ca41921f7a80af 3558 4
3 511c8f47d75c427eae8bead7ff80307b 2467 3
4 db74d6df03644e61b4cd830db35de6a8 2318 2
5 58d3869488a648aebef32b6c2ec4fb16 5882 4
6 db74d6df03644e61b4cd830db35de6a8 6318 5
df['uid'] = pd.factorize(df['user_id'])[0]
Output result:
user_id book_id ratings uid
0 831a1e2505e44a2f81e670db82c9a3c0 1942 3 0
1 58d3869488a648aebef32b6c2ec4fb16 3116 5 1 # user 1
2 f05ad4c0978c4d0eb3ca41921f7a80af 3558 4 2
3 511c8f47d75c427eae8bead7ff80307b 2467 3 3
4 db74d6df03644e61b4cd830db35de6a8 2318 2 4 # user 4
5 58d3869488a648aebef32b6c2ec4fb16 5882 4 1 # user 1
6 db74d6df03644e61b4cd830db35de6a8 6318 5 4 # user 4
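The question asks for ids starting at 1 and also mentions groupby. Below is a minimal sketch that assumes pandas and reuses the sample rows above. The codes from pd.factorize start at 0, so adding 1 gives 1-based ids, and the uniques it returns map those ids back to the original strings. GroupBy.ngroup() is shown as the groupby route. It numbers users in sorted order of user_id rather than in order of first appearance. The names uid_to_user and uid_groupby are only illustrative.

```python
import pandas as pd

# Sample rows copied from the input data above
df = pd.DataFrame({
    "user_id": [
        "831a1e2505e44a2f81e670db82c9a3c0",
        "58d3869488a648aebef32b6c2ec4fb16",
        "f05ad4c0978c4d0eb3ca41921f7a80af",
        "511c8f47d75c427eae8bead7ff80307b",
        "db74d6df03644e61b4cd830db35de6a8",
        "58d3869488a648aebef32b6c2ec4fb16",
        "db74d6df03644e61b4cd830db35de6a8",
    ],
    "book_id": [1942, 3116, 3558, 2467, 2318, 5882, 6318],
    "ratings": [3, 5, 4, 3, 2, 4, 5],
})

# factorize returns (codes, uniques); codes start at 0, in order of first appearance
codes, uniques = pd.factorize(df["user_id"])
df["uid"] = codes + 1  # shift to 1-based ids: 1, 2, 3, ...

# Illustrative lookup table from integer id back to the original string id
uid_to_user = dict(enumerate(uniques, start=1))

# groupby alternative: ngroup numbers groups in sorted key order (sort=True by default)
df["uid_groupby"] = df.groupby("user_id").ngroup() + 1

print(df[["user_id", "uid", "uid_groupby"]])
```

To make factorize number users in the same sorted order as ngroup, pass sort=True to pd.factorize. In a recommender it is worth keeping uniques (or a lookup like uid_to_user) around, because you will usually need to translate predicted integer ids back to the original Goodreads user ids.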