I have df:
domain orgid
csyunshu.com 108299
dshu.com 108299
bbbdshu.com 108299
cwakwakmrg.com 121303
ckonkatsunet.com 121303
I would like to add a new column that replaces the domain values with numeric ids, numbered within each orgid:
domain orgid domainid
csyunshu.com 108299 1
dshu.com 108299 2
bbbdshu.com 108299 3
cwakwakmrg.com 121303 1
ckonkatsunet.com 121303 2
I have already tried this line but it does not give the result I want:
df.groupby('orgid').count['domain'].reset_index()
Can anybody help?
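For context, here is a small sketch of why the attempted line does not work, rebuilding the df from the question. First, count is a method, so count['domain'] tries to index the method itself and raises a TypeError. Second, even when called correctly, count() aggregates to one row per orgid, which gives the group sizes rather than a running number for each row:

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "domain": ["csyunshu.com", "dshu.com", "bbbdshu.com",
               "cwakwakmrg.com", "ckonkatsunet.com"],
    "orgid": [108299, 108299, 108299, 121303, 121303],
})

# count is a bound method here, so subscripting it raises TypeError
try:
    df.groupby('orgid').count['domain']
except TypeError as exc:
    print("TypeError:", exc)

# Called correctly, count() collapses each orgid to a single row
# holding the group size, not a per-row number
counts = df.groupby('orgid')['domain'].count().reset_index()
print(counts)
```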
You can call rank on the groupby object and pass the parameter method='first':
In [61]:
df['domainId'] = df.groupby('orgid')['orgid'].rank(method='first').astype(int)
df
Out[61]:
domain orgid domainId
0 csyunshu.com 108299 1
1 dshu.com 108299 2
2 bbbdshu.com 108299 3
3 cwakwakmrg.com 121303 1
4 ckonkatsunet.com 121303 2
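An alternative that may be simpler is groupby().cumcount(), which numbers the rows of each group in order of appearance starting at 0 and already returns integers. This is a swapped-in technique rather than the rank approach above; a minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "domain": ["csyunshu.com", "dshu.com", "bbbdshu.com",
               "cwakwakmrg.com", "ckonkatsunet.com"],
    "orgid": [108299, 108299, 108299, 121303, 121303],
})

# cumcount starts at 0 within each orgid, so add 1 to start numbering at 1
df['domainId'] = df.groupby('orgid').cumcount() + 1
print(df)
```

Like rank(method='first'), cumcount follows row order, so rows with the same orgid do not need to be contiguous.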
If you want to overwrite the domain column instead, you can do:
df['domain'] = df.groupby('orgid')['orgid'].rank(method='first').astype(int)
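A self-contained version of the overwrite, for copy-pasting. Note that rank returns float64 values (1.0, 2.0, ...), so .astype(int) is used here to get integer ids matching the desired output:

```python
import pandas as pd

df = pd.DataFrame({
    "domain": ["csyunshu.com", "dshu.com", "bbbdshu.com",
               "cwakwakmrg.com", "ckonkatsunet.com"],
    "orgid": [108299, 108299, 108299, 121303, 121303],
})

# Every orgid value within a group is identical, so method='first'
# breaks the ties by order of appearance, giving 1, 2, 3, ... per group
df['domain'] = df.groupby('orgid')['orgid'].rank(method='first').astype(int)
print(df)
```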