My question is: how can I efficiently assign unique ID numbers built from existing ID columns? For example, I have two columns, household_id (hh_id) and person_no (pno). I am trying to make a new column whose value would be household_id + '_' + person_no.
Here is a sample:
hh_id pno
682138 1
365348 1
365348 2
I am trying to get:
unique_id
682138_1
365348_1
365348_2
and to add this unique_id as a new column. I am using Python, and my data is very large, so an efficient way to do it would be great. Thanks!
You can use pandas.
Assuming your data is in a whitespace-delimited text file like the sample above, read in the data:
import pandas as pd
df = pd.read_csv('data.csv', sep=r'\s+')  # delim_whitespace=True is deprecated in recent pandas
Create the new id column:
df['unique_id'] = df.hh_id.astype(str) + '_' + df.pno.astype(str)
Now df looks like this:
hh_id pno unique_id
0 682138 1 682138_1
1 365348 1 365348_1
2 365348 2 365348_2
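To try this without a file on disk, here is a minimal sketch that builds the sample from the question in memory. The `sample` string and the uniqueness check at the end are illustrative additions, not part of the steps above.

```python
import io
import pandas as pd

# Sample rows from the question, whitespace-delimited
# (an in-memory stand-in for data.csv, used only for illustration)
sample = """hh_id pno
682138 1
365348 1
365348 2
"""

df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Vectorized string concatenation: no Python-level loop over rows
df["unique_id"] = df["hh_id"].astype(str) + "_" + df["pno"].astype(str)

# Sanity check: the combined key should identify each row uniquely
ids_are_unique = df["unique_id"].is_unique
```

If `ids_are_unique` comes back False on your real data, some (hh_id, pno) pair is repeated and the new column is not a true key.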
Write back to a csv file:
df.to_csv('out.csv', index=False)
The file content looks like this:
hh_id,pno,unique_id
682138,1,682138_1
365348,1,365348_1
365348,2,365348_2
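Since the question mentions that the data is very large, one option if the file does not fit in memory is to process it in chunks. This is only a sketch under assumptions: the function name `add_unique_id` and the default `chunksize` are hypothetical choices, and the column names hh_id and pno are taken from the sample.

```python
import pandas as pd

def add_unique_id(in_path, out_path, chunksize=1_000_000):
    """Stream a whitespace-delimited file in chunks and write it back
    as CSV with an extra unique_id column (hypothetical helper)."""
    reader = pd.read_csv(in_path, sep=r"\s+", chunksize=chunksize)
    for i, chunk in enumerate(reader):
        # Same vectorized concatenation as above, applied per chunk
        chunk["unique_id"] = chunk["hh_id"].astype(str) + "_" + chunk["pno"].astype(str)
        # First chunk creates the file with a header; later chunks append without one
        chunk.to_csv(
            out_path,
            mode="w" if i == 0 else "a",
            header=(i == 0),
            index=False,
        )
```

Calling add_unique_id('data.csv', 'out.csv') should produce the same file content as shown above while holding only one chunk in memory at a time; tune `chunksize` to your available RAM.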