python pandas unique-id

Create a unique ID from two existing columns in Python


My question is: how can I efficiently assign unique ID numbers to my data based on existing ID columns? For example, I have two columns, [household_id] and [person_no]. I want to create a new column whose value is household_id + '_' + person_no.

Here is a sample:

hh_id   pno
682138  1
365348  1
365348  2

I am trying to get:

unique_id
682138_1
365348_1
365348_2

I want to add this unique_id as a new column. I am using Python. My data is very large, so an efficient way to do this would be great. Thanks!


Solution

  • You can use pandas.

    Assuming your data is in a whitespace-delimited text file like the sample above, read it in:

    import pandas as pd

    # sep=r'\s+' splits columns on any run of whitespace
    df = pd.read_csv('data.csv', sep=r'\s+')
    

    Create the new id column:

    df['unique_id'] = df.hh_id.astype(str) + '_' + df.pno.astype(str)
    

    Now df looks like this:

        hh_id  pno unique_id
    0  682138    1  682138_1
    1  365348    1  365348_1
    2  365348    2  365348_2
    

    Write the result back to a CSV file:

    df.to_csv('out.csv', index=False)
    

    The file content looks like this:

    hh_id,pno,unique_id
    682138,1,682138_1
    365348,1,365348_1
    365348,2,365348_2
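
Since the question mentions that the data is very large, the same read, concatenate, and write steps can also be applied one chunk at a time so the whole file never has to sit in memory. The following is only a sketch under a few assumptions: the helper name add_unique_id and the chunk size are made up for illustration, the input is whitespace-delimited like the sample, and the column names are hh_id and pno. It uses in-memory buffers in place of real files so it can be run as-is.

```python
import io

import pandas as pd


def add_unique_id(src, dst, chunksize=100_000):
    """Stream `src` in chunks, add a unique_id column, write CSV to `dst`.

    `src` and `dst` are open file-like objects (hypothetical helper,
    not part of pandas).
    """
    reader = pd.read_csv(src, sep=r"\s+", chunksize=chunksize)
    for i, chunk in enumerate(reader):
        # Same vectorised string concatenation as above, applied per chunk.
        chunk["unique_id"] = (
            chunk["hh_id"].astype(str) + "_" + chunk["pno"].astype(str)
        )
        # Write the header only with the first chunk.
        chunk.to_csv(dst, index=False, header=(i == 0))


# Small demonstration with the sample data from the question.
sample = io.StringIO("hh_id pno\n682138 1\n365348 1\n365348 2\n")
out = io.StringIO()
add_unique_id(sample, out, chunksize=2)  # tiny chunk size, just for the demo
result = out.getvalue()
print(result)
```

With real files you would pass handles opened with open('data.csv') and open('out.csv', 'w', newline='') instead of the StringIO buffers. Whether chunking is actually needed depends on how the file size compares to available memory; for data that fits in RAM, the single-pass version above is simpler.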