Search code examples
pythonpandasdataframe

two columns concat in pandas dataframe


How to concat two columns in a dataframe so that the data goes in the order in which they went in the column and also remove unnecessary characters so that only numeric characters remain.

For example i have dataframe:

df = pd.DataFrame(data = {
    'id_part1': ['1221002', 'Johny Cash', '2-12-0032', 2112],
    'id_part2': ['', '345000223', '', 4322],
    'created_date': ['2019-01-01', '15/02/2016', '2013-05-07', '2020-08-29']})

dataframe

I need to get the following result:

result


Solution

  • One option is to create a new id by removing any non-digit characters in both columns before joining.

    df['correct_id'] = df[['id_part1', 'id_part2']].astype(str).apply(lambda x: int(''.join(x.str.replace(r'\D', '', regex=True))), axis=1)
    
    print(df[['correct_id', 'created_date']])
        correct_id  created_date
    0      1221002    2019-01-01
    1    345000223    15/02/2016
    2      2120032    2013-05-07
    3     21124322    2020-08-29
    

    Also consider formating your date column

    df['created_date'] = pd.to_datetime(df['created_date'], format='mixed')
    
    print(df[['correct_id', 'created_date']])
        correct_id  created_date
    0      1221002    2019-01-01
    1    345000223    2016-02-15
    2      2120032    2013-05-07
    3     21124322    2020-08-29