How to concat two columns in a dataframe so that the data goes in the order in which they went in the column and also remove unnecessary characters so that only numeric characters remain.
For example i have dataframe:
df = pd.DataFrame(data = {
'id_part1': ['1221002', 'Johny Cash', '2-12-0032', 2112],
'id_part2': ['', '345000223', '', 4322],
'created_date': ['2019-01-01', '15/02/2016', '2013-05-07', '2020-08-29']})
I need to get the following result:
One option is to create a new id by removing any non-digit characters in both columns before joining.
df['correct_id'] = df[['id_part1', 'id_part2']].astype(str).apply(lambda x: int(''.join(x.str.replace(r'\D', '', regex=True))), axis=1)
print(df[['correct_id', 'created_date']])
correct_id created_date
0 1221002 2019-01-01
1 345000223 15/02/2016
2 2120032 2013-05-07
3 21124322 2020-08-29
Also consider formating your date column
df['created_date'] = pd.to_datetime(df['created_date'], format='mixed')
print(df[['correct_id', 'created_date']])
correct_id created_date
0 1221002 2019-01-01
1 345000223 2016-02-15
2 2120032 2013-05-07
3 21124322 2020-08-29