I have a column in a dataframe that includes rows like the bellow string:
parameter1: 104 parameter2: 98 % parameter3: 48 % parameter4: 1005 km parameter5: 1
How can I split it and create new columns from it?
So far, I am doing this way but it is not so efficient.
#split and create new fields
merged_df[['parameter1:n', 'parameter1:', 'parameter1:n', 'parameter2:(%)', 'ddd%', 'parameter3:n', 'parameter3(%)', 'dd%', 'parameter4:n', 'parameter4(km):', 'ffkm', 'parameter5:n', 'parameter5']] = merged_df['vals'].str.split(' ',expand=True)
#drop useful field
merged_df = merged_df.drop(['parameter1:n', 'parameter1:n', 'ddd%', 'parameter3:n', 'dd%','parameter4:n','ffkm','parameter5:n'], axis=1)
the result is like the table bellow
+------------+---------------+---------------+----------------+------------+
| parameter1 | parameter2(%) | parameter3(%) | parameter4(km) | parameter5 |
+------------+---------------+---------------+----------------+------------+
| 104| 98 | 48 | 1005 | 1 |
| 34 | 9 | 87 | 44 | 33432 |
| 4 | 343 | 6 | 69 | 31 |
+------------+---------------+---------------+----------------+------------+
Thank you in advance
Try following code:
import re
df['vals'] = df['vals'].astype(str)
df['vals'] = df['vals'].apply(lambda rows: [re.search('\d+',i).group() for i in re.split(r'[^\s]+\:',rows)[1:]])
cols =["parameter1","parameter2(%)", "parameter3(%)", "parameter4(km)", "parameter5"]
df[cols] = pd.DataFrame(df.vals.tolist())
df.drop(["vals"],axis=1,inplace=True)
NOTE: Assuming you have all values of string datatype in "vals" column.
Code that I used to make the input dataframe is following:
rows = ["parameter1: 104 parameter2: 98 % parameter3: 48 % parameter4: 1005 km parameter5: 1"]*10
df = pd.DataFrame(rows,columns=['vals'])