I have a dataframe consisting of a column of strings. I want to extract the place, date, and race number from one column.
Dataframe:
- S.no. FileName
- 0. Albany17NOV19sectionalRace1.csv
- 1. Albany22NOV19sectionalRace4.csv
- 2. New York1NOV19sectionalRace7.csv
- 3. Aquis Park Gold Coast27NOV19sectionalRace2.csv
Desired dataframe:
- S.no. Place Date Racenumber
- 0. Albany 17NOV19 Race1
- 1. Albany 22NOV19 Race4
- 2. New York 1NOV19 Race7
- 3. Aquis park Gold Coast 27NOV19 Race2
Split by;
digit
followed by Nondigit
and digit
that is say 17NOV19
or
sectional
or
3 special character .
After split drop all rows that have None as values and any others not wanted. Can rename columns if you needed to
df=df.FileName.str.split('(\d+\D+\d+)|(sectional)|(\.)', expand=True).dropna(1).drop(columns=[4,6,11,12])
print(df)
0 1 8
0 Albany 17NOV19 Race1
1 Albany 22NOV19 Race4
2 New York 1NOV19 Race7
3 Aquis Park Gold Coast 27NOV19 Race2