I am trying to use the apply function on a data frame to remove strings from the date columns. For instance, I have the data frame below and I want to remove the strings from the Start and Finish columns using the dateutil package, without specifying the column names.
import pandas as pd
import dateutil.parser as dparser

df = [["1/5/2020 Yes", "5/9/2020 String", 2, 6], ["1/8/2020 No", "5/8/2020 sponge", 8, 9], ["8/9/2020 Spine", "5/8/2020 spike", 8, 9]]
df = pd.DataFrame(df)
df.columns = ["Start", "Finish", "x1", "x2"]
Here is my attempt, but it does not work and raises a KeyError:
df[0] = df[0].apply(dparser.parse,fuzzy=True)
df[1] = df[1].apply(dparser.parse,fuzzy=True)
Can anyone help me to solve this please?
df[0] accesses the column named 0, which is not in your dataframe. You want to give the correct name, i.e. df['Start'], or select by position with iloc: df.iloc[:,0].
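To keep the dateutil approach from the question without typing the column names, one option is to pick the date columns by position and loop over them. This is a minimal sketch, assuming the date columns are the first two columns of the frame; the date_cols variable is only an illustrative name.

```python
import pandas as pd
from dateutil import parser as dparser

df = pd.DataFrame(
    [["1/5/2020 Yes", "5/9/2020 String", 2, 6],
     ["1/8/2020 No", "5/8/2020 sponge", 8, 9],
     ["8/9/2020 Spine", "5/8/2020 spike", 8, 9]],
    columns=["Start", "Finish", "x1", "x2"],
)

# Assumption: the date columns are the first two columns.
# Taking them from df.columns avoids hard-coding their names.
date_cols = df.columns[:2]

for col in date_cols:
    # fuzzy=True tells dateutil to skip tokens it cannot interpret as part of a date
    df[col] = df[col].apply(dparser.parse, fuzzy=True)
```

Note that dateutil reads ambiguous dates such as 1/5/2020 as month/day/year by default; pass dayfirst=True alongside fuzzy=True if your data is day/month/year.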
Another way to extract the date is to use a regex pattern, for example:
# Keep only the leading run of non-whitespace characters (the date) in the first two columns
for i in range(2):
    df.iloc[:,i] = df.iloc[:,i].str.extract(r'^(\S+)')[0]
Output:
      Start    Finish  x1  x2
0  1/5/2020  5/9/2020   2   6
1  1/8/2020  5/8/2020   8   9
2  8/9/2020  5/8/2020   8   9
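The regex approach above leaves the columns as strings. If you need real datetime values, the extracted text can be passed to pd.to_datetime. The sketch below assumes the dates are in month/day/year order, which the sample data does not confirm, so adjust the format string if yours differ.

```python
import pandas as pd

df = pd.DataFrame(
    [["1/5/2020 Yes", "5/9/2020 String", 2, 6],
     ["1/8/2020 No", "5/8/2020 sponge", 8, 9],
     ["8/9/2020 Spine", "5/8/2020 spike", 8, 9]],
    columns=["Start", "Finish", "x1", "x2"],
)

# Assumption: the first two columns hold the dates, written as month/day/year.
for col in df.columns[:2]:
    # Take the leading non-whitespace token, then parse it with an explicit format
    date_text = df[col].str.extract(r"^(\S+)")[0]
    df[col] = pd.to_datetime(date_text, format="%m/%d/%Y")
```

Giving an explicit format avoids any guessing about day/month order and raises an error if a value does not match.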