Tags: python, pandas, dataframe, apply, python-dateutil

How to use `apply` in Python without specifying the column names of the data frame?


I am trying to use the `apply` function on a data frame to strip the trailing text from its date columns. For instance, given the data frame below, I want to remove the strings from the Start and Finish columns using the dateutil package, without specifying the column names.

import pandas as pd
import dateutil.parser as dparser

data = [["1/5/2020 Yes", "5/9/2020 String", 2, 6], ["1/8/2020 No", "5/8/2020 sponge", 8, 9], ["8/9/2020 Spine", "5/8/2020 spike", 8, 9]]
df = pd.DataFrame(data, columns=["Start", "Finish", "x1", "x2"])

Here is my attempt, but it does not work and raises a KeyError:

df[0] = df[0].apply(dparser.parse,fuzzy=True)
df[1] = df[1].apply(dparser.parse,fuzzy=True)

Can anyone help me solve this, please?


Solution

  • df[0] accesses the column named 0, which is not in your dataframe. Either give the correct name, i.e. df['Start'], or select by position with iloc: df.iloc[:,0].

    Another way to extract the date is to use a regex pattern that captures everything up to the first whitespace. Note that this keeps the dates as strings. For example:

    # Keep only the leading non-whitespace token in the first two columns
    for i in range(2):
        df.iloc[:,i] = df.iloc[:,i].str.extract(r'^(\S+)')[0]
    

    Output:

          Start    Finish  x1  x2
    0  1/5/2020  5/9/2020   2   6
    1  1/8/2020  5/8/2020   8   9
    2  8/9/2020  5/8/2020   8   9
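
If you would rather keep the dateutil approach from the question, the positional fix can be sketched roughly as below. This is a minimal sketch, not a definitive implementation: it assumes the date columns are the first two columns, that every cell contains a parseable date, and that month-first order is intended (dateutil's default). The loop variable `pos` and the helper name `name` are only illustrative.

```python
import pandas as pd
import dateutil.parser as dparser

# Same sample data as in the question
df = pd.DataFrame(
    [["1/5/2020 Yes", "5/9/2020 String", 2, 6],
     ["1/8/2020 No", "5/8/2020 sponge", 8, 9],
     ["8/9/2020 Spine", "5/8/2020 spike", 8, 9]],
    columns=["Start", "Finish", "x1", "x2"],
)

# Assumption: the date columns are the first two, so we address them
# by position and never type their names.
for pos in range(2):
    name = df.columns[pos]  # look the label up from the position
    # fuzzy=True makes dateutil skip tokens it cannot interpret ("Yes", "spike", ...)
    df[name] = df.iloc[:, pos].apply(dparser.parse, fuzzy=True)

print(df)
```

Unlike the regex version, this leaves the two columns holding parsed dates rather than strings, which may or may not be what you want downstream.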