Tags: python, pandas, datetime, strptime

datetime.strptime not taking argument passed by custom function


I'm trying to write a reusable function that converts a Julian date in a pandas DataFrame column into a Gregorian date. When I call the function, I get a TypeError: strptime() argument 1 must be str, not Series.

import pandas as pd
import datetime

df.head()

    SDKCOO   SDDOCO       DATE_GL
0   00308   6118002.0   118337.0
1   00308   6118002.0   118337.0
2   00308   6118002.0   118337.0

in:  df['DATE_GL'].dtype
out: dtype('float64')

def my_func(x):
    x = x.astype(str)
    year = x.str[1:3]
    jday = x.str[3:6]
    x = year + jday
    x = x.astype(str)
    x = datetime.datetime.strptime(x,'%y%j') #this line gives me the issue
    return x

df['DATE_GL'] = my_func(df['DATE_GL'])

Then I get this TypeError:


TypeError                                 Traceback (most recent call last)
<ipython-input-4-bc5147e6c807> in <module>
----> 1 df['DATE_GL'] = my_func(df['DATE_GL'])

<ipython-input-3-c25482ba9377> in my_func(x)
      5     x = year + jday
      6     x = x.astype(str)
----> 7     x = datetime.datetime.strptime(x,'%y%j')
      8     return x

TypeError: strptime() argument 1 must be str, not Series

I can get the output I want, but only by modifying the function above and adding an apply call with a lambda function, which is what I'd like to avoid. I want everything to flow through the function so that I can easily call it on other dataframes that have the same date formatting issue.

Desired output:

    SDKCOO  SDDOCO      DATE_GL
0   00308   6118002.0   2018-12-03
1   00308   6118002.0   2018-12-03
2   00308   6118002.0   2018-12-03

Here is the modified function and additional apply code line that helps me achieve the results I want above.

def my_func(x):
    x = x.astype(str)
    year = x.str[1:3]
    jday = x.str[3:6]
    x = year + jday
    x = x.astype(str)
    return x

df['DATE_GL'] = my_func(df['DATE_GL']).apply(lambda x: datetime.datetime.strptime(x,'%y%j'))

Why can't I get the desired result by having everything flow through my def function? What is causing the TypeError issue? I converted "x" to a string.


Solution

  • datetime.datetime.strptime parses one string at a time; it does not accept a pandas Series. x.astype(str) converts each element to a string, but x itself is still a Series, and passing df['DATE_GL'] to your function passes the whole column, so strptime raises the TypeError. To convert a complete dataframe column in one call, replace:
    x = datetime.datetime.strptime(x,'%y%j') with x = pd.to_datetime(x, format = '%y%j')

    Your code should be like this:

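    def my_func(x):
        # Each element becomes a string such as '118337.0'; x is still a Series
        x = x.astype(str)
        year = x.str[1:3]   # two-digit year, e.g. '18'
        jday = x.str[3:6]   # day of year, e.g. '337'
        x = year + jday
        # pd.to_datetime parses the whole Series in one call
        x = pd.to_datetime(x, format = '%y%j')
        return x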
    
    df['DATE_GL'] = my_func(df['DATE_GL'])
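
To see the difference side by side, here is a minimal self-contained sketch. The sample dataframe and the name julian_to_gregorian are made up for illustration, and it assumes every value in the column has the same 6-digit float layout as in the question (e.g. 118337.0). It shows strptime working on a single string, failing on a Series, and pd.to_datetime handling the whole column.

```python
import datetime

import pandas as pd

# Hypothetical sample data mirroring the question's DATE_GL column.
df = pd.DataFrame({"DATE_GL": [118337.0, 118337.0, 118001.0]})


def julian_to_gregorian(col):
    # astype(str) yields a Series of strings such as "118337.0", not one str.
    text = col.astype(str)
    # Characters 1-5 hold the two-digit year followed by the day of year.
    yyddd = text.str[1:6]
    # pd.to_datetime parses every element of the Series in one call.
    return pd.to_datetime(yyddd, format="%y%j")


# strptime handles exactly one string at a time.
single = datetime.datetime.strptime("18337", "%y%j")

# Passing the whole Series reproduces the TypeError from the question.
try:
    datetime.datetime.strptime(df["DATE_GL"].astype(str), "%y%j")
    raised = False
except TypeError:
    raised = True

converted = julian_to_gregorian(df["DATE_GL"])
print(converted)
```

Because the function takes a Series and returns a Series, it can be reused on any dataframe with the same format, e.g. other_df['SOME_DATE'] = julian_to_gregorian(other_df['SOME_DATE']), where other_df and SOME_DATE are placeholder names.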