I'm trying to write a reusable function that converts a Julian date in a pandas DataFrame column into a Gregorian date. When I call the function, I get TypeError: strptime() argument 1 must be str, not Series.
import pandas as pd
import datetime
df.head()
SDKCOO SDDOCO DATE_GL
0 00308 6118002.0 118337.0
1 00308 6118002.0 118337.0
2 00308 6118002.0 118337.0
in: df['DATE_GL'].dtype
out: dtype('float64')
def my_func(x):
    x = x.astype(str)
    year = x.str[1:3]
    jday = x.str[3:6]
    x = year + jday
    x = x.astype(str)
    x = datetime.datetime.strptime(x,'%y%j') #this line gives me the issue
    return x
df['DATE_GL'] = my_func(df['DATE_GL'])
Then I get this TypeError:
TypeError Traceback (most recent call last)
<ipython-input-4-bc5147e6c807> in <module>
----> 1 df['DATE_GL'] = my_func(df['DATE_GL'])
<ipython-input-3-c25482ba9377> in my_func(x)
5 x = year + jday
6 x = x.astype(str)
----> 7 x = datetime.datetime.strptime(x,'%y%j')
8 return x
TypeError: strptime() argument 1 must be str, not Series
I can achieve my desired output as follows, but I have to modify the above function and also use an apply method with a lambda function to achieve it, which is what I don't want. I want everything to flow through the function so that I can easily call it and apply it to other dataframes that have the same date formatting issue.
Desired output:
SDKCOO SDDOCO DATE_GL
0 00308 6118002.0 2018-12-03
1 00308 6118002.0 2018-12-03
2 00308 6118002.0 2018-12-03
Here is the modified function and additional apply code line that helps me achieve the results I want above.
def my_func(x):
    x = x.astype(str)
    year = x.str[1:3]
    jday = x.str[3:6]
    x = year + jday
    x = x.astype(str)
    return x
df['DATE_GL'] = my_func(df['DATE_GL'])  # build the 'YYDDD' strings first
df['DATE_GL'] = df['DATE_GL'].apply(lambda x: datetime.datetime.strptime(x,'%y%j'))
Why can't I get the desired result by having everything flow through my def function? What is causing the TypeError issue? I converted "x" to a string.
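datetime.datetime.strptime only works on a single string, not on a pandas Series. When you pass df['DATE_GL'] to your function, you are passing the whole column, and x.astype(str) converts each element to a string but x itself is still a Series. That is why strptime raises the TypeError even though you "converted x to a string".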
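To see the difference for yourself, here is a small sketch. The names s, as_text and single are just for illustration, and the values are copied from the DATE_GL column in your question.

```python
import datetime
import pandas as pd

# Illustrative Series with the same values as DATE_GL in the question
s = pd.Series([118337.0, 118337.0, 118337.0])

as_text = s.astype(str)          # still a Series; each element is now a str
print(type(as_text).__name__)    # Series

# strptime takes one string at a time, so a single element works
single = as_text.iloc[0][1:6]    # '118337.0' -> '18337'
parsed = datetime.datetime.strptime(single, '%y%j')
print(parsed.date())             # 2018-12-03

# Passing the whole Series reproduces the TypeError from the question
try:
    datetime.datetime.strptime(as_text, '%y%j')
except TypeError as exc:
    print(exc)
```

This is also why your apply version works: apply hands strptime one element at a time.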
To convert a whole column at once, change:
x = datetime.datetime.strptime(x,'%y%j')
to
x = pd.to_datetime(x, format='%y%j')
Your code should then look like this:
def my_func(x):
    x = x.astype(str)
    year = x.str[1:3]
    jday = x.str[3:6]
    x = year + jday
    x = x.astype(str)
    x = pd.to_datetime(x, format='%y%j')
    return x
df['DATE_GL'] = my_func(df['DATE_GL'])
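Since you want to reuse this on other dataframes, here is one way it could be packaged. Treat it as a sketch: julian_to_datetime, orders, invoices and DATE_INV are made-up names, and the optional errors argument is my own addition that is simply passed through to pd.to_datetime so that missing or malformed values can become NaT instead of raising.

```python
import pandas as pd

def julian_to_datetime(col, errors='raise'):
    # Hypothetical helper name; same slicing as my_func above.
    text = col.astype(str)
    yyddd = text.str[1:3] + text.str[3:6]   # e.g. '118337.0' -> '18337'
    return pd.to_datetime(yyddd, format='%y%j', errors=errors)

# Same values as the DATE_GL column in the question
orders = pd.DataFrame({'DATE_GL': [118337.0, 118337.0, 118337.0]})
orders['DATE_GL'] = julian_to_datetime(orders['DATE_GL'])

# A second, made-up frame with a different column name and a missing value
invoices = pd.DataFrame({'DATE_INV': [118001.0, None]})
invoices['DATE_INV'] = julian_to_datetime(invoices['DATE_INV'], errors='coerce')

print(orders)
print(invoices)
```

The function only depends on the Series you pass in, not on the column name, so it can be called on any column that uses the same date layout.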