I'm looking to do a conversion from datetime
/pandas
date's storage system to the one SAS
uses for the purpose of joining with a variable.
Basically would like to go from this:
ID date
101 20-01-01
102 21-01-01
103 22-01-01
To this:
ID DateID date
101 10114995 20-01-01
102 10214996 21-01-01
103 10314997 22-01-01
I've written a function to pass to each row:
def SASdate(datein):
return str(math.trunc(pd.to_timedelta(pd.to_datetime(datein) - pd.to_datetime('1960-1-1'), unit='D') / pd.Timedelta(days=1)))
But I believe that it is FAR from optimized as:
e.g.:
str(df['ID']) + df['date'].apply([SASdate])
Out:
SASdate
0 0 101\n1 102\n2 103\nName: ID, dtype:...
1 0 101\n1 102\n2 103\nName: ID, dtype:...
2 0 101\n1 102\n2 103\nName: ID, dtype:...
I'm sure there's a lambda method or better way to concatenate these that I'm missing, but it's not readily apparent to me.
Any suggestions on how to accomplish this would be appreciated.
Some code below to generate a sample dataframe:
df = pd.DataFrame({'ID': [101,102,103],
'date': ['20-01-01', '21-01-01', '22-01-01']
})
Just use your function outside apply
in a vectorized way:
def SASdate(datein):
origin = pd.to_datetime('1960-1-1')
return pd.to_datetime(datein, format='%d-%m-%y').sub(origin).dt.days
df['DateID'] = df['ID'].astype(str) + SASdate(df['date']).astype(str)
Output:
>>> df
ID date DateID
0 101 20-01-01 10114995
1 102 21-01-01 10214996
2 103 22-01-01 10314997