Search code examples
pythonpandasdatetimesasapply

Joining Function outputs with existing columns as new variable


I'm looking to do a conversion from datetime/pandas date's storage system to the one SAS uses for the purpose of joining with a variable.

Basically would like to go from this:

ID  date
101 20-01-01
102 21-01-01
103 22-01-01

To this:

ID DateID  date
101 10114995 20-01-01
102 10214996 21-01-01
103 10314997 22-01-01

I've written a function to pass to each row:

def SASdate(datein):
        return str(math.trunc(pd.to_timedelta(pd.to_datetime(datein) -  pd.to_datetime('1960-1-1'), unit='D') / pd.Timedelta(days=1)))

But I believe that it is FAR from optimized as:

  1. It takes a long time to run on my actual dataset.
  2. The output doesn't seem to concatenate as desired.

e.g.:

str(df['ID']) + df['date'].apply([SASdate])
Out: 
                                             SASdate
0  0    101\n1    102\n2    103\nName: ID, dtype:...
1  0    101\n1    102\n2    103\nName: ID, dtype:...
2  0    101\n1    102\n2    103\nName: ID, dtype:...

I'm sure there's a lambda method or better way to concatenate these that I'm missing, but it's not readily apparent to me.

Any suggestions on how to accomplish this would be appreciated.

Some code below to generate a sample dataframe:

df = pd.DataFrame({'ID':  [101,102,103],
        'date': ['20-01-01', '21-01-01', '22-01-01']
        })

Solution

  • Just use your function outside apply in a vectorized way:

    def SASdate(datein):
        origin = pd.to_datetime('1960-1-1')
        return pd.to_datetime(datein, format='%d-%m-%y').sub(origin).dt.days
    
    df['DateID'] = df['ID'].astype(str) + SASdate(df['date']).astype(str)
    

    Output:

    >>> df
        ID      date    DateID
    0  101  20-01-01  10114995
    1  102  21-01-01  10214996
    2  103  22-01-01  10314997