Search code examples
pythonpandasdataframemappingpandas-merge

Mapping positions to names based on date in a pandas DataFrame


I have a DataFrame of "positions" and the "date they came to office":

position  cameToOfficeDate
CEO       2020-06-01
CEO       2021-01-01
CEO       2021-02-02
Slave     2020-01-01

I want to map the position to employee's names for different cameToOfficeDate but based on the commencementDate of the positions too. For example, the CEO was Bill before 2021 and was Mark after 2021.

position  commenceDate  Name
CEO       2020-01-01    Bill
CEO       2021-01-01    Mark
Slave     2020-01-01    Elon

So the output should be:

position  cameToOfficeDate  Name
CEO       2020-06-01        Bill
CEO       2021-01-01        Mark
CEO       2021-02-02        Mark
Slave     2020-01-01        Elon

I am wondering how to create the mapping table structure. I tried using a dictionary but it doesn't allow comparing whether the cameToOfficeDate date is smaller than or bigger than the commencementDate of the position. Tried joining the table together on position too and it doesn't work.


Solution

  • You could use merge_asof. Since merge_asof expects the keys to be sorted, we convert the dates to datetime objects and sort; then merge.

    df1['cameToOfficeDate'] = pd.to_datetime(df1['cameToOfficeDate'])
    df2['commenceDate'] = pd.to_datetime(df2['commenceDate'])
    out = (pd.merge_asof(df1.sort_values(by='cameToOfficeDate'), 
                         df2.sort_values(by='commenceDate'), 
                  left_on='cameToOfficeDate', right_on='commenceDate', 
                         by='position')
           .drop(columns='commenceDate')
           .sort_values(by=['position','cameToOfficeDate']))
    

    Output:

      position cameToOfficeDate  Name
    1      CEO       2020-06-01  Bill
    2      CEO       2021-01-01  Mark
    3      CEO       2021-02-02  Mark
    0    Slave       2020-01-01  Elon