python pandas dataframe numpy jupyter-notebook

Joining two separate Excel tables on one condition with Python


I have these two DataFrames:

df1

birthdate ceremony_number
9/30/1895 1st
7/23/1884 1st
3/29/1889 2nd
4/10/1868 3rd
4/8/1892 2nd

df2

index dates
1 1929-05-16
2 1930-04-03
3 1930-11-05

I want to combine both based on the ceremony_number column in df1. That is, if df1['ceremony_number'] matches df2['index'], take df2['dates'] and add it to a new column df1['date_oscar']. The result should look like this:

df1

birthdate date_oscar
1895-09-30 1929-05-16
1884-07-23 1929-05-16
1889-03-29 1930-04-03
1868-04-10 1930-11-05
1892-04-08 1930-04-03

I've been trying the following, but it isn't working:

award_year = []
for index, row in df.iterrows():
    award_year.append(df1[(row['ceremony_number'] == df2['index'])])
df1['date_oscar'] = award_year

And this is the error:

Empty DataFrame Columns: [index, fechas] Index...

Any suggestions? Thanks in advance!


Solution

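  • Extract the leading digits from ceremony_number and map them onto the index of df2 (this assumes the ceremony number is df2's index, as in the minimal working example below):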

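    import pandas as pd

    # Parse both date columns into proper datetimes
    df1['birthdate'] = pd.to_datetime(df1['birthdate'], format='%m/%d/%Y')
    df2['dates'] = pd.to_datetime(df2['dates'], format='%Y-%m-%d')
    
    # '1st' -> 1, '2nd' -> 2, ...; the raw string avoids an invalid-escape warning
    num = df1['ceremony_number'].str.extract(r'^(\d+)', expand=False).astype(int)
    # Look each number up in df2['dates'], which is keyed by df2's index
    df1['date_oscar'] = num.map(df2['dates'])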
    

    Output:

    >>> df1
       birthdate ceremony_number date_oscar
    0 1895-09-30             1st 1929-05-16
    1 1884-07-23             1st 1929-05-16
    2 1889-03-29             2nd 1930-04-03
    3 1868-04-10             3rd 1930-11-05
    4 1892-04-08             2nd 1930-04-03
    

    Minimal Working Example

    data1 = {'birthdate': {0: '9/30/1895', 1: '7/23/1884', 2: '3/29/1889',
                           3: '4/10/1868', 4: '4/8/1892'},
             'ceremony_number': {0: '1st', 1: '1st', 2: '2nd', 3: '3rd', 4: '2nd'}}
    df1 = pd.DataFrame(data1)
    
    data2 = {'dates': {1: '1929-05-16', 2: '1930-04-03', 3: '1930-11-05'}}
    df2 = pd.DataFrame(data2)
    
    # df1
       birthdate ceremony_number
    0  9/30/1895             1st
    1  7/23/1884             1st
    2  3/29/1889             2nd
    3  4/10/1868             3rd
    4   4/8/1892             2nd
    
    # df2
            dates
    1  1929-05-16
    2  1930-04-03
    3  1930-11-05
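
The answer above relies on the ceremony number being the index of df2. In the question, `index` looks like it may be an ordinary column instead. A minimal sketch for that case, assuming df2 has the columns `index` and `dates` exactly as shown in the question: build a lookup Series with `set_index`, or do an equivalent left merge. The names `lookup`, `merged`, and `ceremony_int` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "birthdate": ["9/30/1895", "7/23/1884", "3/29/1889", "4/10/1868", "4/8/1892"],
    "ceremony_number": ["1st", "1st", "2nd", "3rd", "2nd"],
})
# Assumption: `index` is a regular column here, not the DataFrame index.
df2 = pd.DataFrame({
    "index": [1, 2, 3],
    "dates": ["1929-05-16", "1930-04-03", "1930-11-05"],
})

df1["birthdate"] = pd.to_datetime(df1["birthdate"], format="%m/%d/%Y")
df2["dates"] = pd.to_datetime(df2["dates"], format="%Y-%m-%d")

# '1st' -> 1, '2nd' -> 2, ...
num = df1["ceremony_number"].str.extract(r"^(\d+)", expand=False).astype(int)

# Option 1: turn df2 into a Series keyed by ceremony number, then map.
lookup = df2.set_index("index")["dates"]
df1["date_oscar"] = num.map(lookup)

# Option 2: the same result via a left merge, which keeps df1's row order.
merged = (
    df1.drop(columns="date_oscar")
       .assign(ceremony_int=num)
       .merge(df2.rename(columns={"dates": "date_oscar"}),
              left_on="ceremony_int", right_on="index", how="left")
       .drop(columns=["ceremony_int", "index"])
)
```

With either option, a ceremony number that has no match in df2 ends up as NaT in date_oscar rather than raising an error.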