Search code examples
pythonpandasjoinmergeconcatenation

Pandas/Python - merging 2 dataframes, using columns as data and rows as extra rows in original dataframes


I have 2 dataframes, as follows

df (contains the index, the title of the bank, the last changed date of the bank savings APR and the uid, which is the same as the index, I was experimenting with matching on that column)

index title last_changed_friendly uid
55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e Republic Bank Digital Money Market 2023-02-28 22:59:49 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e
af1d2fcd-5f09-4611-80c9-564fb346c875 Bank Purely Money Market 1970-01-01 00:00:00 af1d2fcd-5f09-4611-80c9-564fb346c875

tempdf (contains the index, all records in tempdf match the uid/index of the uid/index in df, the date which the APR was retrieved and finally a reference to the text file that contains the APR itself)

uid time_date_friendly Reference
55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-18 19:00:01 b866e03c-928b-4ea0-a293-5ae69fe88cfc.txt
55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-18 19:01:29 86c5ef05-fac0-4177-a347-8585299ecb88.txt
55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-28 22:59:49 99c911bc-86be-4bd4-afdd-89acd279aaa1.txt

I am trying to create the following dataframe from the above

index title last_changed_friendly uid time_date_friendly Reference
55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e Republic Bank Digital Money Market 2023-02-28 22:59:49 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-18 19:00:01 b866e03c-928b-4ea0-a293-5ae69fe88cfc.txt
55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e Republic Bank Digital Money Market 2023-02-28 22:59:49 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-18 19:01:29 86c5ef05-fac0-4177-a347-8585299ecb88.txt
55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e Republic Bank Digital Money Market 2023-02-28 22:59:49 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-28 22:59:49 99c911bc-86be-4bd4-afdd-89acd279aaa1.txt

I have tried various iterations of join, concat and merge and I get the closest with the following:

df = tempdf.merge(df,how='left', left_on='uid', right_on='uid')

But that creates 9 rows, as opposed to the 3 that I am trying to get to, as shown above.


Solution

  • df = pd.DataFrame([
        ["55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e", "Republic Bank Digital Money Market", "2023-02-28 22:59:49", "55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e"],
        ["af1d2fcd-5f09-4611-80c9-564fb346c875", "Bank Purely Money Market", "1970-01-01 00:00:00", "af1d2fcd-5f09-4611-80c9-564fb346c875"],
    ], columns = ["index", "title", "last_changed_friendly", "uid"])
    
    index title last_changed_friendly uid
    0 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e Republic Bank Digital Money Market 2023-02-28 22:59:49 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e
    1 af1d2fcd-5f09-4611-80c9-564fb346c875 Bank Purely Money Market 1970-01-01 00:00:00 af1d2fcd-5f09-4611-80c9-564fb346c875
    tempdf = pd.DataFrame([
        ["55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e", "2023-02-18 19:00:01", "b866e03c-928b-4ea0-a293-5ae69fe88cfc.txt"],
        ["55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e", "2023-02-18 19:01:29", "86c5ef05-fac0-4177-a347-8585299ecb88.txt"],
        ["55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e", "2023-02-28 22:59:49", "99c911bc-86be-4bd4-afdd-89acd279aaa1.txt"],
    ], columns=["uid", "time_date_friendly", "Reference"])
    
    uid time_date_friendly Reference
    0 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-18 19:00:01 b866e03c-928b-4ea0-a293-5ae69fe88cfc.txt
    1 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-18 19:01:29 86c5ef05-fac0-4177-a347-8585299ecb88.txt
    2 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-28 22:59:49 99c911bc-86be-4bd4-afdd-89acd279aaa1.txt
    df.merge(tempdf, on='uid')
    
    index title last_changed_friendly uid time_date_friendly Reference
    0 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e Republic Bank Digital Money Market 2023-02-28 22:59:49 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-18 19:00:01 b866e03c-928b-4ea0-a293-5ae69fe88cfc.txt
    1 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e Republic Bank Digital Money Market 2023-02-28 22:59:49 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-18 19:01:29 86c5ef05-fac0-4177-a347-8585299ecb88.txt
    2 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e Republic Bank Digital Money Market 2023-02-28 22:59:49 55e0f3b0-aacb-48a6-b71c-2fe282eb6e2e 2023-02-28 22:59:49 99c911bc-86be-4bd4-afdd-89acd279aaa1.txt