Search code examples
pythonpandasjoingroup-bycategorical-data

Joining Three Data Frames for all Iterations of Car Color, Pets, and Items Purchased in Pandas- groupby?


I currently have three data frames: purchases, car colors, and pets. An example of that data can be formed with the following code:

import pandas as pd

cols = ['MEMBER', 'CAR_COLOR', 'PET', 'PURCHASE_TYPE', 'ITEM_1', 'ITEM_2', 'ITEM_3', 'ITEM_4']

data1 = [
     [1, "", "", "CC", "BALL", "SHIRT", "VIOLIN", "SWEATER"],
     [1, "", "", "CC", "CANVAS", "COFFEE", "", ""],
     [1, "", "", "CSH", "TOY", "VIDEO_GAME", "GUITAR", ""],
     [2, "", "", "CC", "VEST", "BOOK", "EGGS", "BREAD"],
     [2, "", "", "CHK", "APPLES", "TOOLS", "", ""]
]

colors = [
    [1,"BLUE"],
    [1, "RED"],
    [2, "BLUE"],
    [2, "GREEN"],
    [2, "WHITE"]
]

pets = [
    [1, 'FISH'],
    [2, 'DOG'],
    [2, 'CAT'],
    [2, 'FISH']
]

df_data = pd.DataFrame(data1, columns=cols)
df_colors = pd.DataFrame(colors, columns=['MEMBER', 'CAR_COLOR'])
df_pets = pd.DataFrame(pets, columns = ['MEMBER', 'PET'])

Essentially, I want to join these dataframes such that each unique iteration of
(MEMBER, PURCHASE_TYPE, ITEMS1-4)/CAR_COLOR/PET is listed as it's own row. Initially, the purchase data frame (data1) had unique pairwise values of (MEMBER, PURCHASE_TYPE) but these were trimmed down such that if a pair had more than four items, the items past item 4 become their own new row, as seen with rows 1 and 2 of data1. The final dataframe should look as it does in this image.

In my actual situation, each of the three initial Dataframes has thousands of rows, so ideally a solution would be robust / easy to streamline into larger sets of data. Please let me know if this isn't enough information or if you have any pertinent questions. I imagine the solution involves some sort of grouping and merging but for some reason my brain has hit a wall with this problem. Any help is beyond appreciated!


Solution

  • You can do it like this:

    cols = ['MEMBER', 'PURCHASE_TYPE', 'ITEM_1', 'ITEM_2', 'ITEM_3', 'ITEM_4']
    df_out  = (df_data[cols].set_index('MEMBER').join([di.set_index('MEMBER') for di in [df_colors, df_pets]])
                 .sort_values(['MEMBER', 'PET']).reset_index())
    

    Output:

        MEMBER PURCHASE_TYPE  ITEM_1      ITEM_2  ITEM_3   ITEM_4 CAR_COLOR   PET
    0        1            CC    BALL       SHIRT  VIOLIN  SWEATER      BLUE  FISH
    1        1            CC    BALL       SHIRT  VIOLIN  SWEATER       RED  FISH
    2        1            CC  CANVAS      COFFEE                       BLUE  FISH
    3        1            CC  CANVAS      COFFEE                        RED  FISH
    4        1           CSH     TOY  VIDEO_GAME  GUITAR               BLUE  FISH
    5        1           CSH     TOY  VIDEO_GAME  GUITAR                RED  FISH
    6        2            CC    VEST        BOOK    EGGS    BREAD      BLUE   CAT
    7        2            CC    VEST        BOOK    EGGS    BREAD     GREEN   CAT
    8        2            CC    VEST        BOOK    EGGS    BREAD     WHITE   CAT
    9        2           CHK  APPLES       TOOLS                       BLUE   CAT
    10       2           CHK  APPLES       TOOLS                      GREEN   CAT
    11       2           CHK  APPLES       TOOLS                      WHITE   CAT
    12       2            CC    VEST        BOOK    EGGS    BREAD      BLUE   DOG
    13       2            CC    VEST        BOOK    EGGS    BREAD     GREEN   DOG
    14       2            CC    VEST        BOOK    EGGS    BREAD     WHITE   DOG
    15       2           CHK  APPLES       TOOLS                       BLUE   DOG
    16       2           CHK  APPLES       TOOLS                      GREEN   DOG
    17       2           CHK  APPLES       TOOLS                      WHITE   DOG
    18       2            CC    VEST        BOOK    EGGS    BREAD      BLUE  FISH
    19       2            CC    VEST        BOOK    EGGS    BREAD     GREEN  FISH
    20       2            CC    VEST        BOOK    EGGS    BREAD     WHITE  FISH
    21       2           CHK  APPLES       TOOLS                       BLUE  FISH
    22       2           CHK  APPLES       TOOLS                      GREEN  FISH
    23       2           CHK  APPLES       TOOLS                      WHITE  FISH