Search code examples
pythonpython-3.xinner-join

JOINning two List-of-Lists to one


The issue before me is to JOIN, SQL-Like, two arrays, with a "key" formed from two columns, YEAR and MONTH. The two arrays represent incomes (for each Year and month) and likewise expenses. I want to JOIN them, using the key, producing another array with four columns: YEAR, MONTH, INCOME, EXPENSE.

The two arrays I have are:

income = [["2019","Jan.", 2000],
          ["2019","Feb.", 1500],
          [ ---- , ---  , --- ],
          ["2019","Dec.", 1200],
          ["2020","Jan.", 1400],
          [ ---- , ---  , --- ],
          ["2020","Dec.", 1300]]

Expenses = [["2019","Jan.", 1800],
            ["2019","Feb.", 1400],
            [ ---- , ---  , --- ],
            ["2019","Dec.", 1100],
            ["2020","Jan.", 1300],
            [ ---- , ---  , --- ],
            ["2020","Dec.", 1200]]

And the desired result is:

Joined =   [["2019","Jan.", 2000, 1800],
            ["2019","Feb.", 1500, 1400],
            [ ---- , ---  , ---   ----],
            ["2019","Dec.", 1200, 1100],
            ["2020","Jan.", 1400, 1300],
            [ ---- , ---  , ---   ----],
            ["2020","Dec.", 1300, 1200]]

What do I do? List comprehension? for loop? What would be the pythonic way?


Solution

  • Just use Pandas to convert your lists (income and Expenses) into Dataframes, merge them (in this case it's basically an inner join on Year and Month) and then convert the Dataframe you get into a list of lists.

    df1 = pd.DataFrame(income, columns=["Year", "Month", "X"])
    df2 = pd.DataFrame(Expenses, columns=["Year", "Month", "Y"])
    joined = df1.merge(df2, on=["Year", "Month"]).values.tolist()
    

    Output:

    [['2019', 'Jan.', 2000, 1800], ['2019', 'Feb.', 1500, 1400], ['2019', 'Dec.', 1200, 1100], ['2020', 'Jan.', 1400, 1300], ['2020', 'Dec.', 1300, 1200]]
    

    PS: I removed all the [ ---- , --- , --- ] from the two lists if you're wondering why they are not in the output.