python numpy machine-learning keras autoencoder

How to merge 2 arrays in Python (similar to a SQL join)


I have two large arrays, one of shape (500, 1, 23000) and another of shape (700, 1, 25000), and I need to merge them. The two arrays have different shapes.

A simple example would be this:

a = np.array([['a', 3, 5, 6, 9], ['b', 14, 15, 56]], dtype=object)
b = np.array([['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]], dtype=object)

Desired result:

c = [['a', 3, 5, 6, 9], ['b', 4, 76, 44, 91, 100],['c', 14, 15],['d', 2, 6, 7]]

This is part of data-preprocessing for machine learning.


Solution

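  • This could probably be made faster (it iterates over both lists twice), but it should give you what you want. It behaves like a full outer join on the first element of each row: when a key appears in both arrays, the values from both rows are concatenated.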

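    import numpy as np
    from collections import defaultdict
    
    # The rows have different lengths, so dtype=object is required;
    # recent NumPy versions reject ragged input without it.
    a = np.array([['a', 3, 5, 6, 9], ['b', 14, 15, 56]], dtype=object)
    b = np.array([['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]], dtype=object)
    
    def dictify(arr):
        # Map each row's key (its first element) to the remaining values;
        # keys that are missing default to an empty list.
        return defaultdict(list, {x[0]: list(x[1:]) for x in arr})
    
    d1 = dictify(a)
    d2 = dictify(b)
    
    # Union of the keys from both arrays (a set, so the order is arbitrary)
    new_keys = set(d1) | set(d2)
    
    # One row per key: the key, then its values from a, then its values from b
    ans = [[k] + d1[k] + d2[k] for k in new_keys]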
    

    The value of ans is (the row order may vary between runs, because sets are unordered):

    [['d', 2, 6, 7], ['c', 14, 15], ['a', 3, 5, 6, 9], ['b', 14, 15, 56, 4, 76, 44, 91, 100]]
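
The result above concatenates the values for a key found in both arrays, whereas the desired result in the question keeps only the row from b for the shared key 'b'. Below is a minimal sketch of that variant, assuming that rows from the second input should replace rows from the first whenever their keys collide. The helper name merge_rows is made up for illustration, and the sample data is written as plain lists to keep the sketch dependency-free.

```python
def merge_rows(first, second):
    """Merge rows keyed by their first element; rows in `second` win on collisions."""
    merged = {}
    for rows in (first, second):
        for row in rows:
            # A later row overwrites an earlier one that has the same key.
            merged[row[0]] = list(row[1:])
    # Sort by key so that the output order is deterministic.
    return [[key] + merged[key] for key in sorted(merged)]

a = [['a', 3, 5, 6, 9], ['b', 14, 15, 56]]
b = [['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]]

c = merge_rows(a, b)
print(c)
# [['a', 3, 5, 6, 9], ['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]]
```

Because the function only iterates over rows, it should also accept NumPy arrays of rows (created with dtype=object when the rows are ragged). It makes a single pass over each input, at the cost of silently discarding the first input's row on a key collision.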