python numpy machine-learning keras autoencoder

How to merge 2 arrays in Python (similar to a SQL join)

I have two large arrays, one (500, 1, 23000) and the other (700, 1, 25000), and I need to merge them. They have different sizes.

Easy example would be this:

a = np.array([['a', 3, 5, 6, 9], ['b', 14, 15, 56]], dtype=object)
b = np.array([['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]], dtype=object)

Desired result:

c = [['a', 3, 5, 6, 9], ['b', 4, 76, 44, 91, 100],['c', 14, 15],['d', 2, 6, 7]]

This is part of data-preprocessing for machine learning.

Solution

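This could probably be made faster (it iterates over both lists twice), but it should give you what you want. Note that when a key appears in both arrays (here 'b'), the values from both rows are concatenated rather than the row from b replacing the row from a.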

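import numpy as np
from collections import defaultdict

# dtype=object is needed for ragged rows on recent NumPy versions
a = np.array([['a', 3, 5, 6, 9], ['b', 14, 15, 56]], dtype=object)
b = np.array([['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]], dtype=object)

def dictify(arr):
    # Map each row's first element (the key) to the rest of the row;
    # keys that are missing default to an empty list
    return defaultdict(list, {x[0]: x[1:] for x in arr})

d1 = dictify(a)
d2 = dictify(b)

# All keys that appear in either array
new_keys = set(d1) | set(d2)

# For each key, concatenate the values from a, then those from b
ans = [[k] + d1[k] + d2[k] for k in new_keys]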

The value of ans is (the row order may vary, since sets are unordered):

[['d', 2, 6, 7], ['c', 14, 15], ['a', 3, 5, 6, 9], ['b', 14, 15, 56, 4, 76, 44, 91, 100]]
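
If you want exactly the desired result from the question, where the 'b' row from the second array replaces the one from the first instead of being appended to it, a plain dict update may be enough. This is only a sketch: the helper name merge_rows is made up for illustration, it assumes the first element of each row is a unique, sortable key, and it works on plain lists of rows (an object array of lists should iterate the same way).

```python
def merge_rows(left, right):
    """Merge two sequences of rows keyed by their first element.

    Rows from `right` replace rows from `left` that share a key.
    """
    merged = {row[0]: list(row) for row in left}
    # Later rows win: a key present in both keeps the row from `right`
    merged.update({row[0]: list(row) for row in right})
    # Sort by key so the output order is deterministic
    return [merged[key] for key in sorted(merged)]


a = [['a', 3, 5, 6, 9], ['b', 14, 15, 56]]
b = [['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]]

c = merge_rows(a, b)
print(c)
```

This makes one pass over each input and, unlike the set-based version, returns the rows in sorted key order.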