I have two large arrays, one of shape (500,1,23000) and another of shape (700,1,25000), and I need to merge them. They have different sizes.
A simple example would be this:
a = np.array([['a', 3, 5, 6, 9], ['b', 14, 15, 56]], dtype=object)
b = np.array([['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]], dtype=object)
Desired result:
c = [['a', 3, 5, 6, 9], ['b', 4, 76, 44, 91, 100],['c', 14, 15],['d', 2, 6, 7]]
This is part of data preprocessing for machine learning.
This could probably be made faster (it iterates over both lists twice), but it gives one merged row per key: the values from a followed by the values from b.
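import numpy as np
from collections import defaultdict
# dtype=object is required because the rows have different lengths (NumPy 1.24+ raises ValueError without it)
a = np.array([['a', 3, 5, 6, 9], ['b', 14, 15, 56]], dtype=object)
b = np.array([['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]], dtype=object)
def dictify(arr):
    # Map each row's first element to the rest of the row; missing keys default to []
    return defaultdict(list, {x[0]: x[1:] for x in arr})
d1 = dictify(a)
d2 = dictify(b)
# Every key that appears in either array
new_keys = set(d1) | set(d2)
# One row per key: the values from a, then the values from b
ans = [[k] + d1[k] + d2[k] for k in new_keys]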
The value of ans is (the row order comes from iterating over a set, so it can differ between runs):
[['d', 2, 6, 7], ['c', 14, 15], ['a', 3, 5, 6, 9], ['b', 14, 15, 56, 4, 76, 44, 91, 100]]
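Two things in ans differ from the desired result in the question: the 'b' row concatenates the values from a and b, whereas the desired result keeps only b's row, and the rows are not sorted by key. Below is a minimal sketch covering both readings, assuming rows are matched on their first element and that a later array takes priority. The function names merge_replace and merge_concat are hypothetical; merge_concat builds the same rows as ans but in a single pass over each array.

```python
from collections import defaultdict

import numpy as np

# Example rows from the question; dtype=object is needed because the rows
# have different lengths (NumPy 1.24+ raises ValueError without it).
a = np.array([['a', 3, 5, 6, 9], ['b', 14, 15, 56]], dtype=object)
b = np.array([['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]], dtype=object)


def merge_replace(*arrays):
    """Keep one row per key; a row from a later array replaces an earlier one."""
    merged = {}
    for arr in arrays:
        for row in arr:
            merged[row[0]] = list(row[1:])
    # Sort by key so the output order is deterministic.
    return [[k] + merged[k] for k in sorted(merged)]


def merge_concat(*arrays):
    """Single pass: append every row's values under its key (same rows as ans)."""
    merged = defaultdict(list)
    for arr in arrays:
        for row in arr:
            merged[row[0]].extend(row[1:])
    return [[k] + merged[k] for k in sorted(merged)]


c = merge_replace(a, b)
print(c)
# [['a', 3, 5, 6, 9], ['b', 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]]
print(merge_concat(a, b))
# [['a', 3, 5, 6, 9], ['b', 14, 15, 56, 4, 76, 44, 91, 100], ['c', 14, 15], ['d', 2, 6, 7]]
```

Both functions only iterate over rows and slice them, so they also accept plain Python lists of rows instead of object arrays.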