Is there a simple yet efficient way to concatenate two unique numpy arrays which have counters?
values1 = np.array(['host1', 'host2', 'host3', 'host6'])
counts1 = np.array([2,5,2,4])
values2 = np.array(['host3', 'host1', 'host4'])
counts2 = np.array([5,7,1])
I'd like to have a result like:
values_res = np.array(['host1', 'host2', 'host3', 'host6', 'host4'])
counts_res = np.array([9,5,7,4,1])
They do not need to be ordered, but values_res does need to be unique.
I could iterate over the elements in the array, but that would not be efficient. I'd like to use vectorization somehow.
This is probably faster (especially for larger arrays), and the result is sorted:
values_res, idx = np.unique(np.hstack((values1, values2)), return_inverse=True)
counts_res = np.bincount(idx, np.hstack((counts1, counts2)))
output:
['host1' 'host2' 'host3' 'host4' 'host6']
[9. 5. 7. 1. 4.]
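The counts above print as floats because np.bincount converts its weights to floating point. A minimal sketch of the same idea wrapped in a function that casts the sums back to the input dtype; the name merge_counts is hypothetical, and the cast assumes the sums stay small enough to be represented exactly as floats:

```python
import numpy as np

def merge_counts(values1, counts1, values2, counts2):
    """Merge two (values, counts) pairs, summing counts of equal values."""
    values = np.concatenate((values1, values2))
    counts = np.concatenate((counts1, counts2))
    # inverse[i] is the position of values[i] in the sorted unique array
    uniq, inverse = np.unique(values, return_inverse=True)
    # bincount adds up the weights that share the same position
    summed = np.bincount(inverse, weights=counts)
    # bincount returns floats when weights are given; cast back (assumption:
    # the sums are exactly representable as floats)
    return uniq, summed.astype(counts.dtype)

values1 = np.array(['host1', 'host2', 'host3', 'host6'])
counts1 = np.array([2, 5, 2, 4])
values2 = np.array(['host3', 'host1', 'host4'])
counts2 = np.array([5, 7, 1])

values_res, counts_res = merge_counts(values1, counts1, values2, counts2)
print(values_res)  # ['host1' 'host2' 'host3' 'host4' 'host6']
print(counts_res)  # [9 5 7 1 4]
```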
Comparison using benchit:
#@Ehsan's solution
def m1(values1, values2, counts1, counts2):
    values_res, idx = np.unique(np.hstack((values1, values2)), return_inverse=True)
    counts_res = np.bincount(idx, np.hstack((counts1, counts2)))
    return values_res, counts_res
from collections import Counter

#@Chris's solution
def m2(values1, values2, counts1, counts2):
    values_res, counts_res = zip(*dict(Counter(dict(zip(values1,counts1))) + Counter(dict(zip(values2,counts2)))).items())
    return values_res, counts_res
in_ = {n:[np.random.choice(values_res, n), np.random.choice(values_res, n), np.random.randint(1,100,n), np.random.randint(1,100,n)] for n in [10,100,1000,10000]}
output:
m1 is faster in this setting
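To reproduce the comparison without benchit, here is a rough sketch that uses the standard-library timeit instead (a swapped-in tool, not the one used above). The host pool, the seed, and the size n are assumptions made for illustration. Inputs are drawn without replacement so that each array is unique, as in the question, and the sketch first checks that both functions agree before timing them. No timings are claimed here; they depend on the machine and on n.

```python
import timeit
from collections import Counter

import numpy as np

def m1(values1, values2, counts1, counts2):
    values_res, idx = np.unique(np.hstack((values1, values2)), return_inverse=True)
    counts_res = np.bincount(idx, np.hstack((counts1, counts2)))
    return values_res, counts_res

def m2(values1, values2, counts1, counts2):
    merged = Counter(dict(zip(values1, counts1))) + Counter(dict(zip(values2, counts2)))
    values_res, counts_res = zip(*merged.items())
    return values_res, counts_res

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 1000                        # illustrative size
pool = np.array([f"host{i}" for i in range(2 * n)])  # hypothetical host names

# replace=False keeps each input array unique, matching the question
values1 = rng.choice(pool, n, replace=False)
values2 = rng.choice(pool, n, replace=False)
counts1 = rng.integers(1, 100, n)
counts2 = rng.integers(1, 100, n)
args = (values1, values2, counts1, counts2)

# Sanity check: both approaches should produce the same value -> count mapping
v1, c1 = m1(*args)
v2, c2 = m2(*args)
same = dict(zip(v1.tolist(), c1.astype(int).tolist())) == {str(k): int(v) for k, v in zip(v2, c2)}
print("results agree:", same)

t1 = timeit.timeit(lambda: m1(*args), number=20)
t2 = timeit.timeit(lambda: m2(*args), number=20)
print(f"m1: {t1:.4f}s  m2: {t2:.4f}s (20 runs each)")
```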