python performance numpy vectorization unique

Concatenate unique numpy arrays with counters


Is there a simple yet efficient way to concatenate two numpy arrays of unique values, each with its own array of counters, so that the counters of values present in both are summed?

Example:

values1 = np.array(['host1', 'host2', 'host3', 'host6'])
counts1 = np.array([2,5,2,4])

values2 = np.array(['host3', 'host1', 'host4'])
counts2 = np.array([5,7,1])

I'd like to have a result like:

values_res = np.array(['host1', 'host2', 'host3', 'host6', 'host4'])
counts_res = np.array([9,5,7,4,1])

They do not need to be ordered, but values_res does need to be unique.

I could iterate over the elements in the array, but that would not be efficient. I'd like to use vectorization somehow.


Solution

  • This is probably faster (especially for larger arrays), and the result is sorted:

    values_res, idx = np.unique(np.hstack((values1, values2)), return_inverse=True)
    counts_res = np.bincount(idx, np.hstack((counts1, counts2)))
    

    output:

    ['host1' 'host2' 'host3' 'host4' 'host6']
    [9. 5. 7. 1. 4.]
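
To make the two steps above explicit, here is a minimal sketch on the question's data. `np.unique(..., return_inverse=True)` returns the sorted unique labels plus, for every input element, its index into that sorted array; `np.bincount` with weights then sums the counters per index. Passing weights makes `np.bincount` return floats, so the sketch casts back to the input dtype. The last part is an addition, not part of the answer above: it uses `return_index=True` to restore first-appearance order, which matches the layout shown in the question. The names `all_values`, `all_counts`, `first_idx` and `order` are illustrative only.

```python
import numpy as np

values1 = np.array(['host1', 'host2', 'host3', 'host6'])
counts1 = np.array([2, 5, 2, 4])
values2 = np.array(['host3', 'host1', 'host4'])
counts2 = np.array([5, 7, 1])

all_values = np.concatenate((values1, values2))
all_counts = np.concatenate((counts1, counts2))

# uniq is sorted; first_idx[k] is where uniq[k] first occurs in all_values;
# inverse[i] is the position of all_values[i] inside uniq.
uniq, first_idx, inverse = np.unique(
    all_values, return_index=True, return_inverse=True
)

# Sum the counters that share the same position in uniq.
# The weighted bincount yields floats, so cast back to the integer dtype.
counts = np.bincount(inverse, weights=all_counts).astype(counts1.dtype)

# Optional (assumption: first-appearance order is wanted instead of sorted order).
order = np.argsort(first_idx)
values_res = uniq[order]
counts_res = counts[order]

print(uniq, counts)
print(values_res, counts_res)
```

If sorted output is fine, `uniq` and `counts` are already the result and the reordering step can be dropped.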
    

    Comparison using benchit:

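    from collections import Counter

    import benchit
    import numpy as np

    # @Ehsan's solution: unique labels + weighted bincount
    def m1(values1, values2, counts1, counts2):
      values_res, idx = np.unique(np.hstack((values1, values2)), return_inverse=True)
      counts_res = np.bincount(idx, np.hstack((counts1, counts2)))
      return values_res, counts_res

    # @Chris's solution: add two Counters built from the (value, count) pairs
    def m2(values1, values2, counts1, counts2):
      values_res, counts_res = zip(*dict(Counter(dict(zip(values1,counts1))) + Counter(dict(zip(values2,counts2)))).items())
      return values_res, counts_res

    # Random inputs of increasing size n; values_res is the merged array from the snippet above
    in_ = {n:[np.random.choice(values_res, n), np.random.choice(values_res, n), np.random.randint(1,100,n), np.random.randint(1,100,n)] for n in [10,100,1000,10000]}

    # Assumed benchit invocation (the call itself is not shown in the post)
    t = benchit.timings([m1, m2], in_, multivar=True, input_name='Length')
    t.plot(logx=True)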
    

    output:

    m1 is faster in this setting
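
For anyone who wants to check this without installing benchit, here is a rough sketch using only `timeit` from the standard library. It makes one assumption that differs from the benchmark inputs above: each input array holds unique labels, as in the question. With `np.random.choice` and its default `replace=True`, labels can repeat, and `dict(zip(values, counts))` then keeps only the last counter for a repeated label, so the two approaches would no longer compute the same thing. The names `merge_numpy`, `merge_counter`, `pool` and the size `n = 1000` are illustrative choices, and timings will vary by machine.

```python
import timeit
from collections import Counter

import numpy as np


def merge_numpy(values1, values2, counts1, counts2):
    # Same approach as m1: unique labels + weighted bincount.
    values_res, idx = np.unique(np.hstack((values1, values2)), return_inverse=True)
    counts_res = np.bincount(idx, np.hstack((counts1, counts2)))
    return values_res, counts_res


def merge_counter(values1, values2, counts1, counts2):
    # Same idea as m2: add two Counters built from (value, count) pairs.
    return Counter(dict(zip(values1, counts1))) + Counter(dict(zip(values2, counts2)))


rng = np.random.default_rng(0)
n = 1000
pool = np.array([f"host{i}" for i in range(2 * n)])

# replace=False keeps each input array free of duplicate labels.
values1 = rng.choice(pool, n, replace=False)
values2 = rng.choice(pool, n, replace=False)
counts1 = rng.integers(1, 100, n)
counts2 = rng.integers(1, 100, n)
args = (values1, values2, counts1, counts2)

# Check that both approaches agree before timing them.
np_values, np_counts = merge_numpy(*args)
numpy_result = dict(zip(np_values.tolist(), np_counts.astype(int).tolist()))
counter_result = {str(k): int(v) for k, v in merge_counter(*args).items()}
same = numpy_result == counter_result

t_numpy = timeit.timeit(lambda: merge_numpy(*args), number=20)
t_counter = timeit.timeit(lambda: merge_counter(*args), number=20)
print(f"same result: {same}")
print(f"numpy:   {t_numpy:.4f} s for 20 runs")
print(f"counter: {t_counter:.4f} s for 20 runs")
```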
