
Numpy: frequency array to distribution


In Python with NumPy, what is the fastest way to turn an array like

array([0,2,3,1,0,0,1])

into another array

array([1,1,2,2,2,3,6])

where the first array gives the frequency of each index (i.e. index 0 has a frequency of 0, index 1 has a frequency of 2, index 2 has a frequency of 3, and so on) and the second repeats each index as many times as specified in the first array.
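To pin down the intended semantics, here is a slow but explicit loop version. It is only a reference sketch for checking faster solutions against; the name `expand_counts_loop` is hypothetical.

```python
import numpy as np

def expand_counts_loop(freq):
    """Reference (hypothetical helper): repeat each index i exactly freq[i] times."""
    out = []
    for index, count in enumerate(freq):
        # An index with count 0 contributes nothing to the output.
        out.extend([index] * int(count))
    return np.array(out, dtype=int)

print(expand_counts_loop(np.array([0, 2, 3, 1, 0, 0, 1])))  # [1 1 2 2 2 3 6]
```

The output length always equals `freq.sum()`, which makes a quick sanity check for any vectorized replacement.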

Background: I use this to 'inflate' (I can't find a better word for it) a k-by-k matrix M (sparse or not), given a length-k frequency vector f:

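import numpy as np

M  = np.arange(49).reshape(7, 7)  # example k-by-k matrix, k = 7
f  = np.array([0,2,3,1,0,0,1])    # frequency of each index
f_ = np.array([1,1,2,2,2,3,6])    # each index i repeated f[i] times
M_ = M[f_[:,None],f_]             # M_[i, j] = M[f_[i], f_[j]]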
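A small worked example of what `M[f_[:,None], f_]` produces. The 3-by-3 matrix and the frequency vector `[0, 2, 1]` are made-up illustration data. Broadcasting the column `f_[:, None]` against the row `f_` selects `M[f_[i], f_[j]]` for every pair `(i, j)`, so rows and columns with frequency 0 disappear and the others are duplicated. `np.ix_` builds the same open mesh of indices.

```python
import numpy as np

M = np.arange(9).reshape(3, 3)   # illustration matrix [[0,1,2],[3,4,5],[6,7,8]]
f_ = np.array([1, 1, 2])         # frequency vector [0, 2, 1] expanded by hand

# Column of row indices broadcast against a row of column indices.
M_ = M[f_[:, None], f_]

# np.ix_ gives the same selection.
assert np.array_equal(M_, M[np.ix_(f_, f_)])
print(M_)
# [[4 4 5]
#  [4 4 5]
#  [7 7 8]]
```

Row and column 0 (frequency 0) are dropped, and row and column 1 (frequency 2) each appear twice.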

Solution

  • Use np.repeat on a range array spanning the length of the input array, passing the input array itself as the repeat counts -

    np.repeat(np.arange(len(a)), a)
    

    Sample run -

    In [109]: a
    Out[109]: array([0, 2, 3, 1, 0, 0, 1])
    
    In [110]: np.repeat(np.arange(len(a)), a)
    Out[110]: array([1, 1, 2, 2, 2, 3, 6])
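The one-liner wrapped as a reusable function, together with a cross-check against a different vectorized approach. The cumulative-sum plus `np.searchsorted` variant is not part of the answer; it is a swapped-in alternative shown only to confirm the result, and no claim is made here about which is faster. Both function names are hypothetical. Note that `np.repeat` raises `ValueError` on negative counts, so the frequencies must be non-negative integers.

```python
import numpy as np

def expand_counts(freq):
    """The np.repeat answer: repeat each index i exactly freq[i] times."""
    freq = np.asarray(freq)
    return np.repeat(np.arange(len(freq)), freq)

def expand_counts_searchsorted(freq):
    """Alternative (not from the answer): output position p belongs to the
    first index whose cumulative count is strictly greater than p."""
    freq = np.asarray(freq)
    cum = np.cumsum(freq)
    total = cum[-1] if len(cum) else 0
    return np.searchsorted(cum, np.arange(total), side="right")

a = np.array([0, 2, 3, 1, 0, 0, 1])
print(expand_counts(a))               # [1 1 2 2 2 3 6]
print(expand_counts_searchsorted(a))  # [1 1 2 2 2 3 6]
```

For the matrix use case in the question, `f_ = expand_counts(f)` can then be fed straight into `M[f_[:, None], f_]`.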