python-3.x sorting numpy indexing frequency

How can I use a dictionary to map array indices to the corresponding argsorted indices if all indices are in sub-arrays?

I have multiple arrays that correspond to data parameters of a time-series. The data parameters include things like speed, hour of occurrence, day of occurrence, month of occurrence, elapsed hour of occurrence, etc.

I am trying to find the indices that correspond to a grouping of a specified data parameter from highest to lowest frequency of occurrence.

As a simple example, consider the following:

import numpy as np

speed = np.array([4, 6, 8, 3, 6, 9, 7, 6, 4, 3])*100
elap_hr = np.sort(np.random.randint(low=1, high=40, size=10))
## ... other time parameter arrays

print(speed)
# [400 600 800 300 600 900 700 600 400 300]

print(elap_hr)
# [ 1  2  6  7 13 19 21 28 33 38]

So observed speed = 400 (2 occurrences) corresponds to the elapsed hours = 1, 33; speed = 600 (3 occurrences) corresponds to elapsed hours = 2, 13, 28.

For this example, say I am interested in grouping speed by frequency of occurrence. Once I have the indices that group speed from highest to lowest frequency, I can apply the same indices on the other data parameter arrays (like elap_hr).

I first sort and argsort speed; then I find the unique elements of sorted speed. I combine these to find the indices of sorted speed that correspond to the sorted unique speed, which are grouped as sub-arrays per value in the sorted unique speed.

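def get_sorted_data(data, sort_type='default'):
    if sort_type == 'default':
        res = sorted(data)
    elif sort_type == 'argsort':
        res = np.argsort(data)
    elif sort_type == 'by size':
        res = sorted(data, key=len)
    else:
        ## avoid returning an undefined name for an unknown sort_type
        raise ValueError("unknown sort_type: {}".format(sort_type))
    return res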

def sort_data_by_frequency(data):
    uniq_data = np.unique(data)
    sorted_data = np.asarray(get_sorted_data(data))
    ## positions in the sorted data for each unique value
    res = [np.where(sorted_data == value)[0] for value in uniq_data]
    ## largest group (most frequent value) first
    res = get_sorted_data(res, 'by size')[::-1]
    return res

sorted_speed = get_sorted_data(speed)
argsorted_speed = get_sorted_data(speed, 'argsort')
freqsorted_speed = sort_data_by_frequency(speed)

print(sorted_speed)
# [300, 300, 400, 400, 600, 600, 600, 700, 800, 900]
print(argsorted_speed)
# [3 9 0 8 1 4 7 6 2 5]
print(freqsorted_speed)
# [array([4, 5, 6]), array([2, 3]), array([0, 1]), array([9]), array([8]), array([7])]

In freqsorted_speed, the first sub-array [4, 5, 6] corresponds to the indices of elements [600, 600, 600] in the sorted array.

This works up to this point, but these are positions in the sorted array. I want indices that apply to all of the data parameter arrays, so I need to map the argsorted indices back to the original array indices.

def get_dictionary_mapping(keys, values):
    ## since all indices are unique, there is no worry about identical keys
    return dict(zip(keys, values))

idx_orig = np.arange(len(argsorted_speed))
index_to_index_map = get_dictionary_mapping(idx_orig, argsorted_speed)

print(index_to_index_map)
# {0: 3, 1: 9, 2: 0, 3: 8, 4: 1, 5: 4, 6: 7, 7: 6, 8: 2, 9: 5}

print(speed[idx_orig])
# [400 600 800 300 600 900 700 600 400 300]

print(speed[argsorted_speed])
# [300 300 400 400 600 600 600 700 800 900]

print([index_to_index_map[idx_orig[i]] for i in range(len(idx_orig))])
# [3, 9, 0, 8, 1, 4, 7, 6, 2, 5]

I have all the necessary pieces to accomplish what I want, but I'm not quite sure how to put them all together. Any advice would be appreciated.
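For reference, here is a minimal sketch of how these pieces might fit together. It relies on the fact that the argsort result already is the map from sorted position to original index, so the dictionary is not strictly needed. The names order, groups_in_sorted and groups_in_original are illustrative only, and kind='stable' is an assumption added so that equal speeds keep their original relative order.

```python
import numpy as np

speed = np.array([4, 6, 8, 3, 6, 9, 7, 6, 4, 3]) * 100

# Stable argsort: equal values keep their original relative order.
order = np.argsort(speed, kind='stable')
sorted_speed = speed[order]

# Positions within the sorted array, grouped per unique value.
groups_in_sorted = [np.flatnonzero(sorted_speed == v) for v in np.unique(speed)]

# Most frequent first. Python's sort is stable (also with reverse=True),
# so groups of equal size stay in ascending order of value.
groups_in_sorted.sort(key=len, reverse=True)

# order[k] is the original index of the k-th sorted element, so indexing
# order with each group maps sorted positions back to original indices.
groups_in_original = [order[g] for g in groups_in_sorted]

print([g.tolist() for g in groups_in_original])
# [[1, 4, 7], [3, 9], [0, 8], [6], [2], [5]]
```

Each entry of groups_in_original can then be used to index any of the other parameter arrays, as long as they are NumPy arrays of the same length as speed.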

EDIT:

As an end result, I would like to have the original indices of speed grouped by frequency like so:

res = [[1, 4, 7], [3, 9], [0, 8], ...]
## corresponds to 3 600's, 2 300's, 2 400's, etc.
## for values of equal frequency, the secondary grouping is from min-to-max

This way, I can choose the values by the nth most frequent value or by the frequency itself.

Solution

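Your desired result can be obtained by argsorting speed, splitting the argsorted indices wherever the sorted value changes, and then sorting the resulting groups by length: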

>>> idx = np.argsort(speed, kind='stable')  # stable, so equal speeds keep their original index order
>>> res = sorted(np.split(idx, np.flatnonzero(np.diff(speed[idx])) + 1), key=len, reverse=True)
>>> res
[array([1, 4, 7]), array([3, 9]), array([0, 8]), array([6]), array([2]), array([5])]
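
To make the one-liner easier to follow, the sketch below unpacks it into named steps and then applies the resulting index groups to a second array. The intermediate names boundaries and groups are illustrative, the elap_hr values are simply the sample values printed in the question, and kind='stable' is an assumption so that the ordering inside each group is guaranteed.

```python
import numpy as np

speed = np.array([4, 6, 8, 3, 6, 9, 7, 6, 4, 3]) * 100
# Sample elapsed hours copied from the question's printed output.
elap_hr = np.array([1, 2, 6, 7, 13, 19, 21, 28, 33, 38])

# Original indices, ordered so that speed[idx] is ascending.
idx = np.argsort(speed, kind='stable')

# Positions in the sorted array where a new value starts.
boundaries = np.flatnonzero(np.diff(speed[idx])) + 1

# One array of original indices per distinct speed, in ascending speed order.
groups = np.split(idx, boundaries)

# Largest group first; ties keep ascending speed order because sorted() is stable.
res = sorted(groups, key=len, reverse=True)

# Select by rank: the most frequent speed and its elapsed hours.
most_frequent = res[0]
print(speed[most_frequent].tolist())    # [600, 600, 600]
print(elap_hr[most_frequent].tolist())  # [2, 13, 28]

# Select by the frequency itself: every speed that occurs exactly twice.
pairs = [g for g in res if len(g) == 2]
print([elap_hr[g].tolist() for g in pairs])  # [[7, 38], [1, 33]]
```

Because each group holds indices into the original arrays, the same res can be reused for any other parameter array of the same length.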