python pandas scale rescale

Python: Remap and reduce the range of numbers


I have some large numbers that serve as device identifiers:

clusteringOutput[:,1]
Out[140]: 
array([1.54744609e+12, 1.54744946e+12, 1.54744133e+12, ...,
       1.54744569e+12, 1.54744570e+12, 1.54744571e+12])

Even though the numbers are large, there are only a handful of distinct values, which repeat across the entries.

I would like to remap them to a smaller range of integers. For example, if there are only 100 distinct values, I would like to map them to the integers 1-100, with a mapping table that lets me look up which short number corresponds to which original ID.

The remapping functions I have found online typically rescale the values, and I do not want to rescale. I want concrete integers that map my long IDs to numbers that are easier on the eyes.

Any ideas on how I can implement this? I can use pandas DataFrames if that helps.

Thanks a lot, Alex


Solution

  • Use numpy.unique with return_inverse=True:

    import numpy as np
    
    arr = np.array([1.54744609e+12,
                    1.54744946e+12,
                    1.54744133e+12,
                    1.54744133e+12,
                    1.54744569e+12, 
                    1.54744570e+12, 
                    1.54744571e+12])
    
    mapper, ind = np.unique(arr, return_inverse=True)
    

    Output of ind (for each entry of arr, the index of its value in mapper, the sorted array of unique values):

    array([4, 5, 0, 0, 1, 2, 3])
    

    Recovering the original values by indexing mapper with ind:

    mapper[ind]
    
    # array([1.54744609e+12, 1.54744946e+12, 1.54744133e+12, 1.54744133e+12,
    #       1.54744569e+12, 1.54744570e+12, 1.54744571e+12])
    

    Validation:

    np.array_equal(arr, mapper[ind])
    # True
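
The question also asks for IDs starting at 1 and for a table to look the mappings up. One possible way to get there from `mapper` and `ind` is sketched below. The column names `short_id` and `device_id` and the variable names are made up for illustration, and the cast to `int64` assumes the IDs are whole numbers (which holds for the sample values here).

```python
import numpy as np
import pandas as pd

arr = np.array([1.54744609e+12,
                1.54744946e+12,
                1.54744133e+12,
                1.54744133e+12,
                1.54744569e+12,
                1.54744570e+12,
                1.54744571e+12])

# mapper: sorted unique values; ind: position of each entry of arr in mapper
mapper, ind = np.unique(arr, return_inverse=True)

# Shift the 0-based indices so the short IDs run from 1 to len(mapper)
short_ids = ind + 1

# Lookup table: one row per distinct device ID (column names are hypothetical)
mapping_table = pd.DataFrame({
    "short_id": np.arange(1, len(mapper) + 1),
    "device_id": mapper.astype(np.int64),  # assumes IDs are whole numbers
})

# Look up the original ID behind a short ID (short_id k sits at mapper[k - 1])
original_of_5 = mapper[5 - 1]

# pandas alternative: factorize with sort=True gives the same 0-based codes
codes, uniques = pd.factorize(arr, sort=True)
```

With this sample data, `short_ids` takes values from 1 to 6, and `mapping_table` can be saved or merged back onto a DataFrame to translate in either direction.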