Tags: python, numpy, normalization, binning

Python: convert continuous data into categorical


I have continuous floating-point data ranging from -257.2 to 154.98, and I have no idea how it is distributed. I would like to put it into bins, say -270 to -201, -200 to -141, -140 to -71, -70 to -1, 0 to 69, 70 to 139, and 140 to 209.

Is there a way to do this? Specifically, I am looking for something like:

data = np.random.rand(10)
data
array([ 0.58791019,  0.2385624 ,  0.70927668,  0.22916244,  0.87479326,
        0.49609703,  0.3758358 ,  0.35743165,  0.30816457,  0.2018548 ])
def GenRangedData(data, min, max, step):
    #some code
    no_of_bins = (max - min)/ step
    bins = []
    #some code
    return bins

rd = GenRangedData(data, 0, 1, 0.1)
# should generate: 
rd
[[], [0.2385624, 0.22916244, 0.2018548], [0.3758358, 0.35743165, 0.30816457], [0.49609703], [0.58791019], [], [0.70927668], [0.87479326]]

I can obviously do this by manually iterating over all the numbers, but I would like to automate it so that I can experiment freely with min, max, and step. Is there a way to do this efficiently?


Solution

  • This is what I came up with. I do not know whether it is the best way; if you think it can be done faster, please update/edit.

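    def GenRangedData(data, i_min, i_max, step):
        # data is expected to be 2-D (a sequence of rows);
        # the result holds one list of bin labels per row.
        cat_data = []
        # Label n covers [i_min + (n-1)*step, i_min + n*step);
        # label 0 catches values below i_min.
        bins = int((i_max - i_min) / step) + 2
        for row in data:
            temp_data = []
            for value in row:
                for n in range(bins):
                    if value < i_min + n * step:
                        temp_data.append(n)
                        break
                # Note: values >= i_max + step match no bin and are skipped.
            cat_data.append(temp_data)
        return cat_data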
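
On the "can this be done faster" point: the nested loops can probably be replaced by a vectorized NumPy call. The following is only a minimal sketch, assuming 1-D input and equal-width bins. It relies on `np.digitize`; the function name `gen_ranged_data` and the parameter names `lo` and `hi` are illustrative and do not come from the original post.

```python
import numpy as np

def gen_ranged_data(data, lo, hi, step):
    """Bin 1-D data into half-open intervals [lo + k*step, lo + (k+1)*step).

    Returns (labels, groups): an integer label per value, and the values
    grouped per bin as asked for in the question.
    """
    data = np.asarray(data, dtype=float)
    n_bins = int(round((hi - lo) / step))
    edges = lo + step * np.arange(n_bins + 1)
    # np.digitize returns i such that edges[i-1] <= x < edges[i];
    # 0 means x < lo, and n_bins + 1 means x >= hi.
    labels = np.digitize(data, edges)
    # Collect the values of each in-range bin; out-of-range values are dropped.
    groups = [data[labels == k].tolist() for k in range(1, n_bins + 1)]
    return labels, groups

# Usage with the sample array from the question.
sample = [0.58791019, 0.2385624, 0.70927668, 0.22916244, 0.87479326,
          0.49609703, 0.3758358, 0.35743165, 0.30816457, 0.2018548]
labels, groups = gen_ranged_data(sample, 0, 1, 0.1)
print(labels)
print(groups)
```

Because the edges are computed in floating point, a value that sits exactly on a bin boundary may land on either side of it. If only the labels are needed, the `groups` list comprehension can be dropped, which leaves a single vectorized call.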