python-3.x pandas for-loop bucket

Bin Random Values Using pd.cut() and Place the Bins in New Pandas DataFrame Columns


Can anyone help me finish the function and for loop below? I could not figure out how to bin the following columns and then 1) place the binned values into new columns, and 2) use .add_prefix() to give each of those 7 new columns the prefix 'bin_'.

binner=list(range(0,6))
countofbins=len(binner)
df = pd.DataFrame(np.random.rand(20,7), columns=list('ABCDEFG'))
df['bin_A']=pd.cut(x=df['A'],bins=countofbins,labels=binner)

def calculate_bins(bincolumnname, oldcolumnname):
    for ind, column in enumerate(df.columns):
    df[bincolumnname] = pd.cut(x=df[oldcolumnname], bins=countofbins,labels=binner)
    return df[bincolumnname]

Solution

  • If I understood you correctly, you are trying to create, for every column in your data frame, a new column that contains the bin each cell falls into.

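    import numpy as np
    import pandas as pd

    binner = list(range(0, 6))  # bin edges 0, 1, ..., 5
    df = pd.DataFrame(np.random.randint(low=1, high=5, size=(20, 7)), columns=list('ABCDEFG'))
    # Iterate over a snapshot of the original column names while adding new columns.
    for col in list(df.columns):
        df['bin_{}'.format(col)] = pd.cut(x=df[col], bins=binner)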
    

    output:

        A  B  C  D  E  F  G   bin_A   bin_B   bin_C   bin_D   bin_E   bin_F   bin_G
    0   1  3  2  1  4  3  2  (0, 1]  (2, 3]  (1, 2]  (0, 1]  (3, 4]  (2, 3]  (1, 2]
    1   1  1  3  2  4  4  4  (0, 1]  (0, 1]  (2, 3]  (1, 2]  (3, 4]  (3, 4]  (3, 4]
    2   1  2  2  3  1  2  2  (0, 1]  (1, 2]  (1, 2]  (2, 3]  (0, 1]  (1, 2]  (1, 2]
    3   1  1  1  2  2  1  3  (0, 1]  (0, 1]  (0, 1]  (1, 2]  (1, 2]  (0, 1]  (2, 3]
    4   1  3  1  1  4  4  4  (0, 1]  (2, 3]  (0, 1]  (0, 1]  (3, 4]  (3, 4]  (3, 4]
    5   4  3  3  1  1  3  1  (3, 4]  (2, 3]  (2, 3]  (0, 1]  (0, 1]  (2, 3]  (0, 1]
    6   1  2  1  4  2  2  3  (0, 1]  (1, 2]  (0, 1]  (3, 4]  (1, 2]  (1, 2]  (2, 3]
    7   4  2  2  1  3  2  3  (3, 4]  (1, 2]  (1, 2]  (0, 1]  (2, 3]  (1, 2]  (2, 3]
    8   1  1  4  1  1  2  1  (0, 1]  (0, 1]  (3, 4]  (0, 1]  (0, 1]  (1, 2]  (0, 1]
    9   3  1  4  1  3  2  4  (2, 3]  (0, 1]  (3, 4]  (0, 1]  (2, 3]  (1, 2]  (3, 4]
    10  2  2  2  3  3  4  4  (1, 2]  (1, 2]  (1, 2]  (2, 3]  (2, 3]  (3, 4]  (3, 4]
    11  1  2  1  1  4  3  3  (0, 1]  (1, 2]  (0, 1]  (0, 1]  (3, 4]  (2, 3]  (2, 3]
    12  4  1  1  1  4  1  1  (3, 4]  (0, 1]  (0, 1]  (0, 1]  (3, 4]  (0, 1]  (0, 1]
    13  1  2  4  4  2  4  3  (0, 1]  (1, 2]  (3, 4]  (3, 4]  (1, 2]  (3, 4]  (2, 3]
    14  3  3  4  4  2  4  2  (2, 3]  (2, 3]  (3, 4]  (3, 4]  (1, 2]  (3, 4]  (1, 2]
    15  1  4  1  3  2  2  3  (0, 1]  (3, 4]  (0, 1]  (2, 3]  (1, 2]  (1, 2]  (2, 3]
    16  4  2  2  3  2  1  2  (3, 4]  (1, 2]  (1, 2]  (2, 3]  (1, 2]  (0, 1]  (1, 2]
    17  1  2  3  4  3  2  3  (0, 1]  (1, 2]  (2, 3]  (3, 4]  (2, 3]  (1, 2]  (2, 3]
    18  2  3  3  2  3  3  3  (1, 2]  (2, 3]  (2, 3]  (1, 2]  (2, 3]  (2, 3]  (2, 3]
    19  2  4  1  1  3  2  4  (1, 2]  (3, 4]  (0, 1]  (0, 1]  (2, 3]  (1, 2]  (3, 4]
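
The question also asked about .add_prefix(). As a rough sketch of that route, assuming the float data from the question and a fixed seed chosen only for reproducibility, every column can be binned in one pass and the resulting columns renamed together. The names `rng`, `binned` and `result` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question: 20 rows of floats in [0, 1), columns A..G.
rng = np.random.default_rng(0)  # the seed is an arbitrary choice
df = pd.DataFrame(rng.random((20, 7)), columns=list('ABCDEFG'))

binner = list(range(0, 6))   # labels 0..5, as in the question
countofbins = len(binner)    # 6 equal-width bins per column

# Bin each column separately, then prefix all new column names with 'bin_'.
binned = df.apply(lambda s: pd.cut(s, bins=countofbins, labels=binner)).add_prefix('bin_')

# Attach the binned columns to the original frame.
result = df.join(binned)
```

Because an integer `bins` makes pd.cut() compute equal-width bins from each column's own minimum and maximum, the same label can cover slightly different value ranges in different columns.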
    

    You can change the labels of the bins with the labels argument of pandas.cut(). When bins is a list of edges, pass one label fewer than the number of edges.
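
A minimal sketch of the labels argument, assuming the same integer data as in the answer. The label names `b1` to `b5` and the variable names `edges` and `names` are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # the seed is an arbitrary choice
df = pd.DataFrame(rng.integers(low=1, high=5, size=(20, 7)), columns=list('ABCDEFG'))

edges = list(range(0, 6))                # 6 edges define 5 intervals: (0, 1] ... (4, 5]
names = ['b1', 'b2', 'b3', 'b4', 'b5']   # hypothetical labels, one per interval

# Iterate over a snapshot of the original column names while adding new columns.
for col in list(df.columns):
    df['bin_' + col] = pd.cut(df[col], bins=edges, labels=names)
```

With these edges a value of 1 falls in (0, 1] and is labelled `b1`, a value of 4 falls in (3, 4] and is labelled `b4`. Passing `labels=False` instead returns the integer position of each bin rather than a named category.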