I have a DataFrame with 4 columns. For each column I need to do bucketing (distribute the data into 8 buckets), iterating over the first column, the second column, and so on, without specifying the column names manually.
This is the code I am trying:
for col in df3.columns[0:]:
    cb1 = np.linspace(min(col), max(col), 11)
    df3.insert(2 ,'buckets',pd.cut(col, cb1, labels=np.arange(1, 11, 1)))
    print(df3[col])
Here df3 is the sample dataset:
apple orange banana
5 2 6
6 4 6
2 8 9
4 7 0
The expected output is:
apple orange banana bucket_apple bucket_orange bucket_banana
5 2 6 1 3 2
6 4 6 1 1 4
2 8 9 2 1 8
4 7 0 5 4 1
Here each bucket column gives the bucket number for the corresponding data column.
Since the expected output is totally random and there is no correlation between your data columns and the bucket numbers, you should generate the buckets separately from the data:
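import numpy as np

for c in list(df.columns):  # snapshot the names, since the loop adds columns
    # draw a random bucket number in 1..8 for every row
    df['bucket_' + c] = np.random.randint(8, size=len(df)) + 1
df  # your random bucket df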
If you want the buckets to be of equal size:
for c in list(df.columns):  # snapshot the names, since the loop adds columns
    arr = np.arange(8) + 1  # bucket numbers 1..8
    arr = np.repeat(arr, len(df) // 8)  # len(df) has to be divisible by 8
    np.random.shuffle(arr)  # shuffle the array in place
    df['bucket_' + c] = arr
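The equal-size version only works when the number of rows is a multiple of 8, which is not the case for the 4-row sample in the question. As a rough sketch of one way around that, the bucket numbers can be cycled until there is one per row and then shuffled, so bucket sizes differ by at most one. The helper name `add_balanced_random_buckets` and its `n_buckets` and `seed` parameters are made up for this illustration, and it returns a copy rather than modifying `df` in place.

```python
import numpy as np
import pandas as pd


def add_balanced_random_buckets(df, n_buckets=8, seed=None):
    """Hypothetical helper: add a shuffled, near-equal-size bucket column per data column."""
    rng = np.random.default_rng(seed)  # seed only makes the shuffle reproducible
    out = df.copy()
    for c in df.columns:  # iterate the original columns, add to the copy
        # cycle 1..n_buckets until there is exactly one label per row
        labels = np.arange(len(df)) % n_buckets + 1
        # permutation returns a shuffled copy, so each column gets its own order
        out['bucket_' + c] = rng.permutation(labels)
    return out


# sample data from the question
df = pd.DataFrame({
    'apple': [5, 6, 2, 4],
    'orange': [2, 4, 8, 7],
    'banana': [6, 6, 9, 0],
})
result = add_balanced_random_buckets(df, seed=0)
print(result)
```

With only 4 rows, each bucket column ends up holding the numbers 1 to 4 in some random order. Once there are at least 8 rows, all 8 buckets are used.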