data-science, data-cleaning

How can I bin continuous columns with a discretization method and label the bins "low" and "high"? I have done it with the following method. Is there a shortcut for binning and labeling?


iris dataset

data.describe()

# We use discretization because it converts continuous data into discrete data.
# We do the discretization for each column.
data['Sepal.Length'] = pd.cut(data['Sepal.Length'], bins = [data['Sepal.Length'].min(), data['Sepal.Length'].mean(), data['Sepal.Length'].max()], labels = ["low","high"])

data['Sepal.Width'] = pd.cut(data['Sepal.Width'], bins = [data['Sepal.Width'].min(), data['Sepal.Width'].mean(), data['Sepal.Width'].max()], labels = ["low","high"])

data['Petal.Length'] = pd.cut(data['Petal.Length'], bins = [data['Petal.Length'].min(), data['Petal.Length'].mean(), data['Petal.Length'].max()], labels = ["low","high"])

data['Petal.Width'] = pd.cut(data['Petal.Width'], bins = [data['Petal.Width'].min(), data['Petal.Width'].mean(), data['Petal.Width'].max()], labels = ["low","high"])


# Is there a method or shortcut for this, such as a for loop, to discretize all columns at once?



Solution

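  • cols1 = ['Petal.Width', 'Petal.Length', 'Sepal.Width', 'Sepal.Length']

    # Bin each column at its mean. include_lowest=True keeps the column
    # minimum in the "low" bin; without it the minimum becomes NaN.
    for i in cols1:
        data[i] = pd.cut(data[i], bins = [data[i].min(), data[i].mean(), data[i].max()], labels = ["low","high"], include_lowest = True)

    You can do it with a for loop over the column names, as in the code above.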
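The loop can also be wrapped in a small helper and applied to all columns in one call. The sketch below is one possible way to do it, not the only one. The helper name `low_high` and the five-row sample frame are made up for illustration and only stand in for the real iris data. Note that `pd.cut` uses right-closed bins by default, so a bin edge placed exactly at the column minimum excludes that minimum unless `include_lowest=True` is passed.

```python
import pandas as pd

# Hypothetical stand-in for the iris measurements; column names follow the question.
data = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 6.3, 7.0, 5.8],
    "Sepal.Width": [3.5, 3.0, 3.3, 3.2, 2.7],
    "Petal.Length": [1.4, 1.4, 6.0, 4.7, 5.1],
    "Petal.Width": [0.2, 0.2, 2.5, 1.4, 1.9],
})


def low_high(col):
    """Label values up to the column mean as "low" and values above it as "high"."""
    return pd.cut(
        col,
        bins=[col.min(), col.mean(), col.max()],
        labels=["low", "high"],
        include_lowest=True,  # keep the column minimum inside the first bin
    )


cols = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]

# Apply the helper column by column; the original frame is left unchanged.
binned = data[cols].apply(low_high)
print(binned)
```

To overwrite the original columns instead of building a new frame, assign the result back with `data[cols] = data[cols].apply(low_high)`. If equal-sized groups are wanted rather than a split at the mean, `pd.qcut(col, q=2, labels=["low", "high"])` splits at the median instead.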