
How to normalize data so that all parameters share the same scale range


I have a CSV file with the following data:

yield : 1172.4 , 1712.7 , 863.88 , 2731.34 , 5220

Rain(mm): 113.6 , 152.3 , 181.9 , 152.3 , 125.3

dummy(types_of_soil) : 1 , 0 , 0 , 2 , 1

dummy variable : 1 ==> Medium black soil

0 ==> deep black

2 ==> Reddish brown

Dependent variable (y): yield

Independent variables: Rain, dummy (types of soil)
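To experiment without the CSV file, the five sample rows above can be typed in directly and split into the dependent and independent variables. This is a minimal sketch assuming pandas is installed; the column names `yield`, `Rain` and `soil` are hypothetical labels chosen for the example.

```python
import pandas as pd

# The five sample rows from the question (column names are hypothetical)
df = pd.DataFrame({
    "yield": [1172.4, 1712.7, 863.88, 2731.34, 5220],
    "Rain": [113.6, 152.3, 181.9, 152.3, 125.3],
    "soil": [1, 0, 0, 2, 1],  # dummy code for the soil type
})

# Dependent variable y and independent variables X
y = df["yield"]
X = df[["Rain", "soil"]]

print(X)
print(y)
```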

I want to normalize this data. How do I scale it to the range 1 to 10?

I have tried the formula (xi - min) / (max - min). Is it correct?
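The formula (xi - min) / (max - min) maps every value into the range 0 to 1, with min going to 0 and max going to 1. To land in an arbitrary range [a, b] instead, multiply that result by (b - a) and add a; for 1 to 10 this gives x' = 1 + 9 * (xi - min) / (max - min). A minimal NumPy sketch of that; the function name `rescale` is hypothetical:

```python
import numpy as np

def rescale(values, new_min=1.0, new_max=10.0):
    """Linearly map values so that min -> new_min and max -> new_max."""
    x = np.asarray(values, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant column has no spread; place every value at new_min
        return np.full_like(x, new_min)
    unit = (x - lo) / (hi - lo)                 # the question's formula: 0..1
    return new_min + unit * (new_max - new_min)  # stretch and shift to new range

rain = [113.6, 152.3, 181.9, 152.3, 125.3]
print(np.round(rescale(rain), 3))
```

Treating a constant column as all `new_min` is just one choice made for this sketch; the plain formula would divide by zero there.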

And how should I scale the dummy (soil type) variable?
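The soil column holds category codes (0, 1, 2) rather than measurements, so min-max scaling it to 1..10 only stretches the codes: 0, 1 and 2 become 1, 5.5 and 10, which still implies an ordering and equal spacing between soil types. A common alternative, swapped in here and not part of the answer below, is one-hot encoding, which replaces the single code with one 0/1 indicator column per soil type; such columns are already in the range 0 to 1. A minimal sketch using `pandas.get_dummies` and scikit-learn's `MinMaxScaler`; the column name `soil` is hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

soil = pd.DataFrame({"soil": [1, 0, 0, 2, 1]})

# Option 1: treat the codes as numbers and rescale them to 1..10
scaled = MinMaxScaler(feature_range=(1, 10)).fit_transform(soil)
print(scaled.ravel())

# Option 2: one-hot encode, one 0/1 indicator column per soil code
one_hot = pd.get_dummies(soil["soil"], prefix="soil", dtype=int)
print(one_hot)
```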


Solution

  • You can use this code to normalize the data:

    import numpy
    import pandas
    from sklearn.preprocessing import MinMaxScaler
    url = "filename.csv"  # path to your CSV file (assumed to have no header row)
    names = ['yield', 'Rain', 'types of soil']
    dataframe = pandas.read_csv(url, names=names)
    array = dataframe.values
    # separate array into input (X) and output (Y) components
    X = array[:, 1:3]  # independent variables: Rain, types of soil
    Y = array[:, 0]    # dependent variable: yield
    # rescale each column of X to the range 1..10
    scaler = MinMaxScaler(feature_range=(1, 10))
    rescaledX = scaler.fit_transform(X)
    # summarize transformed data
    numpy.set_printoptions(precision=3)
    print(rescaledX[0:5, :])
    

    For more details, see: http://machinelearningmastery.com/prepare-data-machine-learning-python-scikit-learn/
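Because `filename.csv` is only a placeholder, here is a self-contained variant of the same `MinMaxScaler` approach run on the sample values from the question, with `feature_range=(1, 10)` to match the requested range. It also shows `inverse_transform`, which maps scaled values back to the original units. The column order (Rain, soil code) is an assumption of this sketch.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Independent variables from the question: Rain (mm), soil code
X = np.array([
    [113.6, 1],
    [152.3, 0],
    [181.9, 0],
    [152.3, 2],
    [125.3, 1],
])

scaler = MinMaxScaler(feature_range=(1, 10))
X_scaled = scaler.fit_transform(X)  # each column now spans 1..10

np.set_printoptions(precision=3)
print(X_scaled)

# Recover the original values from the scaled ones
X_back = scaler.inverse_transform(X_scaled)
print(np.allclose(X_back, X))
```

When the scaled data feeds a model, fit the scaler on the training rows only and reuse it with `scaler.transform` on new rows, so that every row is scaled with the same min and max.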