python-3.x machine-learning statistics normalization normal-distribution

How do I check if a data set is normal or not in Python?


I'm writing a master program for machine learning from scratch in Python, and the first step I want to do is check whether the data set is normal or not. PS: the data set can have many features or just a single feature.

It has to be implemented in Python 3.

Also, can the data be normalized with the functions below?

# Find the min and max values for each column
def dataset_minmax(dataset):
    minmax = list()
    for i in range(len(dataset[0])):
        col_values = [row[i] for row in dataset]
        value_min = min(col_values)
        value_max = max(col_values)
        minmax.append([value_min, value_max])
    return minmax

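# Rescale dataset columns to the range 0-1 (modifies dataset in place)
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)):
            value_min, value_max = minmax[i]
            # A constant column has zero range; map it to 0.0 to avoid dividing by zero
            span = value_max - value_min
            row[i] = (row[i] - value_min) / span if span else 0.0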

Thanks in advance!


Solution

  • Your question mixes two different things. If your features do not come from a normal distribution, you cannot "normalize" them in the sense of changing their distribution; min-max rescaling only changes their range, not their shape. If you instead mean checking whether they have a mean of 0 and a standard deviation of 1, that is a different ball game.
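
To make the two readings concrete, here is a rough sketch using only the standard library. It is not the only way to do this: I picked the Jarque-Bera test (skewness and kurtosis compared against a normal distribution) because it is easy to write from scratch, and its p-value is only an asymptotic approximation, so treat it with care on small samples. The function names, the alpha=0.05 threshold and the tol=0.05 tolerance are my own assumptions, not anything standard. If you are allowed to use SciPy, scipy.stats.shapiro and scipy.stats.normaltest are ready-made alternatives.

```python
import math
import statistics


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for one feature column."""
    n = len(values)
    mean = sum(values) / n
    # Central moments, dividing by n (population form)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("constant column: the test is undefined")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under normality JB is asymptotically chi-squared with 2 degrees
    # of freedom, whose survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def check_columns(dataset, alpha=0.05):
    """For each column of a list-of-rows dataset, True means
    'normality was not rejected at level alpha' (an assumed threshold)."""
    results = []
    for i in range(len(dataset[0])):
        column = [row[i] for row in dataset]
        _, p_value = jarque_bera(column)
        results.append(p_value > alpha)
    return results


def is_standardized(values, tol=0.05):
    """Second reading of the question: mean close to 0 and SD close to 1.
    The tolerance is an arbitrary assumption."""
    return (abs(statistics.mean(values)) <= tol
            and abs(statistics.pstdev(values) - 1.0) <= tol)


# Deterministic demo data: quantiles of a standard normal and of an
# exponential distribution, so the example needs no random seed.
n = 500
normal_col = [statistics.NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed_col = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
dataset = [[a, b] for a, b in zip(normal_col, skewed_col)]

print(check_columns(dataset))        # one True/False flag per feature
print(is_standardized(normal_col))   # mean ~ 0 and SD ~ 1?
print(is_standardized(skewed_col))
```

Note that a column can pass is_standardized and still fail the normality check (or the other way round), which is exactly why the two questions should be kept apart. Also, not rejecting normality is not proof that the data are normal.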