Kolmogorov-Smirnov test for normality in MATLAB - data normalisation?


I'm using the Kolmogorov-Smirnov test (kstest) in MATLAB to check each column of a data matrix for normality before performing generalised linear regression. An example data vector (one column) is:

data = [8126,3163,9129,5399,8682,1126,1053,7805,2989,2758,3277,1152,6994,6833];

The test runs and gives me a result. However, when I plot the empirical cumulative distribution function (CDF, blue) against the standard normal CDF (red) for a visual comparison, the data values are so large relative to the standard normal that the graph is not useful:

[Figure: exampleCDF, empirical CDF (blue) and standard normal CDF (red) for the example data]

The code used to plot this figure is:

[h,p,ksstat,cv] = kstest(data);
[f,x_values] = ecdf(data);
figure()
F = plot(x_values,f);
set(F,'LineWidth',2);
hold on
G = plot(x_values,normcdf(x_values,0,1),'r-');
set(G,'LineWidth',2);
legend([F G],...
    'Empirical CDF','Standard Normal CDF',...
    'Location','SE');
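A minimal sketch of what goes wrong, assuming the Statistics and Machine Learning Toolbox is available (for normcdf and kstest). With its default parameters, normcdf(x) is the standard normal CDF, and every value in data lies more than a thousand standard deviations into its upper tail, so the red curve is flat at 1. kstest(data) also defaults to the standard normal, so the largest gap between the two CDFs is essentially 1. The names Fstd and Ffit are illustrative only.

```matlab
% Sketch: compare the data with the standard normal and with a fitted normal.
data = [8126,3163,9129,5399,8682,1126,1053,7805,2989,2758,3277,1152,6994,6833];
x = sort(data);

% Standard normal (mu = 0, sigma = 1, the normcdf defaults).
% All data values exceed 1000, so these CDF values are 1 to double precision.
Fstd = normcdf(x, 0, 1);

% Using the sample mean and SD instead spreads the CDF values across (0,1).
Ffit = normcdf(x, mean(data), std(data));

% kstest(data) tests against the standard normal by default, so the
% KS statistic (largest gap between empirical and hypothesised CDF) is ~1.
[h, p, ksstat] = kstest(data);
fprintf('h = %d, p = %g, ksstat = %.4f\n', h, p, ksstat);
```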

Does this mean the result of my test is not valid? If so, can I simply normalise the data, e.g.

dataN=(data-min(data))./(max(data)-min(data)); 

while maintaining test validity?
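A minimal sketch, again assuming the Statistics and Machine Learning Toolbox: the proposed min-max scaling maps the data onto [0,1] but does not give mean 0 and SD 1, so kstest's default standard normal reference is still the wrong comparison. Subtracting the sample mean and dividing by the sample SD (what zscore does) matches that reference. Both are linear transformations, so neither changes whether the data are normally distributed; they only change which normal distribution is the right one to compare against. The names dataN, dataZ, hZ and pZ are illustrative.

```matlab
data = [8126,3163,9129,5399,8682,1126,1053,7805,2989,2758,3277,1152,6994,6833];

% Min-max scaling as proposed: values land in [0,1],
% but the mean is not 0 and the SD is not 1.
dataN = (data - min(data)) ./ (max(data) - min(data));

% z-scoring: subtract the sample mean, divide by the sample SD
% (equivalent to zscore(data)).
dataZ = (data - mean(data)) ./ std(data);

fprintf('min-max: mean = %.3f, SD = %.3f\n', mean(dataN), std(dataN));
fprintf('z-score: mean = %.3f, SD = %.3f\n', mean(dataZ), std(dataZ));

% Only the z-scored vector is on the scale that kstest's default
% (standard normal) reference distribution assumes.
[hZ, pZ] = kstest(dataZ);
fprintf('kstest on z-scores: h = %d, p = %.3f\n', hZ, pZ);
```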

Thank you for your time,

Laura


Solution

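  • Thanks to Luis Mendo, I solved this problem. normcdf(x,mu,sigma) takes the mean and standard deviation of the hypothesised normal distribution as its second and third arguments, and I had left them at 0 and 1 from the example code I was working from. Note that kstest(data) has the same issue: by default it tests against the standard normal, so the data are standardised with the sample mean and SD before testing. The edited code is:

    variableMean = mean(data);
    variableSD = std(data);
    % kstest defaults to the standard normal, so test the standardised data
    [h,p,ksstat,cv] = kstest((data - variableMean)./variableSD);
    [f,x_values] = ecdf(data);
    figure()
    F = plot(x_values,f);
    set(F,'LineWidth',2);
    hold on
    % normal CDF with the sample mean and SD (not the standard normal)
    G = plot(x_values,normcdf(x_values,variableMean,variableSD),'r-');
    set(G,'LineWidth',2);
    legend([F G],...
        'Empirical CDF','Fitted Normal CDF',...
        'Location','SE');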
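One caveat, sketched below under the same toolbox assumption: once the mean and SD are estimated from the same sample, the standard KS critical values no longer apply and the p-value tends to be too large (the test becomes conservative). The Lilliefors test (lillietest) is the KS variant designed for this case. The sketch also shows that kstest can be given the fitted normal CDF directly as a two-column [x, G(x)] matrix instead of standardising the data. Variable names are illustrative.

```matlab
data = [8126,3163,9129,5399,8682,1126,1053,7805,2989,2758,3277,1152,6994,6833];
mu = mean(data);
sigma = std(data);

% (1) Lilliefors test: KS-type normality test whose critical values
%     account for estimating the mean and SD from the sample.
[hL, pL] = lillietest(data);

% (2) kstest against a fitted normal, passing the hypothesised CDF as a
%     two-column matrix [x, G(x)] evaluated at the data values.
%     Its p-value ignores that mu and sigma were estimated from the data.
x = sort(data(:));
[hK, pK] = kstest(data, 'CDF', [x, normcdf(x, mu, sigma)]);

fprintf('lillietest:             h = %d, p = %.3f\n', hL, pL);
fprintf('kstest (fitted normal): h = %d, p = %.3f\n', hK, pK);
```

For small samples lillietest interpolates its p-value from a table and may warn when the value falls outside the tabulated range; the h decision is still usable.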