matlab weka gaussian

Generating dataset with mean, std dev, and number of samples


I am trying to generate a 2D data set with the following parameters:

x = N(-5,1), y = N(0,1), n = 1000

where N(mean, std dev) denotes a normal distribution and n is the number of samples.

I tried:

x = normrnd(-5, 1, [100,10]) 
y = normrnd(0,1,[100,10])

to generate two 100 x 10 arrays with the appropriate values. I now need a way to output the values from these two arrays in an N(x,y) format that can be analyzed by Weka. Any suggestions on how to do this would be appreciated.


Solution

  • Given your comments, you want to generate an N x 2 matrix where each row is a pair of values, one drawn from each of the two normal distributions.

    You can either generate the 2D matrix for each distribution separately, unroll each into a single column vector, and concatenate the two, or, more simply, generate 100 x 10 = 1000 elements as a 1D vector from each distribution and concatenate those.

    Method #1 - 2D matrix unrolling

    x = normrnd(-5, 1, [100,10]);
    y = normrnd(0, 1, [100,10]);
    
    N = [x(:) y(:)];
    

    Method #2 - 1D vector concatenation

    x = normrnd(-5, 1, [1000,1]); %// Change
    y = normrnd(0, 1, [1000,1]); %// Change
    
    N = [x y];
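
    As a quick sanity check on either method, the sketch below confirms the shape of the result and compares the sample statistics with the requested parameters. It is illustrative only: the names mu and sigma are hypothetical, and it swaps normrnd for base-MATLAB randn (scaled and shifted) so that it also runs without the Statistics Toolbox.

    ```matlab
    % Illustrative sanity check; mu and sigma are hypothetical helper names.
    mu    = [-5 0];                      % requested means for x and y
    sigma = [ 1 1];                      % requested standard deviations

    % randn draws from N(0,1); scaling and shifting gives N(mu, sigma).
    % This stands in for normrnd(mu, sigma, [1000,1]) from the answer.
    x = mu(1) + sigma(1) * randn(1000, 1);
    y = mu(2) + sigma(2) * randn(1000, 1);

    N = [x y];                           % 1000 x 2, one (x,y) pair per row

    disp(size(N));                       % rows and columns of N
    disp(mean(N));                       % sample mean of each column
    disp(std(N));                        % sample std dev of each column
    ```

    The sample means and standard deviations will only be close to the requested values, not equal to them, because they are estimated from 1000 random draws.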
    

    If you wish to write this to a CSV file where each line holds a pair of x,y values separated by a comma with Class_A at the end, you need fopen to open the file for writing, fprintf to write the formatted data, and fclose to close the file. You also require that the values are written with 3 digits of precision. Something like this comes to mind:

    f = fopen('numbers.csv', 'w'); %// Open up the file
    fprintf(f,'%.3f,%.3f,Class_A\n', N.'); %'// Write the data
    fclose(f); %// Close the file
    

    It's important to look at the second statement carefully. Note that I'm writing the transpose of N because fprintf reads the elements of its matrix argument in column-major order and reuses the format string until the data runs out. N is 1000 x 2, so N.' is 2 x 1000 and each of its columns is one x,y pair; without the transpose, each line would instead pair up two consecutive x values (and, further down, two consecutive y values). numbers.csv is the name of the file that gets written. If you examine this file, you'll see that each line is in the form x,y,Class_A, where x,y is a pair of values, one from each normal distribution.
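
    The effect of the transpose is easier to see on a tiny matrix. The sketch below uses sprintf, which applies format strings the same way as fprintf but returns the text instead of writing it to a file; the matrix A and the names s1 and s2 are made up for illustration.

    ```matlab
    % Illustrative 2 x 2 example; A, s1 and s2 are hypothetical names.
    A = [1 2; 3 4];                  % rows are the pairs (1,2) and (3,4)

    % Without the transpose the elements are consumed column by column:
    % 1, 3, 2, 4, so the rows of A get mixed together.
    s1 = sprintf('%d,%d\n', A);

    % With the transpose each column of A.' is one row of A,
    % so every output line holds one original row.
    s2 = sprintf('%d,%d\n', A.');

    disp(s1);                        % lines: 1,3 then 2,4
    disp(s2);                        % lines: 1,2 then 3,4
    ```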