I am writing a code to do some very simple descriptive statistics, but I found myself being very repetitive with my syntax. I know there's a way to shorten this code and make it more elegant and time efficient with something like a for-loop, but I am not quite keen enough in coding (yet) to know how to do this...
I have three variables, or groups (All data, condition 1, and condition 2). I also have 8 matlab functions that I need to perform on each of the three groups (e.g mean, median). I am saving all of the data in a table where each column corresponds to one of the functions (e.g. mean) and each row is that function performed on the correspnding group (e.g. (1,1) is mean of 'all data', (2,1) is mean of 'cond 1', and (3,1) is mean of 'cond 2'). It is important to preserve this structure as I am outputting to a csv file that I can open in excel. The columns, again, are labeled according the function, and the rows are ordered by 1) all data 2) cond 1, and 3) cond 2.
The data I am working with is in the second column of these matrices, by the way.
So here is the tedious way I am accomplishing this:
x = cell(3,8);
x{1,1} = mean(alldata(:,2));
x{2,1} = mean(cond1data(:,2));
x{3,1} = mean(cond2data(:,2));
x{1,2} = median(alldata(:,2));
x{2,2} = median(cond1data(:,2));
x{3,2} = median(cond2data(:,2));
x{1,3} = std(alldata(:,2));
x{2,3} = std(cond1data(:,2));
x{3,3} = std(cond2data(:,2));
x{1,4} = var(alldata(:,2)); % variance
x{2,4} = var(cond1data(:,2));
x{3,4} = var(cond2data(:,2));
x{1,5} = range(alldata(:,2));
x{2,5} = range(cond1data(:,2));
x{3,5} = range(cond2data(:,2));
x{1,6} = iqr(alldata(:,2)); % inter quartile range
x{2,6} = iqr(cond1data(:,2));
x{3,6} = iqr(cond2data(:,2));
x{1,7} = skewness(alldata(:,2));
x{2,7} = skewness(cond1data(:,2));
x{3,7} = skewness(cond2data(:,2));
x{1,8} = kurtosis(alldata(:,2));
x{2,8} = kurtosis(cond1data(:,2));
x{3,8} = kurtosis(cond2data(:,2));
% write output to .csv file using cell to table conversion
T = cell2table(x, 'VariableNames',{'mean', 'median', 'stddev', 'variance', 'range', 'IQR', 'skewness', 'kurtosis'});
writetable(T,'descriptivestats.csv')
I know there is a way to loop through this stuff and get the same output in a much shorter code. I tried to write a for-loop but I am just confusing myself and not sure how to do this. I'll include it anyway so maybe you can get an idea of what I'm trying to do.
x = cell(3,8);
data = [alldata, cond2data, cond2data];
dfunction = ['mean', 'median', 'std', 'var', 'range', 'iqr', 'skewness', 'kurtosis'];
for i = 1:8,
for y = 1:3
x{y,i} = dfucntion(i)(data(1)(:,2));
x{y+1,i} = dfunction(i)(data(2)(:,2));
x{y+2,i} = dfunction(i)(data(3)(:,2));
end
end
T = cell2table(x, 'VariableNames',{'mean', 'median', 'stddev', 'variance', 'range', 'IQR', 'skewness', 'kurtosis'});
writetable(T,'descriptivestats.csv')
Any ideas on how to make this work??
You want to use a cell array of function handles. The easiest way to do that is to use the @ operator, as in
dfunctions = {@mean, @median, @std, @var, @range, @iqr, @skewness, @kurtosis};
Also, you want to combine your three data variables into one variable, to make it easier to iterate over them. There are two choices I can see. If your data variables are all M-by-2 in dimension, you could concatenate them into a M-by-2-by-3 three-dimensional array. You could do that with
data = cat(3, alldata, cond1data, cond2data);
The indexing expression into data that retrieves the values you want would be data(:, 2, y)
. That said, I think this approach would have to copy a lot of data around and probably isn't the best for performance. The other way to combine data together is in 1-by-3 cell array, like this:
data = {alldata, cond1data, cond2data};
The indexing expression into data that retrieves the values you want in this case would be data{y}(:, 2)
.
Since you are looping from y == 1 to y == 3, you only need one line in your inner loop body, not three.
for y = 1:3
x{y, i} = dfunctions{i}(data{y}(:,2));
end
Finally, to get the cell array of strings containing function names to pass to cell2table, you can use cellfun to apply func2str to each element of dfunctions:
funcnames = cellfun(@func2str, dfunctions, 'UniformOutput', false);
The final version looks like this:
dfunctions = {@mean, @median, @std, @var, @range, @iqr, @skewness, @kurtosis};
data = {alldata, cond1data, cond2data};
x = cell(length(data), length(dfunctions));
for i = 1:length(dfunctions)
for y = 1:length(data)
x{y, i} = dfunctions{i}(data{y}(:,2));
end
end
funcnames = cellfun(@func2str, dfunctions, 'UniformOutput', false);
T = cell2table(x, 'VariableNames', funcnames);
writetable(T,'descriptivestats.csv');