I am using MATLAB for statistical analysis and I have a small issue. I need to calculate the percentage of correct answers for a particular question. I stored the answers in a cell array. Here it is:
myStruct.nm_answers=
'${e://Field/n11},${e://Field/n99},${e://Field/n147}, Sam, Thomas' % = participant1
NaN % = participant 2
''${e://Field/n3},${e://Field/n11},${e://Field/n43},${e://Field/n59},${e://Field/n83},${e://Field/n91},${e://Field/n99},${e://Fiel...'' <Preview truncated at 128 characters>' % = participant 3
''${e://Field/n11},${e://Field/n19},${e://Field/n43},${e://Field/n59},${e://Field/n67},${e://Field/n83},${e://Field/n107},${e://Fi...'' <Preview truncated at 128 characters>' %= participant 4
...
% goes until participant 150
Each row of the cell represents a participant's answers. In this preview, there are 4 participants. It seems messy, I know, because I have recorded all answers in a row. (I had a multiple choice question with 40 options and every selection has been recorded in the first row.)
I have 20 wrong and 20 correct selections, so I have 40 different options in my multiple choice question. Every answer that starts with ${e://Field/ will be considered a correct answer, and every name such as Sam, Thomas (check participant 1) will be considered a wrong answer.
Also, I am going to count the unselected options. Therefore, 20 - (# of correct answers) will be considered as "should have selected" and 20 - (# of wrong answers) will be considered as "should not have selected".
I need to calculate each participant's correct answer rate.
It is going to be: (# of "should not have selected" + # of correct answers) / 40.
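For participant 1 in the preview (3 correct selections, plus Sam and Thomas as wrong ones), the formula would work out roughly as in this minimal sketch. The variable names are made up for illustration only.

```matlab
% Worked example for participant 1 (values read off the preview above).
nCorrectOptions = 20;   % correct options offered in the question
nWrongOptions   = 20;   % wrong options offered in the question

nCorrectTicked = 3;     % entries starting with ${e://Field/
nWrongTicked   = 2;     % Sam, Thomas

% Wrong options that were rightly left unticked ("should not have selected").
nCorrectlyUnticked = nWrongOptions - nWrongTicked;

% Share of the 40 options that were handled correctly.
rate = (nCorrectlyUnticked + nCorrectTicked) / (nCorrectOptions + nWrongOptions);
```

Here rate comes out as 21/40, i.e. 0.525.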
I could not use the find function to get the count for each condition (correct, wrong, should have selected, ...). It gives an error because the data is a cell array.
correctansw=length(find(myStruct.nm_answers == '${e://Field/n'));
Undefined operator '==' for input arguments of type 'cell'.
Also, I am not able to use the strcmp function, since all of a participant's answers are stored in a single (row, column) element.
What should I do?
MY ANSWER
I combined both of the answers I received, and here is my code for this issue:
numberCorrect = cellfun(@(x) length(strfind(x, 'e://Field/')), myStruct.nm_answers); % correct answers
numberanswers = cellfun(@(x) length(strfind(x, ',')), myStruct.nm_answers) + 1; % all answers
numberanswers(7,1)=0; numberanswers(15,1)=0; % ... since I did +1, NaNs = 1 ...
numberofUncorrect = numberanswers-numberCorrect;
correctunticks= 20- numberofUncorrect;
myStruct.nm_perc= (correctunticks+numberCorrect)/40 ;
myStruct.nm_perc(7,1)= NaN;
myStruct.nm_perc(15,1)= NaN;
myStruct.nm_perc(38,1)= NaN;
myStruct.nm_perc(74,1)= NaN;
myStruct.nm_perc(105,1)= NaN;
clear numberanswers numberCorrect numberofUncorrect correctunticks
Since I had only 5 NaNs, I was able to handle them manually, but in the future I will use @TomasoBelluzzo's code for NaNs. It is a neater and quicker way!
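The manual NaN bookkeeping could probably be automated. The sketch below is one possible way, assuming the missing participants are the only non-character entries in the cell array. The sample data and the names answers, isMissing and perc are invented for illustration.

```matlab
% Hypothetical sample data: two participants answered, one is missing (NaN).
answers = {'${e://Field/n11},${e://Field/n99},${e://Field/n147}, Sam, Thomas'; ...
           NaN; ...
           '${e://Field/n3}, Sam'};

% Assumption: every missing participant is stored as a non-char value.
isMissing = cellfun(@(x) ~ischar(x), answers);

% Count only the real answer strings; rows of missing participants stay NaN.
numberCorrect = nan(size(answers));
numberAnswers = nan(size(answers));
numberCorrect(~isMissing) = cellfun(@(x) numel(strfind(x, 'e://Field/')), answers(~isMissing));
numberAnswers(~isMissing) = cellfun(@(x) numel(strfind(x, ',')) + 1, answers(~isMissing));

numberWrong    = numberAnswers - numberCorrect;   % wrong options that were ticked
correctUnticks = 20 - numberWrong;                % wrong options rightly left unticked
perc = (correctUnticks + numberCorrect) / 40;     % NaN for missing participants
```

Because the counts are initialised with NaN, the rate of a missing participant ends up as NaN without any hard-coded row indices.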
The count function might be what you are looking for. The pattern of correct answers is linear and easy to catch with a text search, so it's better to focus on that one instead of trying to detect the wrong answers with a regular expression.
The excerpt you posted is a little bit messed up and hard to read, at least from my phone... but let's suppose that your answers are structured as a row vector of cells whose underlying values are character arrays (we will call that variable answers, for the sake of simplicity), then:
answers_total = count(answers,',') + 1;
answers_correct = count(answers,'${e://Field/n');
% answers_wrong = answers_total - answers_correct;
ratio = (answers_correct ./ answers_total) .* 100;
The ratio variable will be a row vector of double values in which every element represents the percentage of correct answers provided by a specific participant, following the order defined in your data.
The code can handle a different number of answers provided by each participant without problems.
EDIT
I just noticed there can be NaNs in your variables. I suppose they represent the participants that... well, that didn't participate. I recommend that you avoid mixing variable types like this, especially if you want to develop a computational approach that is as standardized as possible... they just make everything more complicated to handle. Replace them with empty strings so that my solution can be adjusted accordingly:
answers_empty = cellfun(@isempty,answers);
answers_total = count(answers,',');
answers_total(~answers_empty) = answers_total(~answers_empty) + 1;
answers_correct = count(answers,'${e://Field/n');
ratio = (answers_correct ./ answers_total) .* 100;
ratio(answers_empty) = 0;
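The replacement of the NaNs with empty strings is not spelled out above. A minimal sketch of that step, followed by the adjusted solution, could look like this. It assumes that the missing participants are the only non-character entries of answers, and the sample data are invented. As far as I know, count needs MATLAB R2016b or later.

```matlab
% Hypothetical sample data in a row vector of cells; participant 2 is missing.
answers = {'${e://Field/n11},${e://Field/n99}, Sam', NaN, '${e://Field/n3}'};

% Swap every non-char entry (here: NaN) for an empty character array.
answers(cellfun(@(x) ~ischar(x), answers)) = {''};

% The adjusted solution now runs on a homogeneous cell array.
answers_empty = cellfun(@isempty, answers);
answers_total = count(answers, ',');
answers_total(~answers_empty) = answers_total(~answers_empty) + 1;
answers_correct = count(answers, '${e://Field/n');
ratio = (answers_correct ./ answers_total) .* 100;
ratio(answers_empty) = 0;   % 0/0 gives NaN for empty rows, so reset them
```

With these sample data, the first participant gets 2 correct out of 3 answers, the missing participant gets 0, and the third gets 100.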