Percentage calculation of answers in a Matlab Cell - (multiple answer, single row)


I am using MATLAB for statistical analysis and I have a small issue. I need to calculate the percentage of correct answers for a particular question. I stored the answers in a cell array, shown here:

myStruct.nm_answers=
'${e://Field/n11},${e://Field/n99},${e://Field/n147}, Sam, Thomas' % = participant1
 NaN % = participant 2
''${e://Field/n3},${e://Field/n11},${e://Field/n43},${e://Field/n59},${e://Field/n83},${e://Field/n91},${e://Field/n99},${e://Fiel...'' <Preview truncated at 128 characters>'          % = participant 3
''${e://Field/n11},${e://Field/n19},${e://Field/n43},${e://Field/n59},${e://Field/n67},${e://Field/n83},${e://Field/n107},${e://Fi...'' <Preview truncated at 128 characters>' %= participant 4
 ...
% goes until participant 150

Each row of the cell array holds one participant's answers. In this preview, there are 4 participants. I know it looks messy, because I recorded all of a participant's answers in a single row. (I had a multiple-choice question with 40 options, and every selection was recorded in that same row.)

I have 20 wrong and 20 correct selections, so I have 40 different options in my multiple choice question. Every answer that starts with ${e://Field/ will be considered as a correct answer and every name such as Sam, Thomas (check participant1) will be considered as a wrong answer.

I am also going to count the unselected options. So 20 minus the number of correct answers selected counts as "should have selected", and 20 minus the number of wrong answers selected counts as "should not have selected".

I need to calculate each participant's correct answer rate.

The formula would be: (# of "should not have selected" + # of correct answers) / 40.
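
As a quick sanity check of this formula, here is a minimal sketch for participant 1 from the preview. It assumes the 20 correct / 20 wrong split described above; the variable names are made up for illustration only.

```matlab
% Worked example for participant 1 (string copied from the preview above).
% Assumption: 20 correct and 20 wrong options, as stated in the question.
answer = '${e://Field/n11},${e://Field/n99},${e://Field/n147}, Sam, Thomas';

nCorrectPicked = numel(strfind(answer, '${e://Field/'));  % correct options ticked: 3
nPicked        = numel(strfind(answer, ',')) + 1;         % options ticked in total: 5
nWrongPicked   = nPicked - nCorrectPicked;                % wrong options ticked: 2
nWrongSkipped  = 20 - nWrongPicked;                       % wrong options left unticked: 18

rate = (nWrongSkipped + nCorrectPicked) / 40;             % (18 + 3) / 40 = 0.525
```

So participant 1 would get 21 of the 40 options right under this scoring.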

I could not use the find function to count each condition (correct, wrong, should have selected, ...). It throws an error because the data is a cell array:

 correctansw=length(find(myStruct.nm_answers == '${e://Field/n'));

    Undefined operator '==' for input arguments of type 'cell'.

I also cannot use the strcmp function, because all of a participant's answers are stored together in a single cell (one row, one column).

What should I do?

MY ANSWER

I combined both of the answers that I received, and here is my code for this problem:

numberCorrect = cellfun(@(x) length(strfind(x, 'e://Field/')), myStruct.nm_answers); % correct answers

numberanswers = cellfun(@(x) length(strfind(x, ',')), myStruct.nm_answers) + 1; % all answers

numberanswers(7,1)=0; numberanswers(15,1)=0; % ... since I did +1, NaNs = 1...
numberofUncorrect = numberanswers-numberCorrect;
correctunticks= 20- numberofUncorrect;

myStruct.nm_perc= (correctunticks+numberCorrect)/40 ;

myStruct.nm_perc(7,1)= NaN;
myStruct.nm_perc(15,1)= NaN;
myStruct.nm_perc(38,1)= NaN;
myStruct.nm_perc(74,1)= NaN;
myStruct.nm_perc(105,1)= NaN;

clear numberanswers numberCorrect numberofUncorrect correctunticks

Since I had only 5 NaNs, I was able to handle them manually, but in the future I will use @TomasoBelluzzo's code for the NaNs. It is a neater and quicker way!
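
For reference, a minimal sketch of how the hard-coded row indices could be avoided while keeping the strfind approach. It assumes every missing participant is stored as a non-text value such as NaN. The sample data and the names nm_answers and isMissing are made up for illustration; this is not @TomasoBelluzzo's code.

```matlab
% Hypothetical stand-in for myStruct.nm_answers (a column cell array).
nm_answers = { ...
    '${e://Field/n11},${e://Field/n99},${e://Field/n147}, Sam, Thomas'; ...
    NaN; ...
    '${e://Field/n3},${e://Field/n11}'};

% Assumption: a participant is missing when the cell holds anything but text.
isMissing = ~cellfun(@ischar, nm_answers);

% Count only the text entries; missing rows keep a count of zero.
numberCorrect = zeros(size(nm_answers));
numberAnswers = zeros(size(nm_answers));
numberCorrect(~isMissing) = cellfun(@(x) numel(strfind(x, 'e://Field/')), ...
    nm_answers(~isMissing));
numberAnswers(~isMissing) = cellfun(@(x) numel(strfind(x, ',')) + 1, ...
    nm_answers(~isMissing));

numberWrong = numberAnswers - numberCorrect;
nm_perc = ((20 - numberWrong) + numberCorrect) / 40;
nm_perc(isMissing) = NaN;   % no score for participants without answers
```

With this, the logical mask isMissing replaces the manual list of row numbers, so the code does not need to change when the set of missing participants changes.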


Solution

  • The count function might be what you are looking for. Correct answers follow a fixed pattern that is easy to catch with a plain text search, so it's better to focus on them instead of trying to detect the wrong answers with a regular expression.

    The excerpt you posted is a little bit messed up and hard to read, at least from my phone... but let's suppose that your answers are structured as a row vector of cells whose underlying values are character arrays (we will call that variable answers, for the sake of simplicity), then:

    answers_total = count(answers,',') + 1;
    answers_correct = count(answers,'${e://Field/n');
    % answers_wrong = answers_total - answers_correct;
    
    ratio = (answers_correct ./ answers_total) .* 100;
    

    The ratio variable will be a row vector of double values in which every element represents the percentage of correct answers provided by a specific participant, following the order defined in your data.

    The code can handle a different number of answers provided by each participant without problems.

    EDIT

    I just noticed there can be NaNs in your variables. I suppose they represent the participants that... well, that didn't participate. I recommend that you avoid mixing variable types like this, especially if you want your computational approach to be as standardized as possible... they just make everything more complicated to handle. Replace them with empty strings so that my solution can be adjusted accordingly:

    answers_empty = cellfun(@isempty,answers);
    
    answers_total = count(answers,',');
    answers_total(~answers_empty) = answers_total(~answers_empty) + 1;
    
    answers_correct = count(answers,'${e://Field/n');
    
    ratio = (answers_correct ./ answers_total) .* 100;
    ratio(answers_empty) = 0;
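
    The replacement step mentioned above is not shown in the code, so here is a minimal sketch of one way to do it, followed by the 40-option score from the question instead of the plain ratio. The sample data is made up, and treating every non-text cell as "did not participate" and scoring it as NaN are assumptions.

    ```matlab
    % Hypothetical sample data: a row cell array with one NaN entry.
    answers = {'${e://Field/n11},${e://Field/n99}, Sam', NaN, ...
               '${e://Field/n3}, Sam, Thomas'};

    % Replace every non-text entry (such as NaN) with an empty char array.
    answers(~cellfun(@ischar, answers)) = {''};
    answers_empty = cellfun(@isempty, answers);

    % Same counting as above: commas + 1 for non-empty entries.
    answers_total = count(answers, ',');
    answers_total(~answers_empty) = answers_total(~answers_empty) + 1;
    answers_correct = count(answers, '${e://Field/n');
    answers_wrong = answers_total - answers_correct;

    % Score from the question: (wrong options left unticked + correct ticked) / 40.
    score = ((20 - answers_wrong) + answers_correct) ./ 40;
    score(answers_empty) = NaN;   % assumption: no score without answers
    ```

    Whether an empty entry should end up as 0 (as in the ratio above) or as NaN depends on how non-participants are meant to be treated in the later statistics.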