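I'll try to write my problem as a list to make it easier to follow:

1. I have a table T of size 1000x30.
2. The table has a Class column whose values range from 1 to 20.
3. Some rows have the value 1, which means these rows are of "Class1", some have the value 2, some have the value 20, and so on.
4. The classes are not equally represented: for example, only a few rows may have class 1, but 10 rows have class 2, 500 have class 3, and so on.

This is what I want to do:

1. Find the class with the fewest rows. For example, class 10 has the fewest rows assigned to it, with count == 3, while every other class has more than 3 rows assigned to it.
2. Create a new column YesNo that holds only the values 0 or 1.
3. For all rows of the class with the lowest count, set YesNo to 1.
4. For each of the other classes, randomly select as many rows as the lowest count (i.e. 3).
5. For the randomly selected rows, YesNo will be 1, while for the rows not chosen it will be 0.
6. The new column will then have 1000 values, of which 3*20 are 1 (3 is the number of rows assigned to the class with the lowest count, and 20 is the number of classes) and the rest are 0.

I wonder how this can be done in MATLAB R2015b. I know that I can create a new column in the table using T.YesNo = newArr; where newArr is a 1000x1 double holding 0 and 1 values.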

As a small example, if T is 10x3 and has only 3 classes (1, 2, 3), below is how T looks:
ID Name Class
0 'a' 3
1 'b' 2
2 'a' 2
3 'b' 2
4 'a' 3
5 'a' 1
6 'a' 1
7 'b' 2
8 'b' 1
9 'a' 2
So, as shown above, Class3 is the one with the lowest count, with only 2 rows. I therefore want to randomly select two rows from each of Class1 and Class2, set the new column to 1 for these randomly selected rows (and for the Class3 rows), and set it to 0 for the rest, as shown below:
ID Name Class YesNo
0 'a' 3 1
1 'b' 2 0
2 'a' 2 1
3 'b' 2 0
4 'a' 3 1
5 'a' 1 0
6 'a' 1 1
7 'b' 2 0
8 'b' 1 1
9 'a' 2 1
See the code below. It should be self-explanatory. If something is unclear, please ask.
function q42944288
%% Definitions
MAX_CLASS = 20;
%% Setup
tmp = struct;
tmp.Data = rand(1000,1);
tmp.Class = uint8(randi(MAX_CLASS,1000,1)); % uint8 for efficiency
T = table(tmp.Data,tmp.Class,'VariableNames',{'Data','Class'});
%% Solution:
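% Step 1: find the smallest class and its row count.
% NOTE: minVal is a bin index; it matches the class label only if the
% smallest label present in T.Class is 1.
[count,minVal] = min(histcounts(T.Class,'BinMethod','integers'));
% Steps 2+3: create YesNo and mark every row of the smallest class.
T.YesNo = T.Class == minVal;
% Steps 4+5+6: for every other class, mark "count" randomly chosen rows.
whichClass = bsxfun(@eq,T.Class,1:MAX_CLASS); % >=R2007a syntax
% whichClass = T.Class == 1:MAX_CLASS; % This is a logical array, >=R2016b syntax.
for indC = setdiff(1:MAX_CLASS,minVal)
    inds = find(whichClass(:,indC));
    T.YesNo(inds(randperm(numel(inds),count))) = true;
end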
%% Test:
fprintf(1,'\nThe total number of classes is %d', numel(unique(T.Class)));
fprintf(1,'\nThe minimal count is %d',count);
fprintf(1,'\nThe total number of 1''s in T.YesNo is %d', sum(T.YesNo));
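
As a cross-check, here is a minimal sketch of the same idea applied to the 10-row example from the question. It is a variation, not the code above: it assumes the labels may be arbitrary values (not necessarily 1..MAX_CLASS), so it counts rows with unique and accumarray instead of histcounts. The names labels, grp, counts, minCount and selectedPerClass are made up for illustration.

```matlab
% Rebuild the 10x3 example table from the question.
ID    = (0:9).';
Name  = {'a';'b';'a';'b';'a';'a';'a';'b';'b';'a'};
Class = [3;2;2;2;3;1;1;2;1;2];
T = table(ID, Name, Class);

% Map every row to a group index and count the rows per distinct label.
[labels, ~, grp] = unique(T.Class);
counts   = accumarray(grp, 1);
minCount = min(counts);            % size of the smallest class

% Pick minCount random rows from every class. For the smallest class
% this selects all of its rows, so it needs no special treatment.
T.YesNo = false(height(T), 1);
for k = 1:numel(labels)
    rows = find(grp == k);
    pick = rows(randperm(numel(rows), minCount));
    T.YesNo(pick) = true;
end

% Number of selected rows per class; each entry should equal minCount.
selectedPerClass = accumarray(grp, double(T.YesNo));
disp(T)
```

Because only labels that actually occur in T.Class are counted, a label with no rows cannot drag the minimum down to 0, which is one reason to prefer this form when the set of labels is not known in advance. Which rows of Class1 and Class2 get selected changes between runs unless the random generator is seeded first, e.g. with rng(0).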