Tags: matlab, cell, conditional-statements

Delete rows from a cell given a specific condition


I have a large cell array A, sorted by FIRM (A(:,2)), and I want to delete every row whose firm does not appear at least 3 times in a row. In this example, A is:

        FIRM

1997    'ABDR'  0,56    464 1641    19970224
1997    'ABDR'  0,65    229 9208    19970424
1997    'ABDR'  0,55    125 31867   19970218
1997    'ABD'   0,06    435 8077    19970311
1997    'ABD'   0,00    150 44994   19970804
1997    'ABFI'  2,07    154 46532   19971209

I would keep only A:

1997    'ABDR'  0,56    464 1641    19970224
1997    'ABDR'  0,65    229 9208    19970424
1997    'ABDR'  0,55    125 31867   19970218

Thanks a lot.

Notes:

I used fopen and textscan to import the csv file. I then changed some of the variables so that all of them fit in a single cell-type variable.

I converted some numeric elements into strings:

F_x=num2cell(Data{:,x});

I created a new variable holding just the year:

F_ya=max(0,fix(log10(F_y)+1)-4);
F_yb=fix(F_y./10.^F_ya);
F_yc = num2cell(F_yb);
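
To make the year extraction above easier to follow, here is a small worked sketch. It assumes F_y holds dates as yyyymmdd integers like the last column of A; the sample values are only illustrative.

```matlab
% Assumption: F_y contains dates as yyyymmdd integers (sample values only).
F_y = [19970224; 19970424; 19971209];

% Number of digits beyond the first four (an 8-digit date gives 4).
F_ya = max(0, fix(log10(F_y) + 1) - 4);

% Divide away those trailing digits and truncate, leaving only the year.
F_yb = fix(F_y ./ 10.^F_ya);

% Wrap each year in its own cell so it can sit next to strings in a cell array.
F_yc = num2cell(F_yb);
```

For 19970224, log10 gives roughly 7.3, so F_ya is 4 and F_yb is fix(1997.0224), which is 1997.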

I then created a new cell A with the variables I need:

A=[F_5C Data{:,1} Data{:,2} Data{:,3} Data{:,4} F_xa F_xb];

This means that the cell contains some variables that are strings and others that are numbers.


Solution

  • I'm going to assume that your firm names are stored in a cell array, so they would look like this:

    names = {'ABDR', 'ABDR', 'ABDR', 'ABD', 'ABD', 'ABFI'};
    

    We can then use strcmpi, which compares two strings and returns true if they match and false otherwise. The comparison is case insensitive, so ABDR would be the same as abdr.

    You would call strcmpi like so:

    v = strcmpi(str1, str2);
    

    Alternatively, str2 can be a cell array. In that case, strcmpi compares the single string str1 with the string in each cell of the cell array and returns a logical vector of the same size as str2, indicating whether there is a match at each location.

    We can therefore go through each element of names and count how many matches it has within the entire names cell array: sum the logical vector that strcmpi returns for each string, then keep the names whose sum is 3 or more. cellfun lets us do the counting in one line:

    sums = cellfun(@(x) sum(strcmpi(x,names)), names);
    

    This gives:

    sums =
    
     3     3     3     2     2     1
    

    Now we need the locations whose count is three or more:

    locations = sums >= 3
    
    locations =
    
     1     1     1     0     0     0
    

    locations is a logical vector marking the rows you want to keep. Assuming that A contains your data, A(locations,:) selects all the rows whose name occurs three or more times. I really don't know how you constructed A, so I'm assuming it's a 2D array with one row per entry in names. If you put in the code that you used to construct it, I'll modify my post to get it working for you. In any case, what's important is locations: it tells you which rows you need to select to match your criteria.
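
The steps above can be put together on a cell array shaped like the one in the question. This is only a sketch: the sample A below uses a simplified four-column layout (an assumption, not the asker's actual construction), and B is a hypothetical name for the filtered result.

```matlab
% Sample data mirroring the question; the column layout is an assumption.
A = {1997, 'ABDR', 0.56, 19970224;
     1997, 'ABDR', 0.65, 19970424;
     1997, 'ABDR', 0.55, 19970218;
     1997, 'ABD',  0.06, 19970311;
     1997, 'ABD',  0.00, 19970804;
     1997, 'ABFI', 2.07, 19971209};

% Firm names live in the second column of the cell array.
names = A(:, 2);

% For each name, count how many rows carry the same name (case-insensitive).
sums = cellfun(@(x) sum(strcmpi(x, names)), names);

% Logical vector marking rows whose firm appears at least three times.
locations = sums >= 3;

% Keep only those rows; B is still a cell array with all columns intact.
B = A(locations, :);
```

Because A is sorted by firm, equal names are adjacent, so counting total occurrences matches the "at least 3 times in a row" requirement. If the data were not sorted, this count would also keep firms whose rows are scattered, and a run-length check would be needed instead.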