matlab, cell-array

Pull specific cells out of a cell array by comparing the last digit of their filename


I have a cell array of filenames, such as '20160303_144045_4.dat' and '20160303_144045_5.dat', which I need to split into separate cell arrays by the last digit before the '.dat': one cell array of '...4.dat's, one of '...5.dat's, etc.

My code is below. It uses regex to split each filename around the '.dat', reshapes a bit, then uses regex again to pull out the last number of the filename, and builds a cell array to store the filenames in. That is where I get a tad stuck: for each number I can produce an index array such as '1,0,1,0,1,0..' marking the required cells, which I thought would be trivial to use for pulling them out, but I'm struggling to get it to do what I want.

numFiles = length(sampleFile); %sampleFile is the input cell array

splitFiles = regexp(sampleFile,'.dat','split');
column = vertcat(splitFiles{:});
column = column(:,1);

splitNums = regexp(column,'_','split');
splitNums = splitNums(:,1);
column = vertcat(splitNums{:});
column = column(:,3);

column = cellfun(@str2double,column); %produces column array of values - 3,4,3,4,3,4, etc

uniqueVals = unique(column);
numChannels = length(uniqueVals);


fileNameCell = cell(ceil(numFiles/numChannels),numChannels);

for i = 1:numChannels

   column(column ~= uniqueVals(i)) = 0;
   column = column / uniqueVals(i); %e.g. 1,0,1,0,1,0

   %fileNameCell(i) 
end

I feel there should be an easier way than my hodge-podge of code, and I don't want to throw together a ton of messy for-loops if I can avoid it; I definitely believe I've overcomplicated this problem massively.


Solution

  • We can neaten your code quite a bit.

    Take some example data:

    files = {'abc4.dat';'abc5.dat';'def4.dat';'ghi4.dat';'abc6.dat';'def5.dat';'nonum.dat'};
    

    You can get the final numbers using regexp and matching one or more digits followed by '.dat', then using strrep to remove the '.dat'.

    % Escape the dot and anchor at the end so only a literal '.dat' suffix matches
    filenums = cellfun(@(r) strrep(regexp(r, '\d+\.dat$', 'match', 'once'), '.dat', ''), ...
                       files, 'uniformoutput', false);
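
    As a hedged alternative, assuming the names always end in a literal '.dat', a lookahead lets regexp return only the digits, so neither strrep nor cellfun is needed (regexp accepts a cell array directly). A minimal sketch using the same example data:

    ```matlab
    files = {'abc4.dat';'abc5.dat';'def4.dat';'ghi4.dat';'abc6.dat';'def5.dat';'nonum.dat'};
    % (?=\.dat$) requires a literal '.dat' at the end of the name
    % without including it in the returned match
    filenums = regexp(files, '\d+(?=\.dat$)', 'match', 'once');
    ```

    Names with no digits before '.dat' still yield an empty string.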
    

    Now we can put these in a structure, using the unique numbers as field names. Each one is prefixed with a letter, because field names must begin with a letter.

    % Get unique file numbers and set up the output struct
    ufilenums = unique(filenums);
    filestruct = struct;
    % Loop over file numbers
    for ii = 1:numel(ufilenums)
        % Get files which have this number
        idx = strcmp(filenums, ufilenums{ii});
        % Assign the identified files to their struct field
        filestruct.(['x' ufilenums{ii}]) = files(idx);
    end
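
    If plain cell arrays are preferred over struct fields, one possible sketch (a swapped-in technique, not part of the loop above) uses the third output of unique together with accumarray. The names groupIdx and groups are illustrative only.

    ```matlab
    files = {'abc4.dat';'abc5.dat';'def4.dat';'ghi4.dat';'abc6.dat';'def5.dat';'nonum.dat'};
    filenums = regexp(files, '\d+(?=\.dat$)', 'match', 'once');

    % groupIdx(k) is the position of filenums{k} within ufilenums
    [ufilenums, ~, groupIdx] = unique(filenums);

    % Gather the files of each group into its own cell;
    % sort(k) keeps the files in their original order within a group
    groups = accumarray(groupIdx(:), (1:numel(files))', [], @(k) {files(sort(k))});
    ```

    Here groups{n} holds the files whose number is ufilenums{n}, so the two outputs can be indexed together.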
    

    Now you have a neat output:

    % Files with numbers before .dat given a field in the output struct
    % (column cell arrays, because files is a column)
    filestruct.x4 = {'abc4.dat'; 'def4.dat'; 'ghi4.dat'}
    filestruct.x5 = {'abc5.dat'; 'def5.dat'}
    filestruct.x6 = {'abc6.dat'}
    % Files without numbers before .dat also captured
    filestruct.x = {'nonum.dat'}
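
    To work with the result, the fields can be walked with fieldnames and dynamic field references. A short sketch, assuming names in the format from the question (the third filename is made up for illustration):

    ```matlab
    % Sample names in the question's format; the last one is hypothetical
    sampleFile = {'20160303_144045_4.dat'; '20160303_144045_5.dat'; '20160303_144050_4.dat'};

    % Build the struct as above
    filenums = regexp(sampleFile, '\d+(?=\.dat$)', 'match', 'once');
    ufilenums = unique(filenums);
    filestruct = struct;
    for ii = 1:numel(ufilenums)
        filestruct.(['x' ufilenums{ii}]) = sampleFile(strcmp(filenums, ufilenums{ii}));
    end

    % Visit each group by its field name
    names = fieldnames(filestruct);
    for ii = 1:numel(names)
        fprintf('%s: %d file(s)\n', names{ii}, numel(filestruct.(names{ii})));
    end
    ```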