regex matlab cell-array

Character count of regular expression in cells in MATLAB


Earlier I got some help writing a script that extracts hashtags from a list of tweets and puts them into a cell array. I used this code inside a for loop:

hashtagCell{i} = regexp(textRead{i}, '#[A-z]*', 'match');

This does what it is supposed to do, but now I'm trying to find the average character length of the hashtags, so I need the character length of each hashtag extracted by the line above, summed over all hashtags. However, when I use the size() function, it gives me the size of the cell instead of the lengths of the strings inside it, which is what I want. I can't figure out how to do this.


Solution

  • This should help (and it gets rid of any loops, other than, perhaps, the one used to create CellOfText):

    %# Example cell array of tweets
    CellOfText = {'Bah #humbug says #Mr scrooge'; 'No #presents for you'};
    
    %# Get all hash tags
    HTC = regexp(CellOfText, '#[A-z]*', 'match');
    
    %# Get the average hash tag length (the leading '#' is counted), being careful to unnest HTC
    AvgLength1 = mean(cellfun('length', [HTC{:}]));
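
    For a cell built in a loop as in the question, each `hashtagCell{i}` is itself a cell array of matches, so `size` reports how many matches a tweet has rather than how long they are. A minimal sketch of getting the per-tag lengths and the overall average in that layout, with `textRead` filled with made-up tweets (the names `sz`, `lensPerTweet` and `avgLen` are only for illustration):

```matlab
% Made-up stand-in for the question's textRead
textRead = {'Bah #humbug says #Mr scrooge'; 'No #presents for you'};

% Same loop as in the question
hashtagCell = cell(size(textRead));
for i = 1:numel(textRead)
    hashtagCell{i} = regexp(textRead{i}, '#[A-z]*', 'match');
end

% size() of one entry counts the matches in that tweet, not characters
sz = size(hashtagCell{1});

% Character length of every hashtag, kept tweet by tweet
lensPerTweet = cellfun(@(c) cellfun(@numel, c), hashtagCell, ...
                       'UniformOutput', false);

% Total characters divided by the total number of hashtags
allLens = [lensPerTweet{:}];
avgLen  = sum(allLens) / numel(allLens);
```

    Keeping the lengths per tweet is optional; flattening first with `[hashtagCell{:}]`, as in the answer, gives the same average.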
    

    DISCLAIMER: The inspiration for this method came from this excellent answer to a similar question. Thanks to @Andrey for that.
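
    A follow-up note on the pattern itself, which is not part of the original answer: in ASCII the range `A-z` also covers the six characters that sit between `Z` and `a` (`[`, `\`, `]`, `^`, `_` and the backtick), and `*` allows zero characters, so `#[A-z]*` also matches a bare `#`. Assuming hashtags are made of letters, digits and underscores, `#\w+` may be a tighter choice. The sketch below uses made-up tweets and also leaves the leading `#` out of the count, which may or may not be what is wanted:

```matlab
% Hypothetical sample tweets
tweets = {'Bah #humbug says #Mr scrooge'; 'Price is # 5 for #item_1'};

% '\w' matches letters, digits and underscore; '+' requires at least one
tags = regexp(tweets, '#\w+', 'match');

% Flatten the nested cells into one 1-by-N cell array of tags
allTags = [tags{:}];

% Average length without counting the leading '#'
if isempty(allTags)
    avgLen = NaN;   % no hashtags found
else
    avgLen = mean(cellfun(@numel, allTags) - 1);
end
```

    The `isempty` guard is there because `[tags{:}]` can come back as an empty non-cell value when there are no tweets at all, which `cellfun` would reject.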