Updating a two-dimensional N-gram cell array in MATLAB

I am trying to extract bi-grams from a set of words and store them in a matrix. What I want is to put each word in the first row, with all the bi-grams of that word in the rows below it.

For example, if I have the string 'database file there', my output should be:

database   file  there
da         fi    th
at         il    he
ta         le    er
ab               re
..

I have tried the following, but it gives me only the bi-grams, without the original word:

collection = fileread('e:\m.txt');
collection = regexprep(collection,'<.*?>','');
collection = lower(collection);
collection = regexprep(collection,'\W',' ');
collection = strtrim(regexprep(collection,'\s*',' '));
temp = regexprep(collection,' ',''',''');
eval(['words = {''',temp,'''};']);

word = char(words(1));
word2 =  regexp(word, sprintf('\\w{1,%d}', 1), 'match');     
bi = cellfun(@(x,y) [x '' y], word2(1:end-1)', word2(2:end)','un',0);

This handles only the first word; however, I want to do the same for every word in the 1x1000 "words" cell array.

Is there an efficient way to accomplish this, given that I will be dealing with around 1 million words?

I am new to MATLAB, so pointers to any resource that explains how to work with matrices (updating elements, deleting them, ...) would be helpful.

regards, Ashraf

Solution

If you were looking to get a cell array as the output, this might work for you -

input_str = 'database file there' %// input

str1_split = regexp(input_str,'\s','Split'); %// split words into cells
NW = numel(str1_split); %// number of words
char_arr1 = char(str1_split'); %//' convert split cells into a char array
ind1 = bsxfun(@plus,[1:NW*2]',[0:size(char_arr1,2)-2]*NW); %//' get indices
                                           %// to be used for indexing into char array
t1 = reshape(char_arr1(ind1),NW,2,[]);
t2 = reshape(permute(t1,[2 1 3]),2,[])'; %//' char array with rows for each pair

out = reshape(mat2cell(t2,ones(1,size(t2,1)),2),NW,[])'; %//'
out(reshape(any(t2==' ',2),NW,[])')={''}; %//' Use only paired-elements cells
out = [str1_split ; out] %// output

Code Output -

input_str =
database file there

out = 
    'database'    'file'    'there'
    'da'          'fi'      'th'   
    'at'          'il'      'he'   
    'ta'          'le'      'er'   
    'ab'          ''        're'   
    'ba'          ''        ''     
    'as'          ''        ''     
    'se'          ''        ''
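
For comparison, here is a more explicit sketch of the same table built with cellfun/arrayfun and a short loop, which may be easier to follow than the index arithmetic above. It is not the approach used in the answer and has not been benchmarked against it. It assumes strsplit is available (R2013a or later) and that words are separated by whitespace; the variable names words, bigrams and maxN are only illustrative.

```matlab
input_str = 'database file there';
words = strsplit(strtrim(input_str));   % 1-by-NW cell array of words

% For each word, collect every pair of adjacent characters.
% Words shorter than two characters yield an empty cell.
bigrams = cellfun(@(w) arrayfun(@(k) w(k:k+1), 1:numel(w)-1, ...
                                'UniformOutput', false), ...
                  words, 'UniformOutput', false);

% Pre-fill the table with '' so shorter words are padded automatically.
maxN = max(cellfun(@numel, bigrams));
out = repmat({''}, maxN + 1, numel(words));
out(1, :) = words;                      % first row holds the words
for c = 1:numel(words)
    % (:) turns the bi-gram list into a column before assignment
    out(2:numel(bigrams{c}) + 1, c) = bigrams{c}(:);
end
```

For the example string this should produce the same 8-by-3 cell array as shown in the output above. With a very large word list, the padded table grows with the longest word, so keeping the bigrams cell (one entry per word) instead of the padded out may be the more economical representation.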