Tags: regex, matlab, text, cell-array

Extract specific data from a text file


I have a text file that looks like this in Notepad++:

/a/apple 1
/b/bat 10
/c/cat 22
/d/dog 33
/h/human/female 34

Now I want to extract everything after the second slash and before the number at the end of each line. So the output I want is:

out = {'apple'; 'bat'; 'cat'; 'dog'; 'human/female'}

I wrote this code:

file = fopen('file.txt');
out = textscan(file, '%s', 'Delimiter', '\n');
fclose(file);

It gives:

out =
   {365×1 cell}

out{1} = 

    '/a/apple 1'
    '/b/bat 10'
    '/c/cat 22'
    '/d/dog 33'
    '/h/human/female 34'

How can I get the required output from the text file, directly if possible? If that is not possible, is there a regular expression that would do it?


Solution

  • You can get the desired output directly from textscan, with no further processing:

    file = fopen('file.txt');
    out = textscan(file, '/%c/%s %d');
    fclose(file);
    out = out{2}
    
    out =
    
      5×1 cell array
    
        'apple'
        'bat'
        'cat'
        'dog'
        'human/female'
    

    Note that the two slashes in the format specifier are treated as literal text and are dropped from the output, while %c reads the single character between them. Any further slashes are captured as part of the string (%s). There is no need to specify a delimiter, because the default delimiter is whitespace; the trailing number is therefore read as a separate numeric field (%d). This format assumes the first path segment is exactly one character, as in the sample.
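
Since the question also asks about a regular expression, here is a minimal sketch of that route, for the case where the lines have already been read into a cell array as in the question's own textscan call. The variable `lines` is a hypothetical stand-in for `out{1}` from the question, and the pattern assumes every line has the shape /segment/rest, then whitespace, then an integer.

```matlab
% Hypothetical stand-in for out{1} from the question's textscan call.
lines = {'/a/apple 1'; '/b/bat 10'; '/c/cat 22'; '/d/dog 33'; '/h/human/female 34'};

% ^/[^/]*/   skip the first slash, the first segment, and the second slash
% (.*?)      capture everything up to the trailing number (lazy match)
% \s+\d+$    whitespace followed by the integer at the end of the line
tokens = regexp(lines, '^/[^/]*/(.*?)\s+\d+$', 'tokens', 'once');

% Each element of tokens is a 1x1 cell holding the captured text;
% stack them into a single column cell array.
out = vertcat(tokens{:});
```

Unlike the %c format above, this pattern does not require the first segment to be a single character. Lines that do not match produce an empty token and are dropped by vertcat, so the result may be shorter than the input if the file contains other kinds of lines.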