
textscan stops reading after 1 line


I'm using the following command to read a csv file:

fid=fopen('test.csv');
scannedData = textscan(fid, '%4.0u%2.0u%2.0u%2.0u%2.0u%2.0u,%u,%u,%q,%q,%f,%f,%.2f,%u','whitespace','"');
fclose(fid);

The problem is that textscan doesn't read the value of the last field and stops after one line. Skipping that field, assigning it a different type, and trying numerous eof combinations in textscan did not help.

The data in the file looks like this :

"20100324072328","501","1","str1","str2","4.6846712","52.0159507","1.250000","128.000000"
"20100324072519","501","1","str1","str2","4.6846122","52.0159346","0.000000","128.000000"
"20100324072640","501","1","str1","str2","4.6846014","52.0159453","0.000000","128.000000"
"20100324072812","501","1","str1","str2","4.6845907","52.0159507","0.000000","96.000000"
"20100324073002","501","1","str1","str2","4.6845800","52.0159614","0.000000","128.000000"

I'd like to parse the first field directly with textscan, as I'm trying to do with the commands above.

I don't want to use the alternative of reading the fields with %q and then parsing the resulting arrays.

So, I would appreciate any suggestions to make textscan do it all in one go.

Thanks.


Solution

  • If you want to treat " as whitespace, then you should not use %q: it needs the double quotes to identify the full string and cannot find them once they are treated as whitespace:

    fid = fopen('test.csv');
    fmt = '%4u%2u%2u%2u%2u%2u%u%u%s%s%f%f%f%u';
    out = textscan(fid,fmt,'Delimiter',',','Whitespace','"')
    fclose(fid)
    

    Alternatively, as I suggested in the comments, you can use:

    fmt = '"%4u%2u%2u%2u%2u%2u" "%u" "%u"%q%q"%f" "%f" "%f" "%u"';
    out = textscan(fid,fmt,'Delimiter',',')
    

    Note how I put a space between the quoted fields (" "); otherwise textscan() cannot recognize where the fields really end.

    However, I would personally go for an explicit conversion of the date to a serial date number:

    fmt = '%s%u%u%s%s%f%f%f%u';
    out = textscan(fid,fmt,'Delimiter',',','Whitespace','"')
    out{1} =  datenum(out{1},'yyyymmddHHMMSS');
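
As a rough sketch of that last conversion step on its own, the snippet below applies datenum to two timestamps copied from the sample data and formats the result back with datestr as a check. The variable names (stamps, serial, readable) are illustrative and not from the answer above, and the sketch assumes the surrounding quotes have already been removed from the strings.

```matlab
% Two timestamps from the sample data; assumed to have had their quotes removed
stamps = {'20100324072328'; '20100324072519'};

% In datenum/datestr format strings, 'mm' is the month and 'MM' the minutes
serial = datenum(stamps, 'yyyymmddHHMMSS');

% Format the serial date numbers back to text to check the parse
readable = datestr(serial, 'yyyy-mm-dd HH:MM:SS');
disp(readable)
```

Serial date numbers count days, so a difference such as serial(2) - serial(1) is expressed in days; multiply it by 86400 to get seconds.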