Search code examples
matlabcsvtextscan

How to convert a string to a table with `textscan`?


I'm using matlab to read in COVID-19 data provided by Johns Hopkins as a .csv-file using urlread, but I'm not sure how to use textscan in the next step in order to convert the string into a table. The first two columns of the .csv-file are strings specifying the region, followed by a large number of columns containing the registered number of infections by date.

Currently, I just save the string returned by urlread locally and open this file with importdata afterwards, but surely there should be a more elegant solution.


Solution

  • You have mixed-up two things: Either you want to read from the downloaded csv-file using ´textscan´ (and ´fopen´,´fclose´ of course), or you want to use ´urlread´ (or rather ´webread´ as MATLAB recommends not to use ´urlread´ anymore). I go with the latter, since I have never done this myself^^

    So, first we read in the data and split it into rows

    url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv";
    % read raw data as single character array
    web = webread(url);
    % split the array into a cell array representing each row of the table
    row = strsplit(web,'\n');
    

    Then we allocate a table (pre-allocation is good for MATLAB as it stores variables on consecutive addresses in the RAM, so tell MATLAB beforehand how much space you need):

    len = length(row);
    % get the CSV-header as information about the number of columns
    Head = strsplit(row{1},',');
    % allocate table 
    S = strings(len,2);
    N = NaN(len,length(Head)-2);
    T = [table(strings(len,1),strings(len,1),'VariableNames',Head(1:2)),...
        repmat(table(NaN(len,1)),1,length(Head)-2)];
    % rename columns of table
    T.Properties.VariableNames = Head;
    

    Note that I did a little trick to allocate so many reparate columns of ´NaN´s by repeating a single table. However, concatnating this table with the table of strings is difficult as both contain the column-names var1 and var2. That is why I renamed the column of the first table right away.

    Now we can actually fill the table (which is a bit nasty due to the person who found it nice to write ´Korea, South´ into a comma-separated file)

    for i = 2:len
        % split this row into columns
        col = strsplit(row{i},',');
        % quick conversion
        num = str2double(col);
    
        % keep strings where the result is NaN
        lg = isnan(num);
        str = cellfun(@string,col(lg)); 
        T{i,1} = str(1);
        T{i,2} = strjoin(str(2:end));% this is a nasty workaround necessary due to "Korea, South"
        T{i,3:end} = num(~lg);
    end
    

    This should also work for the days that are about to come. Let me know what you actually gonna do with the data