Tags: regex, csv, extract, d

Extracting values from comma separated lists


When given a list of comma separated values like 3, asdf, *#, 1212.3, I would like to extract each of these values, not including the comma, so I would have a value list like [3, "asdf", "*#", 1212.3] (not as a textual representation like that, but as an array of 'hits'). How would I do this?


Solution

  • First off, if you are dealing with CSV files, don't use regex or your own parser. Things that look simple usually aren't; see the article "Stop Rolling Your Own CSV Parser".

    Next up, you say that you would like to have an array ([3, "asdf", "*#", 1212.3]). This mixes types, which a plain array in a statically typed language cannot do. It is possible with std.variant, but that is ultimately very inefficient. For each parsed value you'd have code like:

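    import std.conv : to, ConvException;
    import std.variant : Variant;

    try {
        auto data = to!double(parsedValue);  // throws ConvException if not numeric
        auto data2 = to!int(data);           // truncates; throws if out of int range
        if(data == data2)
            returnThis = Variant(data2);     // whole number: store as int
        else
            returnThis = Variant(data);      // otherwise keep the double
    } catch(ConvException ce) {
        returnThis = Variant(parsedValue);   // conversion failed: keep the string
    }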
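
    To make the cost concrete, here is a minimal sketch of the same idea as a function, together with what a caller has to do to find out what it got back. The names toVariant and describe are made up for illustration, and the explicit range check is my own variation so that large numbers stay doubles instead of failing the int conversion.

```d
import std.conv : to, ConvException;
import std.variant : Variant;

// Hypothetical helper: picks int, double, or string for one parsed value.
Variant toVariant(string parsedValue) {
    double asDouble;
    try {
        asDouble = to!double(parsedValue);
    } catch (ConvException e) {
        return Variant(parsedValue);      // not numeric: keep the text
    }
    // Only values that fit in an int and have no fractional part become int.
    if (asDouble >= int.min && asDouble <= int.max
            && asDouble == cast(int) asDouble)
        return Variant(cast(int) asDouble);
    return Variant(asDouble);
}

// Hypothetical helper: every consumer has to check the stored type at run time.
string describe(Variant v) {
    if (v.type == typeid(int)) return "int";
    if (v.type == typeid(double)) return "double";
    return "string";
}
```

    Every value pays for a conversion attempt on the way in and a type check on the way out, which is the overhead referred to above.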
    

    Now, if your data is truly separated by some defined set of characters and isn't broken into records with newlines, you can use split(", ") from std.array (std.algorithm offers the lazy splitter). Otherwise use a CSV parser. If you don't want to follow the standard, wrap the parser so the data comes out the way you want. Your example has spaces after the commas, which the CSV format does not ignore, so call strip() on each value.
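
    For the simple case, a minimal sketch of splitting and stripping could look like the following. The function name splitValues is made up for illustration, and it assumes no value ever contains a comma or a quote; if one can, use a CSV parser instead.

```d
import std.algorithm : map;
import std.array : array, split;
import std.string : strip;

// Hypothetical helper: splits on commas and trims whitespace around each value.
// Assumes no quoting and no embedded commas.
string[] splitValues(string line) {
    return line.split(",")
               .map!(value => value.strip)
               .array;
}
```

    Calling splitValues("3, asdf, *#, 1212.3") gives the four values as strings; converting them to numbers is a separate step.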

    The article mentioned above notes that people commonly write a parser in its simplest form and never handle the more complicated cases, so when you look for a CSV parser you'll find many that just don't cut it. Writing your own parser still comes up, which I say is fine, as long as it handles all valid CSV files.

    Luckily you don't need to write your own, as I recently made a CSV Parser for D. Error checking isn't done yet, since I don't know the best way to report issues so that parsing can be corrected and continue. Usage examples are found in the unittest blocks. You can parse into a struct too:

    struct MyData {
        int a;
        string b;
        string c;
        double d;
    }
    
    foreach(data; csv.csv!MyData(str)) // I think I'll need to change the module/function name
        //...
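
    For comparison, Phobos ships a std.csv module whose csvReader offers a similar struct-based interface. The sketch below uses that module, which may not match the module and function names in the snippet above. Row and parseRows are made-up names, and the input has no spaces after the commas because the numeric fields would otherwise fail to convert.

```d
import std.csv : csvReader;

// Hypothetical record type mirroring MyData above.
struct Row {
    int a;
    string b;
    string c;
    double d;
}

// Hypothetical helper: parses each CSV record into a Row.
// csvReader throws if a record is malformed or a field cannot be converted.
Row[] parseRows(string text) {
    Row[] rows;
    foreach (row; csvReader!Row(text))
        rows ~= row;
    return rows;
}
```

    A quoted field such as "hello, world" is handled by the parser and comes back as a single value, which is exactly the case a plain split gets wrong.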