svg, lua, vector-graphics, lua-patterns

Parse SVG path definition ("d") in Lua


I have a path definition in this form (example):

<path d="M 20 30 L 20 20 20 40 40 40"/>

Which, in Lua, becomes:

"M 20 30 L 20 20 20 40 40 40"

How could I parse it in pure Lua to get something like:

{'M', 20, 30, 'L', 20, 20, 20, 40, 40, 40 }

Or, ideally:

{{'M', 20, 30}, {'L', 20, 20}, {'L', 20, 40}, {'L', 40, 40}}

Do Lua patterns have such capabilities?

EDIT: I want to cover all valid SVG paths, or at least those generated by Inkscape (links: specification, Inkscape-generated path).


Solution

  • Not directly; you would need to write a small parser around the patterns.

    Curiosity got the better of me, though I usually dislike "Do this work for me" posts:

    --- Parse svg `path` attribute into 2-D array
    function parsePath(input)
        local output, line = {}, {};
        output[#output+1] = line;
    
        input = input:gsub("([^%s,;])([%a])", "%1 %2"); -- Convert "100D" to "100 D"
        input = input:gsub("([%a])([^%s,;])", "%1 %2"); -- Convert "D100" to "D 100"
        for v in input:gmatch("([^%s,;]+)") do
            if not tonumber(v) and #line > 0 then
                line = {};
                output[#output+1] = line;
            end
            line[#line+1] = tonumber(v) or v; -- Store values as numbers, instructions as strings
        end
        return output;
    end
    
    -- Test output
    local input = 'M20 30L20 20,20 40;40 40 X1 2 3 12.8z';
    local r = parsePath(input);
    for i=1, #r do
        print("{ "..table.concat(r[i], ", ").." }");
    end
    

    Since Inkscape always seems to put a space between instructions and numbers, you could leave out the two gsub lines if you only parse files generated by Inkscape.

    The function also throws away most of the stray characters Inkscape likes to put into a path definition. However, there may be some details left for you to resolve if you really want to read all path definitions that conform to the standard.

    Update (after skimming over the SVG BNF definition)

    The SVG standard states that "superfluous white space and separators such as commas can be eliminated". However, looking at the BNF notation, I could not find any separator other than white space and comma.

    So you could probably change the second pattern to "([^%a%d%.eE-]+)". But I figured that the following function would fit a lot better:

    function parsePath(input)
        local out = {};
    
        for instr, vals in input:gmatch("([a-df-zA-DF-Z])([^a-df-zA-DF-Z]*)") do
            local line = { instr };
            for v in vals:gmatch("([+-]?[%deE.]+)") do
                line[#line+1] = tonumber(v) or v; -- Malformed values such as "2E1.7" stay strings
            end
            out[#out+1] = line;
        end
        return out;
    end
    
    -- Test output
    local input = 'M20-30L20,20,20X40,40-40H1,2E1.7 1.8e22,3,12.8z';
    local r = parsePath(input);
    for i=1, #r do
        print("{ "..table.concat(r[i], ", ").." }");
    end
    

    This function is quite lenient: it allows any unnecessary white space to be left out and does not validate semantics. The only filtering it does is to discard any data before the first letter that is not e or E.

    It will also silently ignore any non-matching data.
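
    One input the value pattern "([+-]?[%deE.]+)" does not handle is a signed exponent: "1e-5" is split into "1e" and "-5". The following is a minimal sketch of a stricter number scanner, assuming the usual SVG number shape (optional sign, digits with an optional fraction, optional exponent with its own sign). The name scanNumbers is hypothetical and not part of the functions above; it would be applied to the argument string of a single instruction.

```lua
-- Hypothetical helper: extract SVG-style numbers from the argument string
-- of a single path instruction. Returns an array of Lua numbers.
function scanNumbers(vals)
    local nums = {}
    local pos = 1
    while pos <= #vals do
        -- Mantissa: optional sign, then "12", "12." or "12.5" ...
        local s, e = vals:find("^[+-]?%d+%.?%d*", pos)
        if not s then
            -- ... or a leading-dot form such as ".5"
            s, e = vals:find("^[+-]?%.%d+", pos)
        end
        if s then
            -- Optional exponent, which may carry its own sign ("e-5")
            local _, e2 = vals:find("^[eE][+-]?%d+", e + 1)
            e = e2 or e
            nums[#nums + 1] = tonumber(vals:sub(s, e))
            pos = e + 1
        else
            pos = pos + 1 -- Skip separators and anything unrecognised
        end
    end
    return nums
end

-- Usage
local n = scanNumbers("1e-5 20-30,.5")
print(#n) -- four values: 1e-5, 20, -30 and 0.5
```

    Like the functions above, this sketch silently skips anything it does not recognise; it only changes how the boundaries between numbers are found.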

    If you want to match only existing instructions, you can replace the pattern "([a-df-zA-DF-Z])([^a-df-zA-DF-Z]*)" with "([MmZzLlHhVvCcSsQqTtAa])([^MmZzLlHhVvCcSsQqTtAa]*)". However, this will cause all values of a non-existing instruction to be added to the previous instruction, so I do not think it is a good idea. It is better to parse a superset and throw errors on semantics later.
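
    The "parse a superset, validate later" step could look roughly like the sketch below. It assumes the 2-D array produced by parsePath as input and uses the argument counts from the SVG path specification (M, L, T: 2; H, V: 1; C: 6; S, Q: 4; A: 7; Z: 0). The names ARG_COUNT and expandPath are hypothetical. It also splits repeated argument groups into one entry per instruction, which gives the nested form asked for in the question.

```lua
-- Number of arguments each SVG path instruction takes (keyed by upper-case letter).
local ARG_COUNT = { M = 2, L = 2, H = 1, V = 1, C = 6, S = 4, Q = 4, T = 2, A = 7, Z = 0 }

-- Hypothetical post-processing step: takes a leniently parsed 2-D array such as
-- { {"M", 20, 30}, {"L", 20, 20, 20, 40} }, raises an error on unknown
-- instructions or wrong argument counts, and emits one entry per argument group.
function expandPath(parsed)
    local out = {}
    for _, line in ipairs(parsed) do
        local cmd = line[1]
        local count = type(cmd) == "string" and ARG_COUNT[cmd:upper()]
        if not count then
            error("unknown path instruction: " .. tostring(cmd))
        end
        local nargs = #line - 1
        if count == 0 then
            if nargs ~= 0 then error(cmd .. " takes no arguments") end
            out[#out + 1] = { cmd }
        elseif nargs == 0 or nargs % count ~= 0 then
            error(cmd .. " expects a multiple of " .. count .. " arguments, got " .. nargs)
        else
            for i = 2, #line, count do
                local entry = { cmd }
                for j = i, i + count - 1 do
                    local n = tonumber(line[j])
                    if not n then error("not a number: " .. tostring(line[j])) end
                    entry[#entry + 1] = n
                end
                out[#out + 1] = entry
                -- Extra coordinate pairs after a moveto are implicit linetos
                if cmd == "M" then cmd = "L" elseif cmd == "m" then cmd = "l" end
            end
        end
    end
    return out
end

-- Usage with the path from the question, already split by instruction
local r = expandPath({ { "M", "20", "30" }, { "L", "20", "20", "20", "40", "40", "40" } })
for i = 1, #r do
    print("{ " .. table.concat(r[i], ", ") .. " }")
end
```

    For the path from the question this should yield four entries: M 20 30, L 20 20, L 20 40 and L 40 40. The sketch does not handle arc flags written without separators, so treat it as a starting point rather than a complete validator.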