Tags: php, regex, expression, text-parsing

Parse comma-separated text between parentheses as array of key-value pairs


I am trying to parse a single line in this format:

Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)

I have this working perfectly in C# using named capture groups, but this is PHP, and I have no idea how to separate each field and build an associative array I can iterate over.

I can retrieve the first double-quoted item, "textfile1.txt", using:

$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
preg_match("/(?:(?:\"(?:\\\\\"|[^\"])+\")|(?:'(?:\\\'|[^'])+'))/is", $string, $match);
print_r($match);
Array
(
    [0] => "textfile1.txt"
)
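Only one item comes back because preg_match() stops after the first match, while preg_match_all() keeps scanning the subject. The sketch below is an illustration only and swaps in a simplified pattern (an assumption: double-quoted names containing no escaped quotes). It returns every filename, but the sizes are still not paired with them.

```php
<?php
$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';

// preg_match() stops at the first match; preg_match_all() collects all of them.
// Simplified pattern (assumption): double-quoted names with no escaped quotes.
preg_match_all('/"([^"]+)"/', $string, $all);

// $all[0] holds the full matches (quotes included), $all[1] the captured names.
$names = $all[1];
print_r($names);
```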

I can't figure it out. I have tried different expressions that handle both the string and the long (integer) fields, with no luck.

Is there something I am missing?

The end goal is to add each filename/size pair to an array that I can access later.

Any help is appreciated.

https://regex101.com/r/naSdng/1

My C# implementation looks like this:

MatchCollection result = Regex.Matches(file, @"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<filename>.*?)""|'(?<filename>.*?)')\s*,\s*(?<filesize>\d+)");
matchCol = result;
foreach (Match match in result)
{
    ListViewItem ItemArray = new(new string[] {
        match.Groups["filename"].Value.Trim(), BytesToReadableString(Convert.ToInt64(match.Groups["filesize"].Value)), "Ready"
    });
    fileList.Items.Add(ItemArray);
}

Solution

  • The regex you have shown in C# can be easily adapted to work in PHP as well.

    You may use:

    (?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)
    

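    Note that I have refactored your regex a bit to make it more efficient: the negated class [^"]+ replaces the lazy .*?, \h matches only horizontal whitespace, and only double-quoted filenames are matched (your C# version also accepted single quotes).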
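If single-quoted filenames also need to be accepted, as the C# pattern did, one option is sketched below. It is a variant of the answer's regex, not the answer itself: it captures the opening quote as group 1 and requires the same closing quote with the backreference \1, which avoids reusing the group name filename (PCRE rejects duplicate names unless the J modifier is set). The sample input is made up for illustration.

```php
<?php
// Variant (assumption): accept 'single' or "double" quoted names, like the C# pattern.
// (["\']) captures the opening quote as group 1; \1 demands the same closing quote.
$pattern = '/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)(["\'])(?<filename>.*?)\1\h*,\h*(?<filesize>\d+)/';

// Hypothetical mixed-quote input.
$s = "Files('textfile1.txt', 7268474425, \"textfile2.txt\", 661204928)";

$pairs = [];
if (preg_match_all($pattern, $s, $m)) {
    // $m also has numeric keys (group 1 is the quote); only the named groups are used.
    $pairs = array_combine($m['filename'], $m['filesize']);
}
print_r($pairs);
```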


    Code:

    <?php
    $s = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
    
    // Collect every filename/filesize pair into the named groups of $m
    if (preg_match_all('/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/', $s, $m)) {
       // Use the filenames as keys and the sizes as values
       $out = array_combine($m['filename'], $m['filesize']);
       print_r($out);
    }
    ?>
    

    Output:

    Array
    (
        [textfile1.txt] => 7268474425
        [textfile2.txt] => 661204928
        [textfile3.txt] => 121034
    )
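To mirror the C# foreach loop more closely, preg_match_all() can be called with the PREG_SET_ORDER flag so that each element of $matches holds one match's captures, much like a Match in a MatchCollection. This is a sketch: bytes_to_readable() is a hypothetical stand-in for the question's BytesToReadableString, whose implementation is not shown, and the rows are plain arrays rather than ListViewItem objects.

```php
<?php
// Hypothetical stand-in for BytesToReadableString (binary units, 2 decimals).
function bytes_to_readable(int $bytes): string
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB'];
    $i = 0;
    $size = (float) $bytes;
    while ($size >= 1024 && $i < count($units) - 1) {
        $size /= 1024;
        $i++;
    }
    return sprintf('%.2f %s', $size, $units[$i]);
}

$s = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
$pattern = '/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/';

$rows = [];
// PREG_SET_ORDER groups each match's captures together, one array per match.
if (preg_match_all($pattern, $s, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // (int) needs 64-bit PHP for sizes above 2147483647, such as 7268474425.
        $rows[] = [trim($match['filename']), bytes_to_readable((int) $match['filesize']), 'Ready'];
    }
}
print_r($rows);
```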
    

    RegEx Details:

    • (?:: Start a non-capture group
      • \w+\(\h*: Match 1+ word characters followed by ( and 0 or more horizontal whitespace characters
      • |: OR
      • (?<!\A)\G: Start matching at the end of the previous match; the negative lookbehind (?<!\A) stops \G from matching at the start of the string
      • \h*,\h*: Match a comma surrounded by 0 or more horizontal whitespace characters
    • ): End non-capture group
    • "(?<filename>[^"]+)": Match a double-quoted string, capturing 1+ characters that are not " in the named group filename
    • \h*,\h*: Match a comma surrounded by 0 or more horizontal whitespace characters
    • (?<filesize>\d+): Named capture group filesize to match 1+ digits
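One consequence of the \G alternative is that every match after the first must start exactly where the previous one ended, so parsing stops at the first token that does not fit the pattern. The input below is a made-up malformed example used only to illustrate this; whether that strictness is desirable depends on how much you trust the input.

```php
<?php
$pattern = '/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/';

// Hypothetical malformed input: the bare token oops breaks the chain of matches.
// \G can only continue from the end of the previous match, so "b.txt" is skipped.
$s = 'Files("a.txt", 1, oops, "b.txt", 2)';

$pairs = [];
if (preg_match_all($pattern, $s, $m)) {
    $pairs = array_combine($m['filename'], $m['filesize']);
}
print_r($pairs);
```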