I am trying to parse a single line in this format:
Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)
I have this working in C# using named capture groups, but I need to do it in PHP, and I have no idea how to separate each field and build an associative array I can iterate over.
I can retrieve the first double-quoted item, "textfile1.txt", using:
$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
preg_match("/(?:(?:\"(?:\\\\\"|[^\"])+\")|(?:'(?:\\\'|[^'])+'))/is", $string, $match);
print_r($match);
Array
(
[0] => "textfile1.txt"
)
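Part of why only one item comes back is that preg_match() stops after its first match, while preg_match_all() keeps scanning and collects every match. A minimal sketch, using a simplified pattern (an assumption for illustration) that captures the text between double quotes:

```php
<?php
// The input line from the question.
$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';

// Simplified pattern (illustrative assumption): a double-quoted run of non-quote characters.
$pattern = '/"([^"]+)"/';

// preg_match() stops after the first match it finds.
preg_match($pattern, $string, $first);
print_r($first);

// preg_match_all() returns the number of matches and fills $all;
// $all[1] holds capture group 1 from every match.
$count = preg_match_all($pattern, $string, $all);
print_r($all[1]);
```

This still ignores the sizes; pairing each name with its size needs a pattern that matches both fields together.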
I can't figure out the rest. I have tried different expressions to capture both the string and the numeric (long) fields, with no luck.
Is there something I am missing?
The end goal is to add each filename/size pair to an array that I can access later.
Any help is appreciated.
https://regex101.com/r/naSdng/1
My C# implementation looks like this:
MatchCollection result = Regex.Matches(file, @"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<filename>.*?)""|'(?<filename>.*?)')\s*,\s*(?<filesize>\d+)");
matchCol = result;
foreach (Match match in result)
{
    ListViewItem ItemArray = new(new string[] {
        match.Groups["filename"].Value.Trim(),
        BytesToReadableString(Convert.ToInt64(match.Groups["filesize"].Value)),
        "Ready"
    });
    fileList.Items.Add(ItemArray);
}
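The C# loop formats each size with BytesToReadableString, whose implementation is not shown. If the same display is wanted in PHP, a rough counterpart could look like the sketch below; the function name bytesToReadableString, the 1024-based units and the two-decimal format are all assumptions, not the original helper.

```php
<?php
// Hypothetical PHP counterpart to the C# BytesToReadableString helper (not shown in the question).
// Assumes 1024-based units and two decimal places.
function bytesToReadableString(int $bytes): string
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB'];
    $size = (float) $bytes;
    $i = 0;
    // Divide by 1024 until the value fits the current unit, or we run out of units.
    while ($size >= 1024 && $i < count($units) - 1) {
        $size /= 1024;
        $i++;
    }
    return sprintf('%.2f %s', $size, $units[$i]);
}

echo bytesToReadableString(121034), PHP_EOL;
```

Passing 7268474425 as an int assumes a 64-bit PHP build, since that value does not fit in a 32-bit integer.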
The regex you have shown in C# can be easily adapted to work in PHP as well.
You may use:
(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)
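Note that I have refactored your regex a bit to make it more efficient: \h matches only horizontal whitespace, the negated class [^"]+ replaces the lazy .*? so the engine does less backtracking, and only double-quoted names are matched, since your sample input contains no single-quoted ones.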
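If single-quoted names do need to be supported, as in the C# version, one possible approach (a sketch, not tested against your real data) is to capture the opening quote and require the same quote to close the name with a backreference:

```php
<?php
// Accepts "name" or 'name'. Group 1 captures the opening quote; \1 requires the same quote to close it.
// Inside a single-quoted PHP string, \' is a literal single quote, so the class becomes ["'].
$pattern = '/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)(["\'])(?<filename>.+?)\1\h*,\h*(?<filesize>\d+)/';

// Made-up input mixing both quote styles.
$s = 'Files("textfile1.txt", 7268474425, \'textfile2.txt\', 661204928)';
if (preg_match_all($pattern, $s, $m)) {
    print_r(array_combine($m['filename'], $m['filesize']));
}
```

The lazy .+? is needed here because [^"]+ no longer works when either quote character can delimit the name.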
Code:
<?php
$s = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
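if (preg_match_all('/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/', $s, $m)) {
    // Use each filename as a key and its size as the value (a repeated filename overwrites the earlier entry).
    $out = array_combine($m['filename'], $m['filesize']);
    print_r($out);
}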
?>
Output:
Array
(
[textfile1.txt] => 7268474425
[textfile2.txt] => 661204928
[textfile3.txt] => 121034
)
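Because array_combine() uses the filenames as keys, a repeated filename would silently overwrite the earlier entry. If that can happen, a list of pairs built with the PREG_SET_ORDER flag keeps every match and reads much like the foreach over the C# MatchCollection. A sketch, assuming a 64-bit PHP build for the (int) cast:

```php
<?php
$s = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
$pattern = '/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/';

$files = [];
// PREG_SET_ORDER groups each match's captures together, one array per match.
if (preg_match_all($pattern, $s, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // A list of pairs keeps duplicate filenames that array_combine() would collapse.
        // The (int) cast assumes a 64-bit build: 7268474425 exceeds a 32-bit int.
        $files[] = ['filename' => $match['filename'], 'filesize' => (int) $match['filesize']];
    }
}

foreach ($files as $file) {
    echo $file['filename'], ': ', $file['filesize'], PHP_EOL;
}
```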
RegEx Details:
- `(?:`: Start a non-capture group
- `\w+\(\h*`: Match 1+ word characters followed by `(` and 0 or more horizontal whitespaces
- `|`: OR
- `(?<!\A)\G`: Start matching from the end of the previous match; the lookbehind stops `\G` from also matching at the very start of the string
- `\h*,\h*`: Match a comma surrounded by 0 or more horizontal whitespaces
- `)`: End non-capture group
- `"(?<filename>[^"]+)"`: Match a double-quoted string, with named capture group `filename` matching 1+ characters that are not `"`
- `\h*,\h*`: Match a comma surrounded by 0 or more horizontal whitespaces
- `(?<filesize>\d+)`: Named capture group `filesize` to match 1+ digits
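To see what the `\w+\(` / `\G` prefix buys, the sketch below compares the pattern with a looser version that drops it. The input string is made up for the comparison: a valid `Files(...)` list followed by a stray quoted pair outside it. Because every match after the first must begin exactly where the previous one ended, the anchored pattern stops at the closing `)`.

```php
<?php
// Anchored pattern from the answer vs. a loose one without the \w+\( / \G prefix.
$anchored = '/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/';
$loose    = '/"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/';

// Made-up input: one pair inside Files(...) and one stray pair after it.
$s = 'Files("a.txt", 1) "x.txt", 5';

preg_match_all($anchored, $s, $a);
preg_match_all($loose, $s, $l);

print_r($a['filename']); // only the pair inside Files(...)
print_r($l['filename']); // also picks up the stray pair
```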