regex, regex-group

Regex: how to make a space optional in the match and keep it out of the replacement output


I have tried many variations, but none of them works completely; the match is about 99% right.

I want the spaces to be optional, and the replacement built from groups 1, 2, 3, 4 and 5 should contain no stray space: (.sys), not (.sys ).

regex search:

(?<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))[\t\x20]*(?<size_type>(?i)gb|mb|m|g)[\t\x20]*(?<file>.+(?=\.)|.+)(?<type>(?:\..*)?)\s*\|\s*(?<path>(?i:C|D):.*\\)

regex replace:

(\1)(\2)(\3)(\4)(\5)

Text:

3.9 GB pagefile.sys | C:\
3.9 GB pagefile.sys |C:\
3.9 GB pagefile.sys| C:\
3.9 GB pagefile.sys|C:\

3.9 GB pagefile.sys | C:\
3.9 GBpagefile.sys | C:\
3.9GB pagefile.sys | C:\
3.9GBpagefile.sys | C:\

expected behavior:

(3.9)(GB)(pagefile)(.sys)(C:\)
(3.9)(GB)(pagefile)(.sys)(C:\)
(3.9)(GB)(pagefile)(.sys)(C:\)
(3.9)(GB)(pagefile)(.sys)(C:\)

(3.9)(GB)(pagefile)(.sys)(C:\)
(3.9)(GB)(pagefile)(.sys)(C:\)
(3.9)(GB)(pagefile)(.sys)(C:\)
(3.9)(GB)(pagefile)(.sys)(C:\)

actual behavior:

(3.9)(GB)(pagefile)(.sys )(C:\)
(3.9)(GB)(pagefile)(.sys )(C:\)
(3.9)(GB)(pagefile)(.sys)(C:\)
(3.9)(GB)(pagefile)(.sys)(C:\)

(3.9)(GB)(pagefile)(.sys )(C:\)
(3.9)(GB)(pagefile)(.sys )(C:\)
(3.9)(GB)(pagefile)(.sys )(C:\)
(3.9)(GB)(pagefile)(.sys )(C:\)

See the demo on regex101.com.

Can anyone help?


Solution

  • The reason you see an extra space in the replacement is that the .* in the type group (?<type>(?:\..*)?) can also match a space. When a space precedes the pipe, .* backtracks only as far as needed for \s*\| to match, so the space stays inside the group.

    You could restrict it by using \S* instead, which matches optional non-whitespace chars after the dot that the group requires whenever it matches.

    The alternation for the size_type can also be written using character classes (?<size_type>(?i)[gm]b|[mg]), and the same goes for the path (?<path>(?i:[CD]):.*\\).

    The whole pattern could look like:

    (?<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))[\t\x20]*(?<size_type>(?i)[gm]b|[mg])[\t\x20]*(?<file>.+(?=\.)|.+)(?<type>(?:\.\S*)?)\s*\|\s*(?<path>(?i:[CD]):.*\\)
    

    Regex demo

    If there is always a pipe char and a single char C or D followed by :\, and the file name has no spaces and always has an extension, another option could be:

    (?<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))[\t\x20]*(?<size_type>(?i)gb|mb|m|g)[\t\x20]*(?<file>[^\s|]+)(?<type>\.[^|\s]+)[\t\x20]*\|[\t\x20]*(?<path>(?i:[CD]):\\)
    

    Regex demo
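
As a quick check of the explanation above, here is a minimal sketch in Python that runs the original pattern and both suggested patterns over the sample lines from the question. It is an adaptation, not the exact patterns: Python's re module needs (?P<name>...) for named groups and does not accept (?i) in the middle of a pattern, so the scoped form (?i:...) is used instead, on the assumption that it is equivalent here. The names SIZE, ORIGINAL, FIXED, STRICT, SAMPLES and rewrite are made up for this illustration.

```python
import re

# Number part shared by all three patterns (unchanged apart from ?P<...>).
SIZE = r"(?P<size>[+-]?(?:(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\.[0-9]+)?|\.[0-9]+))"

# Pattern from the question: the type group uses .* and can absorb a space.
ORIGINAL = SIZE + (
    r"[\t\x20]*(?P<size_type>(?i:gb|mb|m|g))[\t\x20]*"
    r"(?P<file>.+(?=\.)|.+)(?P<type>(?:\..*)?)"
    r"\s*\|\s*(?P<path>(?i:C|D):.*\\)"
)

# First suggestion: \S* in the type group, character classes elsewhere.
FIXED = SIZE + (
    r"[\t\x20]*(?P<size_type>(?i:[gm]b|[mg]))[\t\x20]*"
    r"(?P<file>.+(?=\.)|.+)(?P<type>(?:\.\S*)?)"
    r"\s*\|\s*(?P<path>(?i:[CD]):.*\\)"
)

# Second suggestion: negated character classes, mandatory extension.
STRICT = SIZE + (
    r"[\t\x20]*(?P<size_type>(?i:gb|mb|m|g))[\t\x20]*"
    r"(?P<file>[^\s|]+)(?P<type>\.[^|\s]+)"
    r"[\t\x20]*\|[\t\x20]*(?P<path>(?i:[CD]):\\)"
)

# Same replacement as in the question: each group wrapped in parentheses.
REPLACEMENT = r"(\1)(\2)(\3)(\4)(\5)"

# Sample lines from the question (each ends with a literal backslash).
SAMPLES = [
    "3.9 GB pagefile.sys | C:\\",
    "3.9 GB pagefile.sys |C:\\",
    "3.9 GB pagefile.sys| C:\\",
    "3.9 GB pagefile.sys|C:\\",
    "3.9 GBpagefile.sys | C:\\",
    "3.9GB pagefile.sys | C:\\",
    "3.9GBpagefile.sys | C:\\",
]


def rewrite(pattern: str, line: str) -> str:
    """Apply the search pattern and the replacement to one line."""
    return re.sub(pattern, REPLACEMENT, line)


for line in SAMPLES:
    # repr() makes a trailing space inside the type group visible.
    print(repr(rewrite(ORIGINAL, line)), repr(rewrite(FIXED, line)), repr(rewrite(STRICT, line)))
```

With the original pattern, the first sample line comes out as (3.9)(GB)(pagefile)(.sys )(C:\), while both suggested patterns give (3.9)(GB)(pagefile)(.sys)(C:\) for every sample line. Other regex flavors may differ in detail, so the regex101 demos remain the reference for PCRE.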