Tags: regex, visual-studio, xml-documentation

Regex substitution for variable number of capturing groups | Mass XML doc editing in Visual Studio


This is a many-in-one question.

I want to inject line breaks into the XML documentation comments of my C# project,
going from

/// <summary>
/// Event for lazy async var reading. 
/// Contains all vars that partecipated to a certain poll,  
/// indepentently of the fact it succeded or the value changed </summary>

to

/// <summary>
/// <br>Event for lazy async var reading. </br>
/// <br>Contains all vars that partecipated to a certain poll,</br>  
/// <br>indepentently of the fact it succeded or the value changed </br></summary>

My first approach was to run a regex find & replace, and I managed to select the comments correctly, accounting for different indentation and such.

When it comes to capturing groups, though, I can't find a way to reference the lines that make up the body.
Both the line's capturing group and its inner capturing group seem to go missing.

My regex:

(?x:
    (?(DEFINE)
        (?'line'\/\/\/\s*([^\n]+)\n*)
    )
    (?'head'<summary>(?:\s*\n | \s*[^\n]+\n*))  #select summary tag and skip the first line.
    (?P>line)+                                  #select each line, capturing both the line and a subset of its content
    (?'foot'(?:\/\/\/\s*)*<\/summary>)          #select summary closure and the empty comment-lines preceding it
)

This expression produces:

{0}:  full match
{1 (named head)}:   <summary>...
{4}:  last matched (?P>line)
{5 (named foot)}:   ...</summary>

Where did capturing groups {2} and {3} go?
Where did the inner capturing group of (?'line'..) go?
Is there a Visual Studio extension that's up to the task instead?

Even if the regex worked, I don't know how to reference all the in-between groups, as their number is variable.
My plan is to reference all numbered groups explicitly up to some number, and to name the remaining groups so they are not selected. Is there a better alternative?

P.S. If there is a way to post regex "code" formatted in a color-coded, pretty way, I didn't find it. Any pointers are welcome.

P.P.S. EDIT: related questions, which I was apparently unable to find earlier, state that "it's not feasible by regex alone, just through functions in actual languages".


Solution

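  • This can clearly be done by regex alone.
    Visual Studio's own Find and Replace, however, does not have a usable regex engine for this: it uses .NET regular expressions, which lack the \K and (?&name) constructs used below.
    It can be done in C#, though, using Capture Collections.

    Alternatively, you can bite the bullet and use a PCRE engine.
    Here is an idea for you.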

    (?m)(?:^///\h*<summary>\h*(?:\s*^///\h*)*\K(?'line'(?:(?!</?summary\s*>)\S)+(?:\h*(?:(?!</?summary\s*>)\S)+)*)|\G(?:\s*^///\h*)*\K(?&line))
    

    Substitute with: <br>$0</br>

    https://regex101.com/r/bdbVMw/1

    Regex layout (expanded):

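    (?m)                                # multiline: ^ matches at each line start
    (?:
       ^ /// \h* <summary> \h*          # first alternative: the opening tag line
       (?: \s* ^ /// \h* )*             # skip empty comment lines and the next comment prefix
       \K                               # drop everything matched so far from $0
       (?'line'                         # (1 start) one line of body text
          (?: (?! </?summary \s* > ) \S )+
          (?: \h* (?: (?! </?summary \s* > ) \S )+ )*
       )                                # (1 end)
       
     | \G                               # second alternative: resume where the previous match ended
       (?: \s* ^ /// \h* )*
       \K (?&line)                      # reuse the 'line' pattern as a subroutine call
    )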
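
For the "functions in actual languages" route mentioned in the question (the same idea as C# Capture Collections), the per-line wrapping can also be sketched with a callback-based replace. What follows is a minimal Python sketch, not C# and not the PCRE regex above: the patterns SUMMARY_BLOCK and LINE and the helper names wrap_line and add_line_breaks are assumptions made up for this illustration. Like the PCRE substitution, it wraps each body line from its first to its last non-whitespace character and leaves the surrounding whitespace and the summary tags outside the <br>...</br> pair.

```python
import re

# Hypothetical pattern for this sketch: one whole <summary>...</summary>
# doc-comment block, starting at the beginning of its first line.
SUMMARY_BLOCK = re.compile(
    r"^[ \t]*///[ \t]*<summary>.*?</summary>",
    re.MULTILINE | re.DOTALL,
)

# Hypothetical pattern for a single comment line:
#   prefix = indentation, "///", optional opening tag, whitespace
#   text   = body text, from first to last non-whitespace character
#   suffix = trailing whitespace and the optional closing tag
LINE = re.compile(
    r"(?P<prefix>\s*///\s*(?:<summary>)?\s*)"
    r"(?P<text>.*?)"
    r"(?P<suffix>\s*(?:</summary>)?\s*)"
)


def wrap_line(line: str) -> str:
    """Wrap the body text of one '///' line in <br>...</br>."""
    m = LINE.fullmatch(line)
    if m is None or not m.group("text"):
        # Not a doc-comment line, or a line with no body text: keep as-is.
        return line
    return f"{m.group('prefix')}<br>{m.group('text')}</br>{m.group('suffix')}"


def add_line_breaks(source: str) -> str:
    """Apply wrap_line to every line of every <summary> block in source."""

    def rewrite_block(match: re.Match) -> str:
        lines = match.group(0).split("\n")
        return "\n".join(wrap_line(line) for line in lines)

    return SUMMARY_BLOCK.sub(rewrite_block, source)


if __name__ == "__main__":
    sample = (
        "    /// <summary>\n"
        "    /// Event for lazy async var reading. \n"
        "    /// Contains all vars that took part in a certain poll,  \n"
        "    /// whether or not the value changed </summary>\n"
    )
    print(add_line_breaks(sample))
```

The callback sees one whole block at a time, so the number of body lines does not matter and no numbered group per line is needed. Note that the sketch is not idempotent: running it twice over the same file would wrap the lines twice.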