regex replace html-parsing sublimetext

Select each instance of a pattern, inside a pattern, with regex?


I am trying to replace the spaces inside HTML id attributes on headings with a - character. So far I have been doing this as a multi-step process, and I would like to condense it into one step using a regex. I have been trying to write a pattern that matches each instance of a character inside a larger, variable pattern, but I have not had much success.

The regex should replace 2 spaces here:

<h2 id="three word sentence">

The regex should replace 3 spaces here:

<h2 id="four words in sentence">

This is what I have so far, which finds the entire ID on each item. Then I turn on "find in selection" and replace spaces with -.

(?<=<h[234] id=").*(?=")

How can I find just the spaces in one step?


Solution

  • You can use the following pattern, replacing each match with -:

    (?:\G(?!\A)|<h\d+\s+id=")[^"\s]*\K\s+(?=[^"]*")
    

    See the regex demo. Details:

    • (?:\G(?!\A)|<h\d+\s+id=") - either the end of the previous successful match (\G, with (?!\A) ruling out the start of the string) or <h, one or more digits, one or more whitespace characters, and the id=" string
    • [^"\s]* - zero or more chars other than " and whitespace
    • \K - match reset operator that discards the text matched so far from the overall match, so only what follows is replaced
    • \s+ - one or more whitespace characters
    • (?=[^"]*") - a positive lookahead that requires zero or more chars other than " and then a " char immediately to the right of the current position.
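
If you ever need the same replacement outside the editor, note that Python's built-in re module supports neither \G nor \K, so the pattern above will not compile there. A minimal sketch of a different technique that gives the same result in one call, using a replacement callback instead of \G and \K, could look like this. The names ID_ATTR and hyphenate_heading_ids are hypothetical and only for illustration:

```python
import re

# Assumption: headings look like <h2 id="...">, with the id attribute
# directly after the tag name, as in the question's examples.
# Group 1: tag opening up to and including id="
# Group 2: the id value
# Group 3: the closing quote
ID_ATTR = re.compile(r'(<h\d+\s+id=")([^"]*)(")')


def hyphenate_heading_ids(html: str) -> str:
    """Replace each run of whitespace inside heading id values with '-'."""

    def fix(match: re.Match) -> str:
        # Only the id value (group 2) is rewritten; the rest is kept as is.
        value = re.sub(r"\s+", "-", match.group(2))
        return match.group(1) + value + match.group(3)

    return ID_ATTR.sub(fix, html)


if __name__ == "__main__":
    print(hyphenate_heading_ids('<h2 id="three word sentence">'))
    print(hyphenate_heading_ids('<h2 id="four words in sentence">'))
```

As with \s+ in the pattern above, a run of several consecutive whitespace characters becomes a single -.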