javascript, regex, regex-lookarounds, regex-group, regex-greedy

Find (replace) the last space in every HTML heading within a block of HTML


I'm trying to come up with a regex that replaces the last space character in each heading with a non-breaking space (to control widows), touching only the headings inside a block of HTML.

So far I have this:

const regex = /(<h.>.+?)\s+((\S|<[^>]+>)*)\n|$/gi
const replaced = text.replace(regex, '$1&nbsp;$2')

In regex101 it looks like it works correctly, but when run in JavaScript it adds an extra &nbsp; to the end of the string.

A sample block of HTML might look like this:

<h2>This is a test heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
</div>

Which should be replaced with:

<h2>This is a test&nbsp;heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another&nbsp;heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
</div>

A link to regex101 showing the working pattern.

Below is a snippet showing the non-working behaviour in JavaScript:

let text = "<h2>This is a test heading</h2>"
const regex = /(<h.>.+?)\s+((\S|<h.>)*)\n|$/gi
let replaced = text.replace(regex, '$1&nbsp;$2')
console.log(replaced);

text = `<h2>This is a test heading</h2>
<p>Here is some text</p>
<div>
  <h3>Here is a another heading</h3>
  <p>Some more paragraph text which shouldn't match</p>
  <p>Why is there a non breaking space at the very end?</p>
</div>`
replaced = text.replace(regex, '$1&nbsp;$2')
console.log(replaced);


Solution
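
On the extra &nbsp; at the end of the string: in a regex, | binds more loosely than everything around it, so the pattern from the question reads as "heading up to the last word, followed by \n" OR "end of input". With the g flag, the $ branch produces an empty match at the very end of the text, and since $1 and $2 are empty for that match, the replacement leaves a bare &nbsp; there. A minimal sketch of this, using the single-heading sample from the question; the lookahead version is one possible way to confine the alternation, not the only one:

```javascript
const text = '<h2>This is a test heading</h2>';

// Pattern from the question. The top-level `|` splits it into
// "(heading ... last word) followed by \n"  OR  "end of input".
const original = /(<h.>.+?)\s+((\S|<h.>)*)\n|$/gi;

// The text has no newline, so only the `$` branch can match. It matches the
// empty string at the end, where `$1` and `$2` are both empty.
const broken = text.replace(original, '$1&nbsp;$2');
console.log(broken); // <h2>This is a test heading</h2>&nbsp;

// Same pattern with the alternation confined to a lookahead, so it applies
// only to what follows the last word and the newline is not consumed.
const grouped = /(<h.>.+?)\s+((\S|<h.>)*)(?=\n|$)/gi;
const fixed = text.replace(grouped, '$1&nbsp;$2');
console.log(fixed); // <h2>This is a test&nbsp;heading</h2>
```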

  • Here, we can start with a simple expression that matches a whole heading and captures the whitespace right before the last word in its own capturing group, (\s+):

    <(h[1-6])>(.+)(\s+)([^\s]+)<\/\1>
    

    Further constraints, such as allowing attributes on the heading tag, can be added to the expression if needed.

    Test

    const regex = /<(h[1-6])>(.+)(\s+)([^\s]+)<\/\1>/gim;
    const str = `<h2>This is a test heading</h2>
    <p>Here is some text</p>
    <div>
      <h3>Here is a another heading</h3>
      <p>Some more paragraph text which shouldn't match</p>
    </div>
    <h2>This is a test   heading</h2>
    <p>Here is some text</p>
    <div>
      <h3>Here is a another    heading</h3>
      <p>Some more paragraph text which shouldn't match</p>
    </div>`;
    const subst = `<$1>$2&nbsp;$4<\/$1>`;
    
    // The substituted value will be contained in the result variable
    const result = str.replace(regex, subst);
    
    console.log(result);

    RegEx

    If this expression isn't quite what you need, it can be modified and tested at regex101.com.

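
One modification that may be worth considering: because (.+) is greedy, a run of several spaces before the last word is mostly absorbed by that group, and only the final space is swapped for &nbsp;. A lazy (.+?) leaves the whole run to \s+. The sketch below assumes that collapsing the run into a single &nbsp; is what is wanted, and it also allows attributes on the opening tag. The helper name fixHeadingWidows is made up for illustration, and like any regex over HTML it assumes simple markup with each heading on one line.

```javascript
// Hypothetical helper: replace the whitespace before the last word of each
// <h1>..<h6> heading with a single &nbsp;.
function fixHeadingWidows(html) {
  // <(h[1-6])([^>]*)>  opening tag, attributes allowed (group 2)
  // (.+?)              heading text up to the last word, lazy (group 3)
  // \s+                the whole whitespace run before the last word
  // (\S+)              the last word, no whitespace allowed (group 4)
  // <\/\1>             closing tag matching the opening one
  const regex = /<(h[1-6])([^>]*)>(.+?)\s+(\S+)<\/\1>/gi;
  return html.replace(regex, '<$1$2>$3&nbsp;$4</$1>');
}

console.log(fixHeadingWidows('<h2>This is a test   heading</h2>'));
// <h2>This is a test&nbsp;heading</h2>

console.log(fixHeadingWidows('<h3 class="title">Here is a another heading</h3>'));
// <h3 class="title">Here is a another&nbsp;heading</h3>

// Non-heading elements are left untouched.
console.log(fixHeadingWidows("<p>Some more paragraph text which shouldn't match</p>"));
```

Single-word headings contain no whitespace to replace, so they are simply not matched.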