Tags: html, regex, notepad++, dreamweaver

Regular expression to find, copy and insert at end


I have a large table of contents (TOC) in HTML as an unordered list. Each anchor tag in the list has its own id attribute. Using either Dreamweaver or Notepad++, I want to find each id attribute, copy its value and place it after the hash (.html#) in the href attribute, so that the long TOC scrolls to the entry for the page that has just been navigated to.

So for the example below:

<li><a id="a4.9" href="trigger_marker.html#">Trigger marker</a> </li>
<li><a id="a4.10" href="timedelayarrow.html#">Time-delay arrow</a> </li>
<li><a id="a4.11" href="spectrumview.html#">Spectrum view</a> </li>

I would like the outcome to be:

<li><a id="a4.9" href="trigger_marker.html#a4.9">Trigger marker</a> </li>
<li><a id="a4.10" href="timedelayarrow.html#a4.10">Time-delay arrow</a> </li>
<li><a id="a4.11" href="spectrumview.html#a4.11">Spectrum view</a> </li>

Any help is, as always, very much appreciated. Apologies for my lack of an attempt at this, but I suspect regular expressions are the only way to achieve it, and I have little to no experience with regexes.


Solution

  • Try this regex:

    (?<=id=")([^"\n]+)"[^#\n]+#\K

    Explanation:

    • (?<=id=") - Positive lookbehind searching for the position which is preceded by id="
    • ([^"\n]+) - Matches 1+ occurrences of any character which is neither " nor a newline character, and captures it in group 1 (this is the id value)
    • " - matches " literally
    • [^#\n]+ - matching 1+ occurrences of any character which is neither a # nor a newline character
    • # - matches # literally
    • \K - Forget everything matched so far. Here, we get the position where we have to insert what was captured in group 1.

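    Replace each match with $1 (the captured id). In Notepad++, select the Regular expression search mode and use Replace All. After replacement, every href ends with its own id, for example href="trigger_marker.html#a4.9".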
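
    For anyone who would rather script this than use an editor, here is a minimal Python sketch of the same idea. Note that it is not the regex above verbatim: Python's built-in re module does not support \K, so this version captures everything up to and including the # in group 1 and writes it back, followed by the id from group 2. The names PATTERN and fill_fragment are made up for this illustration.

```python
import re

# Python's re has no \K, so the text up to and including '#' is captured
# (group 1) and re-emitted, followed by the id value (group 2).
# PATTERN and fill_fragment are hypothetical names used only in this sketch.
PATTERN = re.compile(r'(id="([^"\n]+)"[^#\n]+#)')


def fill_fragment(html: str) -> str:
    """Append each anchor's id after the '#' that follows it on the same line."""
    return PATTERN.sub(r'\g<1>\g<2>', html)


sample = (
    '<li><a id="a4.9" href="trigger_marker.html#">Trigger marker</a> </li>\n'
    '<li><a id="a4.10" href="timedelayarrow.html#">Time-delay arrow</a> </li>\n'
    '<li><a id="a4.11" href="spectrumview.html#">Spectrum view</a> </li>'
)

result = fill_fragment(sample)
print(result)
```

    Like the editor version, this assumes every href still ends with an empty #. Running it a second time on already-converted text would append the id again, so it should be applied once to the original file.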