regex, uipath

Regex Get Text Between 2 Words


I need to supply one word and obtain the HTML element that contains it, including the word itself. Example:

  • Text input: Madhuparna

I need to obtain:

  • June 5, 2021 By Madhuparna

  • bla bla bla Madhuparna bla bla bla

Test text:

<p>The entire purpose speed up the process.</p><p>June 5, 2021 By Madhuparna</p>\r\n<p>The entire purpose of a terminal emulator is to imitate how the regular computer terminals perform and allowing the main computer to connect to and use a remote computer through a command-line or a graphical interface. The terminal emulators are known to carry out the functions using the software.</p>\r\n<a>It allows file transfer between the main and the remote computer using SSH (Secure Shell) and also enables the host system to execute applications on the remote system. While it features a graphical user interface, programmers rather prefer the text-based interface to gain more control over all functions and speed up the process.</a><p>bla bla bla Madhuparna bla bla bla</p>

What I have tried so far, which does not work:

<(\S*?)[^>]*>.*?Madhuparna.*?<\/\1>|<.*?\/>
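To see why the attempt fails, here is a minimal sketch that runs the pattern unchanged against a shortened copy of the test text. Python's re is used only for illustration, and ATTEMPT and SAMPLE are hypothetical names. The engine reports the leftmost match, and nothing stops the lazy .*? from running past </p> into the next element, so the first match starts at the very first <p> and swallows both paragraphs. Laziness only shortens where a match ends, not where it starts.

```python
import re

# The question's pattern, unchanged apart from being a Python raw string.
ATTEMPT = re.compile(r"<(\S*?)[^>]*>.*?Madhuparna.*?<\/\1>|<.*?\/>")

# Shortened copy of the test text (assumption: "\r\n" is a real CRLF line break).
SAMPLE = (
    "<p>The entire purpose speed up the process.</p>"
    "<p>June 5, 2021 By Madhuparna</p>\r\n"
    "<a>It allows file transfer between the main and the remote computer.</a>"
    "<p>bla bla bla Madhuparna bla bla bla</p>"
)

# The first match begins at the first "<p>" and runs through the "</p>" after
# "Madhuparna", so it covers two paragraphs instead of one.
for m in ATTEMPT.finditer(SAMPLE):
    print(repr(m.group(0)))
```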

Solution

  • Please try the following:

    Edit: this is getting slightly messier now (and quite "hacky").

    /<([pali]{1,2})>[^<>]*Madhuparna[^<>]*<\/\1>/g
    

    It is probably not fully optimised, but it does the job for your sample.

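    This assumes that the only tags you encounter are <p> and <a> (as in your sample) or <li>; please update the first capturing group ([pali]{1,2}) of the regex if needed. Note that this character class is loose: it also accepts other one- or two-letter combinations of p, a, l and i (such as <i>). Because the content is matched with [^<>]*, it also only finds elements that contain no nested tags.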

    Demo (updated): https://regex101.com/r/16jjLn/1

    The explanation panel at that link breaks down what each part of the regex does.
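A minimal Python sketch of the answer's pattern applied to a shortened copy of the test text. Python's re is an illustrative choice, and find_elements and SAMPLE are hypothetical names. The /.../g around the pattern is regex101's delimiter-and-flag notation, so here the bare pattern is compiled and finditer() plays the role of the g flag. Two additions go beyond the answer: the search word is passed through re.escape() so that it can be any input word, and the element content is wrapped in a named group inner so the text from the question's expected output can be read off directly.

```python
import re

# Shortened copy of the test text (assumption: "\r\n" is a real CRLF line break).
SAMPLE = (
    "<p>The entire purpose speed up the process.</p>"
    "<p>June 5, 2021 By Madhuparna</p>\r\n"
    "<a>It allows file transfer between the main and the remote computer.</a>"
    "<p>bla bla bla Madhuparna bla bla bla</p>"
)


def find_elements(html: str, word: str) -> list[tuple[str, str]]:
    """Return (whole element, inner text) for each matching element containing word.

    Hypothetical helper built around the answer's pattern. re.escape() keeps
    characters such as '.' or '+' in the word from acting as regex syntax.
    """
    pattern = re.compile(
        r"<([pali]{1,2})>"  # opening tag; group 1 = tag name (p, a, li, ...)
        r"(?P<inner>[^<>]*" + re.escape(word) + r"[^<>]*)"  # content, no nested tags
        r"</\1>"  # closing tag must repeat the same tag name
    )
    return [(m.group(0), m.group("inner")) for m in pattern.finditer(html)]


# Prints the inner text of each element that contains the word.
for element, inner in find_elements(SAMPLE, "Madhuparna"):
    print(inner)
```

With the sample above this finds the two paragraphs listed as the expected output, while the <a> element and the first <p> are skipped because they do not contain the word.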