
Get data between two tags


I need to extract the text between the two tags <mail> and </mail>.

This is the text:

<?xml version='1.0' encoding='utf-16'?>
<li xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
  <g id='{E5EABB1F-40BC-45BB-8D87-3B6C239B521B}' displayName='Actions' onclick='javascript:return scForm.postEvent(this,event,'forms:addaction')'>
    <li id='{D4502A11-9417-4479-9F2A-485F45D2E2D0}' unicid='B048F2B1C5964A1CB64AEEE249C00371'>
      <parameters><host>smtp.sendgrid.net</host><port>587</port><login>[email protected]</login><password>n8vhr^^mcQE4</password><from>[email protected]</from><isbodyhtml>true</isbodyhtml><to>[{CC59436D-F6A6-4B84-A490-7A6F6ACDF8C9}]</to><cc></cc><bcc></bcc><localfrom>[email protected]</localfrom><subject>[{5A297C9C-4979-49C5-BDE6-F6F5351DF7AA}], thank you for registering for site Community Updates</subject><mail>
        <table width='100%' border='0' cellspacing='0' cellpadding='0' class='em_full_wrap' style='background-color: #efefef;'>
            <tbody>
                <tr>
                    <td align='center' valign='top'>
                    <table align='center' width='700' border='0' cellspacing='0' cellpadding='0' class='em_main_table' style='width: 700px; table-layout: fixed; background-color: #efefef;'>
                        <tbody>
                            <tr>
                                <td align='center' valign='middle' class='em_space' style='font-family:Arial, sans-serif; font-size:12px; line-height:15px; color:#000000; padding:23px 10px;'><span class='em_defaultlink'><a href='%%view_email_url%%' target='_blank' style='text-decoration:none; color:#000000;'>View this email online.</a></span></td>
                            </tr>
                        </tbody>
                    </table>
                    </td>
                </tr>
                <tr>
                    <td align='center' valign='top'>
                    <table width='100%' border='0' cellspacing='0' cellpadding='0' align='center'>
                        <tbody>
                            <tr>
                                <td align='center' valign='top'>
                                <table cellpadding='0' cellspacing='0' width='100%' role='presentation' style='min-width: 100%;' class='stylingblock-content-wrapper'>
                                    <tbody>
                                        <tr>
                                            <td class='stylingblock-content-wrapper camarker-inner'><!--BLOCK 01- LOGO-->
                                            <table align='center' border='0' cellpadding='0' cellspacing='0' width='100%'>
                                                <tbody>
                                                    <tr>
                                                        <td align='center' valign='top'>
                                                        <table align='center' border='0' cellpadding='0' cellspacing='0' class='em_main_table' style='width: 700px; background-color: #3e2246;' width='700'>
                                                            <tbody>
                                                                <tr>
                                                                    <td align='center' class='em_full_img' style='padding-top: 0px;' valign='top'>
                                                                    <a conversion='false' data-linkto='https://' href='https://site.test.com/' style='text-decoration:none;' target='_blank'><img alt='site by test' data-assetid='165573' height='182' src='https://image.e.residential.test.com/lib/fe3815707564067c721d73/m/13/new-logo_image.jpg' style='display: block; font-family: Arial, sans-serif; font-size: 24px; line-height: 30px; color: #FFFFFF; max-width: 700px; padding: 0px; text-align: center; height: 182px; width: 700px;' width='700'></a></td>
                                                                </tr>
                                                            </tbody>
                                                        </table>
                                                        </td>
                                                    </tr>
                                                </tbody>
                                            </table>
                                            <!--//BLOCK 01- LOGO--></td>
                                        </tr>
                                    </tbody>
                                </table>
                                <table cellpadding='0' cellspacing='0' width='100%' role='presentation' style='min-width: 100%;' class='stylingblock-content-wrapper'>
                                    <tbody>
                                        <tr>
                                            <td class='stylingblock-content-wrapper camarker-inner'><!--BLOCK 03 - HERO-->
                                            <table align='center' border='0' cellpadding='0' cellspacing='0' width='100%'>
                                                <tbody>
                                                    <tr>
                                                        <td align='center' valign='top'>
                                                        <table align='center' border='0' cellpadding='0' cellspacing='0' class='em_main_table' style='width: 700px; table-layout: fixed; background-color: #ffffff;' width='700'>
                                                            <tbody>
                                                                <tr>
                                                                    <td align='center' class='em_full_img' valign='top'>
                                                                    <img alt='' class='em_g_img' height='400' src='https://image.e.residential.test.com/lib/fe3815707564067c721d73/m/12/g_banner_image.jpg' style='display: block; font-family: Arial, sans-serif; font-size: 18px; line-height: 30px; color: #424242; max-width: 700px; border-width: 0px; border-style: solid;' width='700'></td>
                                                                </tr>
                                                            </tbody>
                                                        </table>
                                                        </td>
                                                    </tr>
                                                </tbody>
                                            </table>
                                            <!--//BLOCK 03 - HERO--></td>
                                        </tr>
                                    </tbody>
                                </table>
                                <table cellpadding='0' cellspacing='0' width='100%' role='presentation' style='min-width: 100%;' class='stylingblock-content-wrapper'>
                                    <tbody>
                                        <tr>
                                            <td class='stylingblock-content-wrapper camarker-inner'><!--BLOCK 04 - SUBHEAD-->
                                            <table align='center' border='0' cellpadding='0' cellspacing='0' width='100%'>
                                                <tbody>
                                                    <tr>
                                                        <td align='center' valign='top'>
                                                        <table align='center' border='0' cellpadding='0' cellspacing='0' class='em_main_table' style='width: 700px; table-layout: fixed; background-color: #3e2246;' width='700'>
                                                            <tbody>
                                                                <tr>
                                                                    <td align='center' class='em_aside15 em_ptop' style='padding:30px 40px 0px 50px;' valign='top'>
                                                                    <table align='center' border='0' cellpadding='0' cellspacing='0' width='100%'>
                                                                        <tbody>
                                                                            <tr>
                                                                                <td class='em_font40' style='font-family: Georgia, 'Times New Roman', serif; font-size: 31px; line-height: 40px; color: #ff7f00; text-align: center;' valign='top'>
                                                                                Thank you for registering for site Community Updates
                                                                                <table align='center' border='0' cellpadding='0' cellspacing='0' width='100%'>
                                                                                    <tbody>
                                                                                        <tr>
                                                                                            <td style='font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; color: #ffffff; text-align: center;' valign='top'>
                                                                                            <br>
                                                                                            <strong>Welcome to site. You have now been added to the site community database.</strong></td>
                                                                                        </tr>
                                                                                    </tbody>
                                                                                </table>
                                                                                </td>
                                                                            </tr>
                                                                        </tbody>
                                                                    </table>
                                                                    </td>
                                                                </tr>
                                                            </tbody>
                                                        </table>
                                                        </td>
                                                    </tr>
                                                </tbody>
                                            </table>
                                            <!--//BLOCK 04 - SUBHEAD--></td>
                                        </tr>
                                    </tbody>
                                </table>
                                <table cellpadding='0' cellspacing='0' width='100%' role='presentation' style='min-width: 100%;' class='stylingblock-content-wrapper'>
                                    <tbody>
                                        <tr>
                                            <td class='stylingblock-content-wrapper camarker-inner'><!--BLOCK 27 - COMMUNITY-->
                                            <table align='center' border='0' cellpadding='0' cellspacing='0' width='100%'>
                                                <tbody>
                                                    <tr>
                                                        <td align='center' valign='top'>
                                                        <table align='center' border='0' cellpadding='0' cellspacing='0' class='em_main_table' style='width:700px; table-layout:fixed;' width='700'>
                                                            <tbody>
                                                                <tr>
                                                                    <td align='center' style='background-color:#3e2246;' valign='top'>
                                                                    <table align='center' border='0' cellpadding='0' cellspacing='0' class='em_wrapper' style='width:700px;' width='700'>
                                                                        <tbody>
                                                                            <tr>
                                                                                <td class='em_h20' style='height:50px; font-size:1px;line-height:1px;'>
                                                                                <img alt='' height='1' src='http://image.e.residential.test.com/lib/fe3815707564067c721d73/m/1/a06c1dac-d3ac-4c97-a8fe-f5145ca2811c.gif' style='display: block; border-width: 0px; border-style: solid;' width='1'></td>
                                                                            </tr>
                                                                            <tr>
                                                                                <td align='center' class='em_aside15' valign='top'>
                                                                                <table border='0' cellpadding='0' cellspacing='0' class='em_wrapper' dir='rtl' style='width:600px;' width='600'>
                                                                                    <tbody>
                                                                                        <tr>
                                                                                            <td valign='top'>
                                                                                            <table align='right' border='0' cellpadding='0' cellspacing='0' class='em_wrapper' dir='ltr' style='width:300px;' width='300'>
                                                                                                <tbody>
                                                                                                    <tr>
                                                                                                        <td align='center' class='em_full_img' valign='top'>
                                                                                                        <img alt='site by test' class='em_g_img' height='176' src='https://image.e.residential.test.com/lib/fe3815707564067c721d73/m/12/image_300x176.jpg' style='display: block; max-width: 300px; font-family: Arial, sans-serif; font-size: 20px; font-weight: bold; color: #ffffff; border-width: 0px; border-style: solid;' width='300'></td>
                                                                                                    </tr>
                                                                                                </tbody>
                                                                                            </table>
                                                                                            <!--[if gte mso 9]></td><td valign='top'><![endif]-->
                                                                                            <table align='left' border='0' cellpadding='0' cellspacing='0' class='em_wrapper' dir='ltr' style='width:299px;' width='299'>
                                                                                                <tbody>
                                                                                                    <tr>
                                                                                                        <td valign='top'>
                                                                                                        <table align='center' border='0' cellpadding='0' cellspacing='0' class='em_wrapper' style='width:299px;' width='299'>
                                                                                                            <tbody>
                                                                                                                <tr>
                                                                                                                    <td align='center' valign='top'>
                                                                                                                    <table align='left' border='0' cellpadding='0' cellspacing='0' class='em_wrapper' style='width:299px;' width='299'>
                                                                                                                        <tbody>
                                                                                                                            <tr>
                                                                                                                                <td align='center' class='em_ptop' valign='top'>
                                                                                                                                <table align='left' border='0' cellpadding='0' cellspacing='0' class='em_wrapper' style='width:270px;' width='270'>
                                                                                                                                    <tbody>
                                                                                                                                        <tr>
                                                                                                                                            <td align='left' class='em_center' style='color:#ff7f00; font-size:26px; line-height:32px; font-family: Georgia,'Times New Roman', serif;' valign='top'>
                                                                                                                                            <span class='em_defaultlink'>KEEP UP TO DATE</span></td>
                                                                                                                                        </tr>
                                                                                                                                        <tr>
                                                                                                                                            <td class='em_h20' style='line-height: 1px;font-size:1px; height:18px;'>
                                                                                                                                            <img alt='' height='1' src='http://image.e.residential.test.com/lib/fe3815707564067c721d73/m/1/a06c1dac-d3ac-4c97-a8fe-f5145ca2811c.gif' style='display: block; border-width: 0px; border-style: solid;' width='1'></td>
                                                                                                                                        </tr>
                                                                                                                                        <tr>
                                                                                                                                            <td align='left' class='em_center' style='font-size: 14px; line-height: 18px; font-family: Arial, sans-serif; color:#ffffff;' valign='top'>
                                                                                                                                            <span class='em_defaultlink'>Find out more about site’s fantastic community events and news ongoing stories. </span></td>
                                                                                                                                        </tr>
                                                                                                                                        <tr>
                                                                                                                                            <td class='em_h20' style='line-height: 1px; font-size:1px; height:32px;'>
                                                                                                                                            <img alt='' height='1' src='http://image.e.residential.test.com/lib/fe3815707564067c721d73/m/1/a06c1dac-d3ac-4c97-a8fe-f5145ca2811c.gif' style='display: block; border-width: 0px; border-style: solid;' width='1'></td>
                                                                                                                                        </tr>
                                                                                                                                        <tr>
                                                                                                                                            <td align='left' valign='top'>
                                                                                                                                            <table align='left' border='0' cellpadding='0' cellspacing='0' class='em_wrapper'>
                                                                                                                                                <tbody>
                                                                                                                                                    <tr>
                                                                                                                                                        <td align='center' valign='top'>
                                                                                                                                                                                                                                                                                                                    </td>
                                                                                                                                                    </tr>
                                                                                                                                                </tbody>
                                                                                                                                            </table>
                                                                                                                                            </td>
                                                                                                                                        </tr>
                                                                                                                                    </tbody>
                                                                                                                                </table>
                                                                                                                                </td>
                                                                                                                                <td class='em_hide' style='width:29px;'>
                                                                                                                                <img alt='' height='1' src='http://image.e.residential.test.com/lib/fe3815707564067c721d73/m/1/a06c1dac-d3ac-4c97-a8fe-f5145ca2811c.gif' style='display: block; border-width: 0px; border-style: solid;' width='1'></td>
                                                                                                                            </tr>
                                                                                                                        </tbody>
                                                                                                                    </table>
                                                                                                                    </td>
                                                                                                                </tr>
                                                                                                            </tbody>
                                                                                                        </table>
                                                                                                        </td>
                                                                                                    </tr>
                                                                                                </tbody>
                                                                                            </table>
                                                                                            </td>
                                                                                        </tr>
                                                                                    </tbody>
                                                                                </table>
                                                                                </td>
                                                                            </tr>
                                                                            <tr>
                                                                                <td class='em_h20' style='height:50px; font-size:1px;line-height:1px;'>
                                                                                <img alt='' height='1' src='http://image.e.residential.test.com/lib/fe3815707564067c721d73/m/1/a06c1dac-d3ac-4c97-a8fe-f5145ca2811c.gif' style='display: block; border-width: 0px; border-style: solid;' width='1'></td>
                                                                            </tr>
                                                                        </tbody>
                                                                    </table>
                                                                    </td>
                                                                </tr>
                                                            </tbody>
                                                        </table>
                                                        </td>
                                                    </tr>
                                                </tbody>
                                            </table>
                                            <!--//BLOCK 27 - COMMUNITY--></td>
                                        </tr>
                                    </tbody>
                                </table>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                    </td>
                </tr>
            </tbody>
        </table>
        <custom name='opencounter' type='tracking'>
        </custom>
    
</mail><localized></localized></parameters>
    </li>
    <li id='{31602921-2C7A-42F0-ABFE-58740B47DBF1}' unicid='FD74B915782841298684DA49A56FD568'>
      <parameters />
    </li>
  </g>
</li>

I wrote the following function:

Function Get-StringBetweenStartEnd {
    Param($Text,$Start,$End)
    $Regex = [Regex]::new("(?<="+$Start+")(.*)(?="+$End+")")           
    $Match = $Regex.Match($String)           
    if($Match.Success) { Return $Match.Value}else{Return ""}
}

$Result = Get-StringBetweenStartEnd -Text $strInput -Start "<mail>" -End "</mail>"
write-host $Result.Trim()

The value is always null.

Any suggestions would be appreciated. Thanks in advance.


Solution

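  • By default, . matches any character except \n, so (.*) cannot continue past the line breaks inside the <mail> element and the pattern never reaches </mail>.

    Enable single-line mode to change this behaviour. From the RegexOptions.Singleline documentation:

    "Changes the meaning of the dot (.) so it matches every character (instead of every character except \n)."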

    There are two ways to do it:

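    1. By specifying the inline option (?s):

      [Regex]::Match( $text, '(?s)<mail>(.*)</mail>' ).Groups[1].Value
      
    2. By passing RegexOptions.Singleline to the Regex.Match method:

      [Regex]::Match( $text, '<mail>(.*)</mail>', 
      [Text.RegularExpressions.RegexOptions]::Singleline ).Groups[1].Value

    Groups[1] is the text captured by (.*). Groups[0] is the whole match including the tags, so .Groups.Value would return both strings.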

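    I took the liberty of simplifying your regex by replacing the lookbehind/lookahead assertions with plain literals around a capturing group. Your lookaround version would also work once single-line mode is enabled, but reading the value from a group is simpler in this case.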

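    Note that the static [Regex]::Match() method can be faster: .NET keeps a cache of recently used patterns for the static methods (15 by default, see [Regex]::CacheSize), so when the same pattern is used repeatedly (e.g. in a loop) it doesn't have to be parsed again. Your function, by contrast, constructs a new Regex object with [Regex]::new on every call.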
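The effect of single-line mode is easy to see on a small input. This is a minimal illustration using a made-up three-line string rather than the full document from the question:

```powershell
# A tiny multi-line input, similar in shape to the <mail> element in the question.
$text = "<mail>`nhello`n</mail>"

# Default mode: . does not match `n, so the pattern cannot get past the first line break.
$withoutS = [Regex]::IsMatch($text, '<mail>(.*)</mail>')

# Single-line mode via the inline (?s) option: . also matches `n.
$withS = [Regex]::IsMatch($text, '(?s)<mail>(.*)</mail>')

"Without (?s): $withoutS"   # Without (?s): False
"With (?s):    $withS"      # With (?s):    True
```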
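Putting the pieces together, a corrected version of the function from the question might look like the sketch below, assuming you want the first <mail> element only. Besides enabling single-line mode, it fixes a second problem in the original: it calls $Regex.Match($String), but $String is never assigned (the parameter is $Text), so the regex never sees your input. It also escapes the delimiters with [Regex]::Escape, in case you later pass tags containing characters such as ( or [, and uses a lazy .*? so it stops at the first closing tag.

```powershell
Function Get-StringBetweenStartEnd {
    Param(
        [string]$Text,
        [string]$Start,
        [string]$End
    )
    # Escape the delimiters so regex metacharacters in them are matched literally.
    # (?s) lets . match newlines; .*? stops at the first $End instead of the last one.
    $pattern = '(?s)' + [Regex]::Escape($Start) + '(.*?)' + [Regex]::Escape($End)
    $match = [Regex]::Match($Text, $pattern)
    # Groups[1] is the captured text between the delimiters, without the tags.
    if ($match.Success) { return $match.Groups[1].Value } else { return '' }
}

# Small illustrative input (not the full document from the question).
$sample = "<subject>Hi</subject><mail>`n  <p>Welcome</p>`n</mail><localized></localized>"
$body = Get-StringBetweenStartEnd -Text $sample -Start '<mail>' -End '</mail>'
$body.Trim()   # <p>Welcome</p>
```

It is called the same way as the original, e.g. Get-StringBetweenStartEnd -Text $strInput -Start '<mail>' -End '</mail>'.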
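If the same extraction runs over many strings, another option is to construct the Regex once, outside the loop, and reuse that instance instead of building a new one on every call. A minimal sketch, where $documents is a hypothetical stand-in for your real inputs:

```powershell
# Hypothetical input: several strings, each containing one <mail> element.
$documents = @(
    "<mail>`nfirst`n</mail>",
    "<x/><mail>second</mail>"
)

# Build the Regex once; Singleline lets . cross line breaks.
$options = [System.Text.RegularExpressions.RegexOptions]::Singleline
$mailRegex = New-Object System.Text.RegularExpressions.Regex -ArgumentList '<mail>(.*?)</mail>', $options

# Reuse the same instance for every input and collect the trimmed bodies.
$bodies = foreach ($doc in $documents) {
    $m = $mailRegex.Match($doc)
    if ($m.Success) { $m.Groups[1].Value.Trim() }
}
$bodies
```

RegexOptions.Compiled could be combined with Singleline here as well; it makes construction slower and usually only pays off when one instance is used for a large number of matches.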