powershell, powershell-4.0

Remove unnecessary strings and special characters using PowerShell


I have this string:

$html = @'
<div><span style="display:inline !important;">As an Inventory and OperationsAgent</span></div><div><span style="display:inline !important;">I want to be able to raise internal EPL Nominations for Buildings that are going through a Shared Installation journey</span></div><div><span style="display:inline !important;">So that internal EPLNominations can be submitted when the TDF cost is too expensive for both P2P and Shared</span></div>
'@

... how do I remove all the HTML tags?


Solution

  • You can use the -replace regex operator to remove all the HTML tags. When no replacement operand is given, each match is replaced with an empty string:

    $html -replace '<[^>]+>'
    

    To also replace the boundary between adjacent <div> elements with a comma and a space, chain a second -replace in front of it:

    $html -replace '</div>\s*<div>',', ' -replace '<[^>]+>'
    

    For the input above, this outputs:

    As an Inventory and OperationsAgent, I want to be able to raise internal EPL Nominations for Buildings that are going through a Shared Installation journey, So that internal EPLNominations can be submitted when the TDF cost is too expensive for both P2P and Shared
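
If the input may also contain HTML entities such as &amp;amp;, the two replacements above can be wrapped in a small helper that decodes them afterwards. The following is only a sketch under assumptions: the function name Remove-HtmlTag and its -Separator parameter are hypothetical and not part of the original answer, and it relies on [System.Net.WebUtility]::HtmlDecode from .NET being available. Like the one-liners above, it is a regex approach for simple markup, not an HTML parser.

```powershell
# Hypothetical helper; the name and the -Separator parameter are illustrative only.
function Remove-HtmlTag {
    param(
        [Parameter(Mandatory)]
        [string]$Html,

        # Text inserted where one <div> closes and the next one opens.
        [string]$Separator = ', '
    )

    # Turn each </div><div> boundary (optionally separated by whitespace) into the separator.
    $text = $Html -replace '</div>\s*<div>', $Separator

    # Strip every remaining tag: '<', one or more characters that are not '>', then '>'.
    $text = $text -replace '<[^>]+>'

    # Decode entities such as &amp; and drop leading/trailing whitespace.
    [System.Net.WebUtility]::HtmlDecode($text).Trim()
}

Remove-HtmlTag -Html '<div><span>Fish &amp; chips</span></div> <div><span>Tea</span></div>'
# Fish & chips, Tea
```

Two caveats: the boundary pattern only matches a bare <div>, so an opening tag with attributes (for example <div class="x">) is stripped without a separator being inserted, and a -Separator value containing $ would be interpreted as a regex substitution token by -replace.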