PowerShell - find-replace multiple strings with Unicode characters in a file


I am trying to replace multiple strings in a file in PowerShell, but with their Unicode equivalents.

e.g.

$PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'}
$fileContent = Get-Content "D:\PS\comp.txt"
$updatedContent = $fileContent -replace 'EU','EU-01' -replace 'AP','AP-01'
$updatedContent | Set-Content "D:\PS\comp.txt"

In this example, the -replace operator changes "EU" to "EU-01" and "AP" to "AP-01".

I need to replace some country names, e.g. "UK", "Ireland", "France", etc., with their Unicode equivalents.

For example:

$updatedContent = $fileContent -replace 'Ireland','IE' -replace 'UK','GB'

Where:

"IE" should be Unicode U+1F1EE U+1F1EA (i.e. 🇮🇪)

"GB" should be Unicode U+1F1EC U+1F1E7 (i.e. 🇬🇧)

I am unsure how to do this part in PowerShell. Thank you.


Solution

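  • Something like this? In Windows PowerShell 5.1, both the script itself and the output file have to be UTF-8 with a BOM: the script so that the emoji literals are read correctly, and the output file via Set-Content -Encoding utf8. The Out-File default set in $PSDefaultParameterValues does not apply to Set-Content. You might want to back up the file first. In PowerShell 7, everything is UTF-8 without a BOM by default.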

    $fileContent = Get-Content D:\PS\comp.txt
    $updatedContent = $fileContent -replace 'EU','EU-01' -replace 'AP',
      'AP-01' -replace 'Ireland','🇮🇪' -replace 'UK','🇬🇧'
    $updatedContent | Set-Content D:\PS\comp.txt -encoding utf8
    

    Or as a one-liner. The parentheses make Get-Content finish reading the whole file before Set-Content starts writing to the same path.

    (Get-Content D:\PS\comp.txt) -replace 'EU','EU-01' -replace 'AP',
      'AP-01' -replace 'Ireland','🇮🇪' -replace 'UK','🇬🇧' | 
      Set-Content D:\PS\comp.txt -encoding utf8
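
If saving the script itself as UTF-8 with a BOM is inconvenient, the flags can probably be built from their code points instead of being pasted in as literals. The following is only a minimal sketch: the helper name `ConvertTo-Flag`, the `$flags` table and the sample `$text` are hypothetical and not part of the question. It also wraps each name in `\b` word boundaries, because -replace is a case-insensitive regex match and a bare 'UK' would otherwise also hit the start of a word like "Ukraine".

```powershell
# Hypothetical helper: builds a string from one or more Unicode code points.
function ConvertTo-Flag {
    param([int[]]$CodePoint)
    # ConvertFromUtf32 returns a surrogate pair for code points above U+FFFF.
    ($CodePoint | ForEach-Object { [char]::ConvertFromUtf32($_) }) -join ''
}

# Assumed mapping: text to search for -> regional indicator code points.
$flags = [ordered]@{
    'Ireland' = 0x1F1EE, 0x1F1EA   # IE
    'UK'      = 0x1F1EC, 0x1F1E7   # GB
}

# Sample input; in the real script this would come from Get-Content.
$text = 'Ireland and UK'
foreach ($name in $flags.Keys) {
    # Escape the name and anchor it on word boundaries before replacing.
    $pattern = '\b' + [regex]::Escape($name) + '\b'
    $text = $text -replace $pattern, (ConvertTo-Flag $flags[$name])
}
$text
```

The same loop should work on the array returned by Get-Content, with the result piped to Set-Content -Encoding utf8 as in the examples above.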
    

    To check for the UTF-8 BOM, read the first three bytes of the file (Windows PowerShell syntax):

    get-content -encoding byte -totalcount 3 file | % tostring x
    
    ef
    bb
    bf
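
The check above relies on -Encoding Byte, which PowerShell 6 and later replaced with -AsByteStream. A rough sketch of a check that should behave the same in both editions, assuming a hypothetical helper named `Test-Utf8Bom` that reads the bytes through .NET directly:

```powershell
# Hypothetical helper: returns $true when the file starts with EF BB BF.
function Test-Utf8Bom {
    param([string]$Path)
    # Resolve to a full path, since .NET does not follow PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $buffer = New-Object byte[] 3
        $read = $stream.Read($buffer, 0, 3)
    }
    finally {
        # Always release the file handle, even if the read fails.
        $stream.Dispose()
    }
    return ($read -eq 3 -and $buffer[0] -eq 0xEF -and $buffer[1] -eq 0xBB -and $buffer[2] -eq 0xBF)
}
```

For example, `Test-Utf8Bom D:\PS\comp.txt` could be run before and after the replacement to confirm how the file was written.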