Part of the script looks like this:
$template = Get-Content "./template/temaplate.htm" -raw
$html = $template.Replace('{{imie}}', $imie).Replace('{{nazwisko}}', $nazwisko).Replace('{{stanowisko}}', $stanowisko).Replace('{{mobile}}', $mobile).Replace('{{kapital}}', $kapital).Replace('{{telefon}}', $telefon)
Set-Content -Encoding UTF8 "output/podpis.htm" -Value $html
temaplate.htm contains, for example, the words "Sąd" and "Wrocław", but after running Set-Content all Polish special characters are garbled: "SÄ…d", "WrocĹ‚aw". I don't really understand why. The template also has the following set:
<meta charset="UTF-8">
Your symptom implies:
Your file is UTF-8-encoded but doesn't have a BOM.
You're using Windows PowerShell, where Get-Content defaults to the system's active ANSI code page, and therefore misinterprets your file:[1]
Note that Get-Content does not try to interpret the content of the file, and therefore the presence of <meta charset="UTF-8"> inside it is irrelevant.
All that matters is whether the file starts with a Unicode BOM (which unequivocally identifies the character encoding) or not (in which case an encoding must be assumed).
Using -Encoding utf8 only with Set-Content is then too late, because the misinterpretation has already happened.
Note that you would not have this problem in PowerShell (Core) 7+, which consistently defaults to (BOM-less) UTF-8.
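To find out whether a given file starts with a UTF-8 BOM, its first three bytes can be inspected directly. The following is a minimal sketch; the function name Test-Utf8Bom is hypothetical and not part of the original script.

```powershell
# Hypothetical helper (not part of the original script): reports whether a
# file starts with the UTF-8 BOM, i.e. the bytes 0xEF 0xBB 0xBF.
function Test-Utf8Bom {
    param([Parameter(Mandatory)] [string] $Path)

    # Convert-Path yields a full file-system path; .NET methods do not
    # honor PowerShell's current location.
    $fullPath = Convert-Path -LiteralPath $Path

    # Reads the whole file, which is fine for a small template.
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and
            $bytes[1] -eq 0xBB -and
            $bytes[2] -eq 0xBF)
}

# Example call: Test-Utf8Bom "./template/temaplate.htm"
```

If this returns $false for the template, Windows PowerShell has to guess the encoding, which is the situation described above.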
Therefore, use -Encoding utf8 also in your Get-Content call:
$template = Get-Content -Encoding UTF8 "./template/temaplate.htm" -Raw
# ...
Set-Content -Encoding UTF8 "output/podpis.htm" -Value $html
Caveat: In Windows PowerShell, Set-Content -Encoding UTF8 invariably creates a UTF-8 file with a BOM. If that is undesired, use New-Item as a workaround:
# Creates a BOM-less UTF-8 file even in Windows PowerShell.
New-Item -Force "output/podpis.htm" -Value $html
(Again, in PowerShell (Core) 7+ you wouldn't have that problem: all cmdlets there create BOM-less UTF-8 files by default; -Encoding utf8bom is needed to explicitly request a BOM.)
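Another way to write a BOM-less UTF-8 file in Windows PowerShell is to call .NET directly. This is only a sketch: the $html value is a stand-in for the string built by the script, and the output folder is assumed to be relative to the current file-system location.

```powershell
# Stand-in for the $html string built earlier in the script: "Sąd", with ą
# given as a code point so this snippet does not depend on the encoding of
# the file it is saved in.
$html = "S$([char]0x0105)d"

# .NET methods ignore PowerShell's current location, so build a full path.
$outFile = Join-Path $PWD.ProviderPath 'output/podpis.htm'

# Make sure the target folder exists.
New-Item -ItemType Directory -Force (Split-Path $outFile) | Out-Null

# $false = do not emit a BOM.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText($outFile, $html, $utf8NoBom)
```

Unlike Set-Content, WriteAllText does not append a trailing newline, so add one to the string yourself if you need it.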
See this answer for additional information.
[1] Specifically, each byte in a multi-byte UTF-8 encoding sequence representing a single non-ASCII-range character is misinterpreted as its own character, namely a character from the ANSI character set. You can reproduce this as follows, assuming that Windows-1252 is the active ANSI code page: [Text.Encoding]::GetEncoding(1252).GetString([Text.Encoding]::UTF8.GetBytes('ą')) - this yields Ä…, i.e. two (different) characters, as in your question.
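The garbled "WrocĹ‚aw" from the question can be reproduced the same way. The sketch below assumes the active ANSI code page is Windows-1250 (Central European), which is what that particular output suggests; the variable names are illustrative only.

```powershell
# Assumption: the active ANSI code page is Windows-1250 (Central European).
$ansi = [System.Text.Encoding]::GetEncoding(1250)
$utf8 = [System.Text.Encoding]::UTF8

# "Wrocław", with ł given as a code point so the snippet does not depend on
# the encoding of the file it is saved in.
$original = "Wroc$([char]0x0142)aw"

# What Get-Content effectively does without -Encoding UTF8: the UTF-8 bytes
# are decoded as ANSI, so the two bytes of ł become two separate characters.
$garbled = $ansi.GetString($utf8.GetBytes($original))

# Reversing the mix-up recovers the text here, because both bytes map to
# characters in the code page.
$restored = $utf8.GetString($ansi.GetBytes($garbled))

$garbled
$restored
```

Here $garbled is one character longer than $original, because ł is encoded as two bytes (0xC5 0x82) and each byte is decoded as a separate character.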