I am trying to convert just one file from UTF-8 to ASCII. I found the following script online; it creates the output file,
but it does not change the encoding to ASCII. Why is this not working?
Get-Content -Path "File/Path/to/file.txt" | Out-File -FilePath "File/Path/to/processed.txt" -Encoding ASCII
tl;dr
-Encoding ASCII does work, though your editor's GUI may still report the resulting file as UTF-8-encoded, for the reasons explained below.
First, a general caveat: converting to ASCII replaces every non-ASCII character with a literal ?, i.e. you'll potentially lose information.
ASCII encoding is a subset of UTF-8 encoding (except that ASCII encoding never involves a BOM).
Modern editors default to BOM-less UTF-8; that is, if a file doesn't start with a BOM, they assume that it is UTF-8-encoded, and that's what their GUIs reflect - even if a given file happens to be composed of ASCII characters only.
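The subset relationship can be checked at the byte level. The following is a minimal sketch, not part of the original answer, that uses the .NET encoding classes; the variable names ($text, $asciiBytes, $utf8Bytes, $same, $lossy) are illustrative only.

```powershell
# Encode an ASCII-only string both ways; GetBytes() never emits a BOM.
$text       = 'cafe'
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$utf8Bytes  = [System.Text.Encoding]::UTF8.GetBytes($text)

# For ASCII-only text the two byte sequences are identical, which is why an
# editor cannot tell such a file apart from BOM-less UTF-8.
$same = [System.BitConverter]::ToString($asciiBytes) -eq
        [System.BitConverter]::ToString($utf8Bytes)
$same

# A non-ASCII character has no ASCII representation and is encoded as '?' (0x3F).
# [char]0xE9 is "é"; building it this way avoids depending on the script file's
# own encoding.
$lossy = [System.Text.Encoding]::ASCII.GetBytes('caf' + [char]0xE9)
[System.BitConverter]::ToString($lossy)
```

The first expression should yield $true, and the last should show 3F as the final byte, i.e. the "?" that replaced "é".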
To verify that your output file is indeed only composed of ASCII characters, use the following:
# This should return $false; '\P{IsBasicLatin}' matches any NON-ASCII character.
(Get-Content -Raw File/Path/to/processed.txt) -cmatch '\P{IsBasicLatin}'
For an explanation of this test, especially with respect to needing to use -cmatch, the case-sensitive variant of the -match operator, see this answer.
A complete example:
# Write a string that contains non-ASCII characters to a
# file with -Encoding Ascii.
# The resulting file will contain 1 line, with content 'caf?'.
# That is, the "é" character was "lossily" transliterated to (ASCII) "?".
'café' | Out-File -Encoding Ascii temp.txt
# Examining the file for non-ASCII characters now indicates that
# there are none, i.e., $false is returned.
(Get-Content -Raw temp.txt) -cmatch '\P{IsBasicLatin}'
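As a cross-check that does not depend on how Get-Content decodes the file, you could also inspect the raw bytes. This is only a sketch: it recreates temp.txt in the current directory so that it runs on its own, and $bytes, $nonAscii and $hasUtf8Bom are illustrative names.

```powershell
# Recreate the sample file; [char]0xE9 is "é".
('caf' + [char]0xE9) | Out-File -Encoding Ascii temp.txt

# Read the raw bytes. Convert-Path yields a full path, which .NET methods need
# because .NET's working directory can differ from PowerShell's.
$bytes = [System.IO.File]::ReadAllBytes((Convert-Path temp.txt))

# Any byte above 0x7F would indicate a non-ASCII character.
$nonAscii = @($bytes | Where-Object { $_ -gt 0x7F })
$nonAscii.Count -eq 0

# A UTF-8 BOM would show up as the leading bytes EF BB BF.
$hasUtf8Bom = $bytes.Length -ge 3 -and
              $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
$hasUtf8Bom
```

For the file written above, the first check should yield $true and the BOM check $false.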