How to avoid newline character from getting replaced with "?" in PowerShell


I'm working on a config installer for a game. I want to make a menu for the user to choose from different colors for certain settings. To change those colors I use a PowerShell command in a batch file to find and replace the relevant text in a specific file. There is no problem with that alone.

In the replacement process, PowerShell also replaces the newline character found in the config file with a "?". That is not intended and I want to avoid that.

The character that gets replaced with a "?" is the following: ↵

I want to exclude that character from getting replaced in the process.

My code looks like this:

powershell -command "& {($p=gc "path.txt");(gc $p\GameConfig\SpecificFile.txt).replace('<col:Default>','<col:Green>') | sc $p\GameConfig\SpecificFile.txt}"

I have already tried to exclude the character like so:

powershell -command "& {($p=gc "path.txt");(gc $p\GameConfig\SpecificFile.txt).replace[↵]::escape('<col:Default>','<col:Green>') | sc $p\GameConfig\SpecificFile.txt}"

That didn't work.

I also tried to revert the replacement process of the newline character like so:

powershell -command "& {($p=gc "path.txt");(gc $p\GameConfig\SpecificFile.txt).replace('<col:Default>','<col:Green>') | sc $p\GameConfig\SpecificFile.txt}"
powershell -command "& {($p=gc "path.txt");(gc $p\GameConfig\SpecificFile.txt).replace('>?<','>↵<') | sc $p\GameConfig\SpecificFile.txt}" 

That didn't work either.

I really need some help. Thanks in advance!

Cheers


Solution

  • tl;dr

    Use the -Encoding parameter of Set-Content (whose built-in alias is sc in Windows PowerShell) to specify a Unicode character encoding, to ensure that Unicode characters such as ↵ (DOWNWARDS ARROW WITH CORNER LEFTWARDS, U+21B5) are preserved; to use UTF-8 encoding, for instance, add -Encoding utf8:

    powershell -command "$p=gc path.txt; (gc -Encoding utf8 $p\GameConfig\SpecificFile.txt).Replace('<col:Default>','<col:Green>') | sc -Encoding utf8 $p\GameConfig\SpecificFile.txt"
    

    A streamlined reformulation that speeds up processing by reading the file as a whole rather than line by line, using Get-Content's -Raw switch as well as Set-Content's -NoNewLine switch:

    powershell -command "$p=(gc path.txt)+'\GameConfig\SpecificFile.txt'; (gc -Raw -Encoding utf8 $p).Replace('<col:Default>','<col:Green>') | sc -Encoding utf8 -NoNewLine $p"
    

    To instead use UTF-16LE ("Unicode") encoding, use -Encoding Unicode (sic).
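As a rough, self-contained illustration of the commands above, the following sketch runs the same read / replace / write round trip inside a PowerShell session rather than from a batch file. The scratch file name SpecificFile-demo.txt and the sample content are assumptions made up for the demonstration; the arrow is built from its code point so that the script itself stays ASCII-only.

```powershell
# Hypothetical scratch file standing in for the real SpecificFile.txt.
$file  = Join-Path ([System.IO.Path]::GetTempPath()) 'SpecificFile-demo.txt'
$arrow = [string][char]0x21B5   # U+21B5, built from its code point

# Create a small UTF-8 sample that contains the arrow between two tags.
"<col:Default>$arrow<col:Default>" | Set-Content -Encoding utf8 -NoNewline $file

# Read as UTF-8, replace, and write back as UTF-8.
# The parentheses make sure the file is fully read before it is rewritten.
(Get-Content -Raw -Encoding utf8 $file).Replace('<col:Default>', '<col:Green>') |
    Set-Content -Encoding utf8 -NoNewline $file

# Verify that the arrow survived the round trip.
$result = Get-Content -Raw -Encoding utf8 $file
$result.Contains($arrow)   # True

Remove-Item $file
```

Dropping -Encoding utf8 from the Set-Content call in Windows PowerShell is what brings the "?" back, because the write then falls back to the ANSI default.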

    Note:

    • In Windows PowerShell (the legacy, ships-with-Windows, Windows-only PowerShell edition you're using), -Encoding utf8 invariably creates a UTF-8 file with a BOM.

      • If that is undesired, you need a workaround - see this answer.

      • Note that if / once your input files are BOM-less UTF-8 files, you also need to use -Encoding utf8 for reading them properly with Get-Content (whose built-in alias is gc), as used in the command above; without that, the file would be misinterpreted as ANSI-encoded in Windows PowerShell (see next point).

    • By default, Windows PowerShell's Set-Content cmdlet uses ANSI encoding, i.e. the fixed-width 8-bit character encoding associated with your system's legacy system locale (aka language for non-Unicode programs), such as Windows-1252 on US-English systems.

      • Trying to save a Unicode character such as ↵ that cannot be represented in such an encoding results in an (ASCII-range) ? character getting saved instead, which is what you saw.

      • Note that the PowerShell (Core) 7+ edition now fortunately consistently defaults to (BOM-less) UTF-8.

    • Generally, note that PowerShell's pipelines are not raw byte conduits: text file contents as well as output from external programs are invariably decoded into .NET strings before further processing, so that a Get-Content ... | Set-Content ... pipeline never preserves the original character encoding and instead uses Set-Content's default encoding on writing (unless the -Encoding parameter is used); see this answer for background information.
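To see the substitution described in the notes in isolation, here is a minimal sketch that encodes a sample string directly with .NET, without touching any file. It assumes Windows-1252 as a stand-in for the system's ANSI code page and was written with Windows PowerShell in mind; on other setups the active ANSI code page may differ.

```powershell
$arrow  = [string][char]0x21B5             # U+21B5, built from its code point
$sample = "<col:Green>$arrow<col:Green>"

# Assumption: Windows-1252 stands in for the system's ANSI code page.
$ansi = [System.Text.Encoding]::GetEncoding(1252)
$utf8 = [System.Text.Encoding]::UTF8

# Encode to bytes and decode again, once per encoding.
$viaAnsi = $ansi.GetString($ansi.GetBytes($sample))
$viaUtf8 = $utf8.GetString($utf8.GetBytes($sample))

$viaAnsi                    # <col:Green>?<col:Green>
$viaUtf8.Equals($sample)    # True
```

The arrow has no representation in Windows-1252, so the encoder substitutes "?" at the moment the string is turned into bytes; once that has happened, the original character cannot be recovered from the file.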