
How do I convert many files (ANSI, UTF-8 with BOM, etc.) to UTF-8 without BOM with a PowerShell script?


I'm trying to convert ANSI and UTF-8 (with BOM) files to UTF-8 without BOM. I found code that does this, but in my files the word "président" from an ANSI file, for example, is converted to "pr\xE9sident" or "pr?sident" in UTF-8 (a problem with the accented é).
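The "pr?sident" symptom is what you get when bytes written in an ANSI code page are decoded as UTF-8. The sketch below assumes the ANSI files use Windows-1252 (typical for French Windows systems, but an assumption, since the question does not name the code page). It builds the word with `[char]0x00E9` so the script file's own encoding cannot change the test string.

```powershell
# Assumption: "ANSI" here means Windows-1252, the usual code page on Western European Windows systems
$ansi = [System.Text.Encoding]::GetEncoding(1252)

# "président", with the accented e written as [char]0x00E9 so this script's file encoding cannot alter it
$word  = "pr$([char]0x00E9)sident"
$bytes = $ansi.GetBytes($word)

# In Windows-1252 the accented e is the single byte 0xE9
'{0:X2}' -f $bytes[2]            # E9

# Decoded as UTF-8, 0xE9 followed by 's' is not a valid sequence,
# so the decoder substitutes U+FFFD, which consoles and editors often show as "?"
$wrong = [System.Text.Encoding]::UTF8.GetString($bytes)

# Decoding with the code page the bytes were written in restores the accent
$right = $ansi.GetString($bytes)
$right -eq $word                 # True
```

In other words, the accent is lost at read time, not at write time: whatever encoding you write with, the text must first be decoded with the code page the file was saved in.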

The PowerShell script that I run in my parent folder:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
$source = "path"
$destination = "some_folder"

foreach ($i in Get-ChildItem -Recurse -Force) {
    if ($i.PSIsContainer) {
        continue
    }

    $path = $i.DirectoryName -replace $source, $destination
    $name = $i.Fullname -replace $source, $destination

    if ( !(Test-Path $path) ) {
        New-Item -Path $path -ItemType directory
    }

    $content = get-content $i.Fullname

    if ( $content -ne $null ) {

        [System.IO.File]::WriteAllLines($name, $content, $Utf8NoBomEncoding)
    } else {
        Write-Host "No content from: $i"   
    }
}

Is there a way to keep the accents intact when converting ANSI and other files?


Solution

  • There are two PowerShell gotchas in the condition:

    if ( $content -ne $null ) { ...
    
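    1. $Null should be on the left-hand side of the comparison operator. When $content is an array (Get-Content returns one for a multi-line file), $content -ne $null does not return a Boolean; it filters the array and returns its non-null elements.
    2. The if statement then tests the truthiness of that filtered array rather than whether $content is $null, and that truthiness depends on the array's contents (a one-element result, for example, takes the truthiness of its single element).

    This might cause the condition to unexpectedly evaluate to $False, in which case your script skips files that it should convert.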

    Based on the additional comments, to save your files as ANSI, use the Windows-1252 encoding:

    [System.IO.File]::WriteAllLines($name, $content, ([System.Text.Encoding]::GetEncoding(1252)))
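A quick way to see the difference between the two operand orders for the $null check. The sample arrays are made up for illustration.

```powershell
# A two-element array, like the result of Get-Content on a two-line file
$content = @('first line', 'second line')

# Array on the left: -ne acts as a filter and returns the non-null elements
$filtered = $content -ne $null
$filtered.GetType().Name          # Object[]

# $null on the left: the result is a single Boolean
$check = $null -ne $content
$check.GetType().Name             # Boolean

# The filter form can be falsy even though the variable itself is not $null
$values = @($null, '')
[bool]($values -ne $null)         # False: the filter leaves @(''), and '' is falsy
[bool]($null -ne $values)         # True: $values is not $null
```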
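For the original goal (UTF-8 without BOM as output), here is a minimal sketch that reads each file as raw bytes and chooses the decoding explicitly. It assumes every input file is either UTF-8 (with or without BOM) or Windows-1252 "ANSI". UTF-16 files and other code pages are not detected. The function name `Convert-ToUtf8NoBom` and its parameters are hypothetical, not part of PowerShell.

```powershell
function Convert-ToUtf8NoBom {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )

    $utf8NoBom  = New-Object System.Text.UTF8Encoding($false)
    # Strict decoder: throws on invalid byte sequences instead of substituting U+FFFD
    $utf8Strict = New-Object System.Text.UTF8Encoding($false, $true)
    # Assumption: files without a BOM that are not valid UTF-8 are Windows-1252
    $ansi       = [System.Text.Encoding]::GetEncoding(1252)

    $sourceRoot = (Resolve-Path -LiteralPath $Source).ProviderPath.TrimEnd('\', '/')

    Get-ChildItem -LiteralPath $sourceRoot -Recurse -Force -File | ForEach-Object {
        # Mirror the folder structure using a plain substring, so no regex escaping is needed
        $relative  = $_.FullName.Substring($sourceRoot.Length).TrimStart('\', '/')
        $target    = Join-Path $Destination $relative
        $targetDir = Split-Path -Parent $target
        if (-not (Test-Path -LiteralPath $targetDir)) {
            New-Item -ItemType Directory -Path $targetDir -Force | Out-Null
        }

        $bytes = [System.IO.File]::ReadAllBytes($_.FullName)

        if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
            # UTF-8 with BOM: decode everything after the 3-byte BOM
            $text = $utf8NoBom.GetString($bytes, 3, $bytes.Length - 3)
        } else {
            try {
                # Valid UTF-8 without BOM: decode as is
                $text = $utf8Strict.GetString($bytes)
            } catch {
                # Not valid UTF-8 (the strict decoder threw): fall back to Windows-1252
                $text = $ansi.GetString($bytes)
            }
        }

        # Writes no BOM with this encoding and keeps the original line endings
        [System.IO.File]::WriteAllText($target, $text, $utf8NoBom)
    }
}
```

Usage, with placeholder paths: `Convert-ToUtf8NoBom -Source 'C:\path' -Destination 'C:\some_folder'`. Because empty files simply produce an empty string, no $null check is needed. Using `WriteAllText` instead of `WriteAllLines` avoids normalizing line endings or appending a final newline. The main limitation is the detection heuristic itself: an ANSI file whose bytes happen to form valid UTF-8 would be kept as UTF-8, although this is rare for ordinary French text.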