
Replacing “smart quotes” in PowerShell


I'm finding myself somewhat stumped on a simple problem. I'm trying to replace fancy ("smart") quotes with plain ones in a bunch of text files. I have the following script, in which I've tried a number of different replacement methods, but none of them has any effect.

Here's an example that downloads the data from GitHub and attempts the conversion.

$srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
$wc = New-Object net.WebClient
$wc.DownloadFile($srcUrl,"foo.txt")
$fancySingleQuotes = "[" + [string]::Join("",[char[]](0x2019, 0x2018)) + "]"

$c = Get-Content "foo.txt"
$c | % { `
        $_ = $_.Replace("’","'")
        $_ = $_.Replace("`“","`"")
        $_.Replace("`”","`"")
    } `
    |  Set-Content "foo2.txt"

What's the trick for this to work?


Solution

  • Here's a version that works:

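        $srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
        # Download to an absolute path and read from that same path,
        # so the download and Get-Content always refer to the same file.
        $path = "C:\Users\hartez\SO6968270\foo.txt"
        $wc = New-Object net.WebClient
        $wc.DownloadFile($srcUrl, $path)
    
        # .NET regex character classes for curly single and double quotes
        $fancySingleQuotes = "[\u2019\u2018]"
        $fancyDoubleQuotes = "[\u201C\u201D]"
    
        # Read the file as UTF-8 so the curly quotes are decoded correctly
        $c = Get-Content $path -Encoding UTF8
    
        $c | % {
            $line = [regex]::Replace($_, $fancySingleQuotes, "'")
            [regex]::Replace($line, $fancyDoubleQuotes, '"')
        } | Set-Content "foo2.txt"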
    

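    The reason that manojlds' version wasn't working for you is an encoding mismatch, not the regex itself. The file you're getting from GitHub is UTF-8 (presumably without a byte order mark), and in Windows PowerShell, Get-Content falls back to the system's ANSI code page when it can't detect the encoding. Each curly quote is three bytes in UTF-8, so it gets decoded as three unrelated characters, and the Unicode characters in the regex never appear in the text. Reading the file with -Encoding UTF8 fixes the problem.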
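
To see the mismatch in isolation, here is a minimal sketch that needs no file or download. It builds a sample string from code points, encodes it as UTF-8, and then decodes the same bytes two ways. The sample text and the variable names are made up for illustration, and ISO-8859-1 (Latin-1) is an assumption standing in for whatever ANSI code page your system uses; the exact garbage characters differ by code page, but the effect on the regex should be the same.

```powershell
# Build the sample from code points so this script's own file encoding is irrelevant.
# (Hypothetical sample text: It<U+2019>s <U+201C>quoted<U+201D>)
$sample = "It" + [char]0x2019 + "s " + [char]0x201C + "quoted" + [char]0x201D

# These are the bytes a UTF-8 file on disk would contain.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($sample)

# Assumption: Latin-1 stands in for the system ANSI code page.
$latin1  = [System.Text.Encoding]::GetEncoding("iso-8859-1")
$misread = $latin1.GetString($utf8Bytes)                       # wrong decoding
$correct = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)  # right decoding

$fancySingleQuotes = "[\u2019\u2018]"
$fancyDoubleQuotes = "[\u201C\u201D]"

# The misread text contains no curly quotes at all, so the regex cannot match.
$misreadHit = [regex]::IsMatch($misread, $fancySingleQuotes)
$correctHit = [regex]::IsMatch($correct, $fancySingleQuotes)

# With the correct decoding, the replacements behave as expected.
$fixed = [regex]::Replace($correct, $fancySingleQuotes, "'")
$fixed = [regex]::Replace($fixed, $fancyDoubleQuotes, '"')

"Match in misread text: $misreadHit"
"Match in correct text: $correctHit"
"Result: $fixed"
```

The misread string is also longer than the original, because every curly quote turns into three single-byte characters. That is why plain .Replace() calls with literal curly quotes fail in the same way: the characters being searched for are simply not in the decoded text.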