Tags: json, powershell, character-encoding, invoke-webrequest

Integration for wix.com store api | diacritics problem


An annoying problem; maybe you can help me. I'm trying to pass data into "new product" on wix.com via the API, but some of the products contain local characters (diacritics). Invoke-WebRequest completes successfully (part of the code is below), but the data that arrives on the server has been converted and the diacritics are missing.

I tried to enforce charset UTF-8, and $body does contain the proper result:

$headers  = @{
     "Content-Type"  = "application/json"
     "Authorization" = "$auth"
     "charset"       = "UTF-8"
            }
[...]
$obj = @{
    "product" = @{
                "name"        = [string]($product.name)
               }
[...]

$body = $obj | convertto-json -depth 3;
Invoke-WebRequest -Headers $headers -Uri $url  -method POST -body $body 

but on the server all diacritics are gone. I believe there should be a way to change this, but it seems to require changes on the server end. Please correct me if I'm wrong. Any help is appreciated. Thanks a lot for all answers.


Solution

  • tl;dr

    • Omit the "charset" entry from your $headers hashtable:

      • Media type application/json uses UTF-8 by default.
      • Aside from that, the charset attribute is meant to be a part of the Content-Type entry, not a separate one (e.g., 'Content-Type' = 'text/plain; charset=utf-8')
    • To ensure that UTF-8 is used in the request body in Windows PowerShell and PowerShell (Core) versions up to v7.3.x, explicitly obtain the UTF-8 encoding of the JSON string to post, via System.Text.UTF8Encoding, and pass the result - a [byte[]] array - to the -Body parameter of Invoke-WebRequest instead:

    Invoke-WebRequest -Headers $headers -Uri $url -Method POST -Body (
      [System.Text.Utf8Encoding]::new().GetBytes($body)
    )
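Putting both points together, the question's snippet might look roughly like the sketch below. This is only an illustration under stated assumptions: the values of $url, $auth and $product are hypothetical placeholders, and the actual request is left commented out so that the snippet runs without network access.

```powershell
# Hypothetical placeholder inputs; substitute your real values.
$url  = 'https://example.invalid/products'   # hypothetical endpoint
$auth = '<your-authorization-value>'

# The sample name ("Za" followed by four accented Polish letters) is built
# from code points so that the example does not depend on the encoding of
# the script file itself.
$product = [pscustomobject] @{
    name = 'Za' + [char] 0x17C + [char] 0xF3 + [char] 0x142 + [char] 0x107
}

# No separate "charset" entry.
$headers = @{
    'Content-Type'  = 'application/json'
    'Authorization' = $auth
}

$obj = @{
    product = @{
        name = [string] $product.name
    }
}

$body = $obj | ConvertTo-Json -Depth 3

# Encode the JSON string as UTF-8 explicitly and send the bytes, so that
# the cmdlet's default string encoding never comes into play.
$bodyBytes = [System.Text.UTF8Encoding]::new().GetBytes($body)

# Arguments collected in a hashtable for splatting.
$request = @{
    Uri     = $url
    Method  = 'Post'
    Headers = $headers
    Body    = $bodyBytes
}

# Uncomment to actually send the request:
# Invoke-WebRequest @request
```

Because the body is already a [byte[]] array, the same call should behave identically in Windows PowerShell and in all PowerShell (Core) versions.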
    

    Background information:
    • In Windows PowerShell and in PowerShell (Core) up to v7.3.x:

      • These versions use ISO-8859-1 as the default character encoding (v7.0 - v7.3.x selectively default to UTF-8, but only in responses, and only for media type application/json); ISO-8859-1 is largely identical to Windows-1252 (the most common "ANSI" encoding), except that it is missing the latter's characters in the 0x80 - 0x9F code-point range, which notably includes the € symbol.

      • To post UTF-8 instead, you have two options:

        • Only if you're not also using a -Headers argument: Use a -ContentType argument and append a charset attribute; e.g. -ContentType 'application/json; charset=utf-8'

          • This makes PowerShell automatically encode a [string]-typed -Body argument based on the specified encoding.

          • Note that up to v7.3.x the equivalent use of a Content-Type entry in a hashtable passed to the -Headers argument was not honored; this is fixed in v7.4+.

        • Alternatively, manually create a [byte[]] array that contains the bytes that make up the UTF-8 encoding of your JSON string and pass it to -Body, as shown at the top.

      • Note that with GET requests, you may also have to prevent misinterpretation of responses; see this answer.

    • In PowerShell (Core) v7.4+, no extra effort is needed: UTF-8 is now, fortunately, the consistent default in both Invoke-WebRequest and Invoke-RestMethod.
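To see why the choice of encoding matters, the following sketch round-trips a sample string through ISO-8859-1 and through UTF-8 using .NET's encoding classes directly; no web request is involved. The sample text is an arbitrary choice made for illustration, and the exact substitute characters produced by the ISO-8859-1 encoder (a look-alike letter without the diacritic, or "?") may depend on the .NET runtime in use.

```powershell
# Sample text built from code points, so the example does not depend on
# the encoding of the script file: "z", U+0142, "oty ", U+20AC.
$text = 'z' + [char] 0x142 + 'oty ' + [char] 0x20AC

$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$utf8   = [System.Text.UTF8Encoding]::new()

$latin1Bytes = $latin1.GetBytes($text)
$utf8Bytes   = $utf8.GetBytes($text)

# ISO-8859-1 always produces one byte per character, so characters it
# cannot represent are replaced by something else and cannot be recovered.
$latin1RoundTrip = $latin1.GetString($latin1Bytes)

# UTF-8 uses multiple bytes where needed and round-trips losslessly.
$utf8RoundTrip = $utf8.GetString($utf8Bytes)

'ISO-8859-1: {0} bytes, round trip intact: {1}' -f $latin1Bytes.Length, $latin1RoundTrip.Equals($text)
'UTF-8:      {0} bytes, round trip intact: {1}' -f $utf8Bytes.Length, $utf8RoundTrip.Equals($text)
```

The lossy first result corresponds to what the question describes: the string is still correct in $body, and the damage only occurs when the string is converted to bytes for the request.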