I'm working with a third-party company, and I'm trying to determine the cause of a character encoding issue before I bring it up with them.
This company has a custom drag-and-drop editor for designing websites on their platform. Within the editor they have a Raw HTML widget that I can drag in and add my own content to. The problem is that when I copy HTML from someone's old website, using the inspector tool, and paste it into this widget of theirs, all of the apostrophes and double quotes get replaced with gibberish. I have the same issue when I first paste the content into Notepad, Notepad++, or Sublime Text and then paste it from there into their Raw HTML editor.
Here's a recording of the issue and a few examples: https://streamable.com/phwn2
Here are the known characters that get replaced and what they get replaced with:
’ turns into â™
“ turns into âœ
” turns into â
+ turns into (a space)
Å turns into Ã…
" stays as "
' stays as '
Does anyone see a pattern with these characters or know what could be the cause of these characters being replaced?
The website is probably encoded as UTF-8, and the company's editor might be interpreting the bytes as something like Windows-1252. In your first example, the right single quotation mark (’) is encoded in UTF-8 as the three bytes e2 80 99. When a program reads each of those bytes as Windows-1252, you get "Latin small letter a with circumflex" (e2), the euro sign (80) and the trademark sign (99). I haven't checked the other transformations. If this is the problem, a possible workaround is to first convert the copied characters to the destination encoding with iconv, before pasting into the company's editor.
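To check this theory against the other characters in your list, here is a minimal Python sketch. It assumes the editor really does decode UTF-8 bytes as Windows-1252 (Python's `cp1252` codec), which is only a guess. The function names `misread_as_cp1252` and `repair` are made up for this illustration. Note that `cp1252` in Python decodes byte 80 to the euro sign and leaves a few bytes (such as 9d) undefined, so those are replaced with U+FFFD here.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as Windows-1252.

    Bytes that cp1252 leaves undefined (e.g. 0x9d) become U+FFFD.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


def repair(garbled: str) -> str:
    """Reverse the misreading: back to the raw bytes, then decode as UTF-8.

    Only works if no byte was lost or replaced along the way.
    """
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    for ch in ["\u2019", "\u201c", "\u00c5"]:  # right single quote, left double quote, A with ring
        garbled = misread_as_cp1252(ch)
        utf8_hex = ch.encode("utf-8").hex(" ")
        print(f"{ch!r}: UTF-8 bytes {utf8_hex} -> read as cp1252: {garbled!r}")
        # Round trip: the original character comes back unchanged.
        assert repair(garbled) == ch
```

If the garbled strings printed here match what you see in the editor, that supports the encoding-mismatch explanation. The plain `"` and `'` survive because they are ASCII, and ASCII bytes mean the same thing in both encodings.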