I have been given a large number of XML files from which I need to pull out parts of the text elements and reuse them for other purposes. (I am using XDocument to read the XML data.)
But how do I decode the text contained in the elements? What formatting is even being used here? A few examples:
"What is the meaning of this® asks Sonny."
"The big centre cost 1¾ million pounds"
"... lost it. ® The next ..."
I have tried HttpUtility.HtmlDecode, but that did not do the trick. If I decode twice, the "&amp;reg;" turns into a ®, which is obviously not right.
Looks like ® are line breaks. The ® are probably question marks. The 190 one, I don't even know. Perhaps a dot or comma?
Any ideas would be welcome.
It does appear that the strings you show have been HTML encoded, and then XML encoded (or HTML again).
It is correct that &amp;reg; -> &reg; -> ® (the registered trademark symbol) per the ISO Latin-1 entities; &amp;#174; should behave the same way.
Similarly, &amp;#190; would turn into ¾, the fraction representing three quarters.
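To see the two decoding passes for yourself, here is a minimal sketch. It assumes the strings you get back from XDocument still contain one extra layer of encoding. It uses WebUtility.HtmlDecode from System.Net, which needs no System.Web reference. DecodeFully and its maxPasses parameter are hypothetical names made up for this illustration, not part of any library.

```csharp
using System;
using System.Net;

public static class EntityDemo
{
    // Hypothetical helper: decode repeatedly until the text stops changing,
    // or until maxPasses passes have been made (a guard against over-decoding).
    public static string DecodeFully(string text, int maxPasses = 3)
    {
        for (int i = 0; i < maxPasses; i++)
        {
            string decoded = WebUtility.HtmlDecode(text);
            if (decoded == text)
            {
                break; // nothing left to decode
            }
            text = decoded;
        }
        return text;
    }

    public static void Main()
    {
        // Assumed shape of the input: a numeric entity whose ampersand
        // has itself been encoded once more.
        string sample = "The big centre cost 1&amp;#190; million pounds";

        // First pass only turns &amp; back into &, leaving &#190; in the text.
        string once = WebUtility.HtmlDecode(sample);

        // Decoding until stable resolves &#190; to the three-quarters character.
        string full = DecodeFully(sample);

        Console.WriteLine(once);
        Console.WriteLine(full);
    }
}
```

Be aware that decoding until the text is stable will also unescape anything that was meant to stay literal (for example, text that genuinely talks about "&amp;"), so a fixed two passes may be the safer choice if you know the data was encoded exactly twice.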