This might not be a programming question, but I could not find any answer for it on Google.
I am working on a text mining task and am doing data cleaning at the moment. Far too often I come across some mystery characters that are not in a readable format.
These characters are: β, % and so on.
All of these start with a specific pattern, so I believe they represent some encoding that Excel cannot display.
Is there any way to convert them? I need to know what exactly these characters mean in order to know if I should remove them or not.
Those are probably Unicode characters written as HTML entities in hexadecimal format.
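If that guess is right, the strings can be decoded back into the characters they stand for before deciding whether to keep or remove them. Below is a minimal sketch in Python using the standard library's `html.unescape` (available since Python 3.4). The sample strings and the helper name `decode_entities` are assumptions for illustration, since the raw strings from the data are not shown here; hexadecimal entities typically look like `&#x` followed by hex digits and a semicolon.

```python
import html


def decode_entities(text: str) -> str:
    """Replace HTML character references (e.g. '&#x3b2;') with the
    Unicode characters they encode. Text without entities is unchanged."""
    return html.unescape(text)


# Hypothetical samples; the actual strings in the data may differ.
samples = ["&#x3b2;", "&#x25;", "5&#x25; of &#x3b2;-cells"]

for raw in samples:
    print(repr(raw), "->", repr(decode_entities(raw)))
```

Running the cleaned text through a step like this first shows what each entity actually is (a Greek letter, a percent sign, and so on), which should make it easier to judge whether it carries meaning for the text mining task.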