I have a string that seems to contain a lot of white space (I see a space between every pair of symbols). To make it clear, this is the string:
{ " d a t a " : { " i d " : " 1 0 b a 8 7 3 8 - b 0 0 9 - 4 1 2 0 - 9 e c 1 - 4 1 7 a 6 e a 1 a 6 1 f " , " t i m e " : 1 4 4 5 2 6 0 9 8 6 7 5 2 } , " e x p i r e s " : 1 4 5 3 0 9 6 7 8 6 7 5 2 }
I tried to remove the white spaces the way I always do:
z = z.replace(" ","")
But it does not work. For example this code:
print type(z), len(z)
z = z.replace(" ","")
print type(z), len(z)
prints the following:
<type 'str'> 198
<type 'str'> 198
So, after the removal of white spaces, the string has the same length as before. In addition, I save the new strings (where the white spaces are supposed to be removed) to a text file. When I open this file in a text editor, I still see the white spaces! If I remove them within the text editor (with search and replace), they are removed.
So, my question is: why can't Python remove these "special" white spaces, and how can I remove them?
I have just tried to use ord(c), and I get 0 for the characters that I had interpreted as white spaces.
The zero values indicate that the input data is UTF-16 text. If zero bytes follow what appear to be ASCII characters, e.g., b'a\0', then the encoding is 'utf-16le' (little-endian):
>>> b'd\0a\0t\0a\0'.decode('utf-16le')
u'data'
Don't use .replace(b'\0', b''); it will break on the first non-ASCII character, e.g., b'\xac ' (the euro sign encoded using the utf-16le character encoding).
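As a rough sketch of the difference, the snippet below decodes a UTF-16LE byte string and then shows what stripping zero bytes does to a string that contains a euro sign. The sample bytes and the names raw, text, parsed, sample and mangled are made up for illustration; in the real program the bytes would come from wherever z is read.

```python
import json

# Hypothetical sample bytes, standing in for the data actually being read.
raw = u'{"data":{"time":1445260986752},"expires":1453096786752}'.encode('utf-16le')

# Decoding turns the bytes into text; the zero bytes go away as part of that.
text = raw.decode('utf-16le')
parsed = json.loads(text)
print(parsed['expires'])

# Stripping zero bytes instead damages anything outside ASCII.
sample = u'a\u20ac'.encode('utf-16le')   # 'a' followed by a euro sign
mangled = sample.replace(b'\0', b'')     # 'a' loses its zero byte, the euro bytes stay
print(repr(sample), repr(mangled))
```

After the replace, the three remaining bytes are neither valid UTF-16 nor valid ASCII, whereas the decoded text can be passed straight to json.loads.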