While parsing data from a web request, I came across the following string:
dateRange = 'September\xa04,\xa01978 – September 1980'
The encoding of the extracted string seems to be Latin-1 (based on the \xa0). I got rid of those by replacing them with spaces.
dateRange = dateRange.replace(u'\xa0', u' ')
That aside, I can't split the string on the hyphen (-).
When I call split() as follows:
print(dateRange.split('-'))
The output is as follows:
['September\xa04,\xa01978 – September 1980']
It is as if there was no hyphen in the string. I sense that it has something to do with the encoding, but I can't seem to comprehend the issue exactly.
So, how can I work around this issue?
EDIT:
I have already tried the following to no avail:
dateRange.split('\-')
That's not a hyphen. That's a U+2013 ᴇɴ ᴅᴀsʜ.
Just copy & paste it into your split call:
dateRange.split('–')
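If pasting the character is awkward, the same split can be written with the escape `\u2013`. The sketch below assumes Python 3 and uses the standard `unicodedata` module to confirm which non-ASCII characters are in the string; the name `date_range` is only illustrative.

```python
import unicodedata

# The string from the question, with the en dash written as an escape.
date_range = 'September\xa04,\xa01978 \u2013 September 1980'

# Print the code point and official Unicode name of every non-ASCII character.
for ch in sorted(set(date_range)):
    if ord(ch) > 127:
        print('U+%04X %s' % (ord(ch), unicodedata.name(ch)))

# Split on the en dash without having to paste it into the source.
parts = date_range.split('\u2013')
print(parts)
```

The loop should report a NO-BREAK SPACE and an EN DASH, which also shows that the `\xa0` is a non-breaking space rather than a sign of a particular encoding.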
Alternatively, you can replace it with an actual hyphen. Make sure to copy & paste the en dash into the replace call :)
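A minimal sketch of the replace approach, assuming Python 3. It uses escapes instead of pasted characters, and the names `cleaned`, `start`, and `end` are made up for the example.

```python
date_range = 'September\xa04,\xa01978 \u2013 September 1980'

# Normalize: non-breaking spaces become plain spaces, the en dash becomes a hyphen.
cleaned = date_range.replace('\xa0', ' ').replace('\u2013', '-')

# Split on the hyphen and trim the padding around each part.
start, end = [part.strip() for part in cleaned.split('-')]
print(start)  # September 4, 1978
print(end)    # September 1980
```

Note that this only works cleanly because the dates themselves contain no hyphens; if they might, splitting on the en dash directly is the safer choice.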