I am working with documents from different sources (and in different languages), and I am having a lot of trouble with the different definitions of whitespace.
For instance, '\xa0' does not seem to belong to this Wikipedia list: Whitespace
I want to replace all of them with ' '. For instance:
text = re.sub(r'\xa0', ' ', text)
U+00A0 is on that Wikipedia page you linked to, in the Unicode list.
I'd say that Unicode.org has the definitive list: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cp%7Bwhitespace%7D
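If you want to replace every character on that list in one pass, something along these lines might work. It is only a sketch: the character class is typed out by hand from the code points that carry the Unicode White_Space property as far as I know, and the names WHITESPACE_RE and normalize_spaces are made up for this example. Check the class against the list linked above for the Unicode version you care about.

```python
import re

# Hand-written character class of code points with the Unicode
# White_Space property (an assumption: verify against the Unicode list).
WHITESPACE_RE = re.compile(
    "[\u0009-\u000d\u0020\u0085\u00a0\u1680\u2000-\u200a"
    "\u2028\u2029\u202f\u205f\u3000]"
)

def normalize_spaces(text):
    """Replace each Unicode whitespace character with a plain ASCII space."""
    return WHITESPACE_RE.sub(" ", text)

text = normalize_spaces("a\xa0b\u3000c\u2009d")
```

Note that this also turns tabs and line breaks into spaces, since they are whitespace too. If you want to keep line breaks, drop the range \u0009-\u000d (and \u0085, \u2028, \u2029) from the class. In Python 3, r'\s' on a str pattern also matches Unicode whitespace, but its set is not exactly the same as the White_Space property, so the explicit class is the more predictable option.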