I have this string from an email I'm scraping:
TICKET\xa0\xa0 STATE\xa0\xa0\xa0\xa0 ACCOUNT IDENTIFIER\xa0\xa0\xa0 FILE DIRECTORY\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 CODE
My objectives are the following:
This is my ideal result:
TICKET,STATE,ACCOUNT IDENTIFIER,FILE DIRECTORY
On the other hand, here's what I ended up getting:
#code
my_string.replace(' ', ',').replace('\xa0', '')
#result
TICKET,STATE,ACCOUNT,IDENTIFIER,FILE,DIRECTORY
I was thinking of using regex; however, I have no idea how to implement the logic.
The string separating the items you care about is \xa0, so you can split on that first and then keep only the elements that contain something other than whitespace:
my_string = "TICKET\xa0\xa0 STATE\xa0\xa0\xa0\xa0 ACCOUNT IDENTIFIER\xa0\xa0\xa0 FILE DIRECTORY\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 CODE"
print(", ".join(x.strip() for x in my_string.split("\xa0") if x.strip()))
# Output: TICKET, STATE, ACCOUNT IDENTIFIER, FILE DIRECTORY, CODE
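Since the question mentions regex, here is a minimal sketch of the same idea with re.split. The pattern below is one possible choice, not the only one: it treats any run of whitespace that contains at least one \xa0 as a separator, so a plain single space (as in "ACCOUNT IDENTIFIER") is left alone. This relies on \s matching \xa0 for str patterns in Python 3. The helper name split_fields and the shortened sample string are made up for illustration.

```python
import re

# Separator: a whitespace run containing at least one non-breaking space.
# In Python 3, \s matches \xa0 for str patterns, so the whole run is consumed.
SEPARATOR = re.compile(r"\s*\xa0\s*")

def split_fields(text):
    """Hypothetical helper: split on \xa0 runs and drop empty pieces."""
    return [field for field in SEPARATOR.split(text) if field]

# Shortened version of the string from the question (fewer \xa0 characters).
sample = "TICKET\xa0\xa0 STATE\xa0\xa0\xa0\xa0 ACCOUNT IDENTIFIER\xa0\xa0\xa0 FILE DIRECTORY\xa0\xa0\xa0\xa0 CODE"

fields = split_fields(sample)
print(",".join(fields))
# Output: TICKET,STATE,ACCOUNT IDENTIFIER,FILE DIRECTORY,CODE
```

Joining with "," instead of ", " gives the comma-only format shown in the question. If the trailing CODE column really is unwanted, as the ideal result in the question seems to suggest, slicing with fields[:-1] before joining would drop it.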