I've got some scraped data filled with annoying escape characters:
{"website": "http://www.zebrawebworks.com/zebra/bluetavern/day.cfm?&year=2018&month=7&day=10", "headliner": ["\"Roda Vibe\" with the Tallahassee Choro Society"], "data": [" \r\n ", "\r\n\t\r\n\r\n\t", "\r\n\t\r\n\t\r\n\t", "\r\n\t", "\r\n\t", "\r\n\t", "8:00 PM", "\r\n\t\r\n\tFEE: $2 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ", "\r\n\tEvery 2nd & 4th Tuesday of the month, the Choro Society returns to Blue Tavern with that subtly infectious Brazilian rhythm and beautiful melodies that will stay with you for days. The perfect antidote to Taylor Swift. $2 for musicians; tips appreciated. ", "\r\n\t", "\r\n\t\r\n\t", "\r\n\t", "\r\n\t", "\r\n\t\r\n\t\r\n\r\n\t\r\n\t", "\r\n\t\r\n\t\t", "\r\n", "\r\n", "\r\n", "\r\n"]},
I'm trying to write a function to remove these characters, but neither of my two strategies is working:
# strategy 1
escapes = ''.join([chr(char) for char in range(1, 32)])
table = {ord(char): None for char in escapes}
for item in concert['data']:
    item = item.translate(table)

# strategy 2
for item in concert['data']:
    for char in item:
        char = char.replace("\r", "").replace("\t", "").replace("\n", "")
Why is my data still filled with the escape characters I've tried two different methods to remove?
Consider the following:
lst = ["aaa", "abc", "def"]
for x in lst:
    x = x.replace("a","z")
print(lst) # ['aaa', 'abc', 'def']
The list appears unchanged, and it is. Reassigning the variable used in your for loop (x) only rebinds that name inside the loop; the change is never propagated back to lst.
Instead:
for i, x in enumerate(lst):
    lst[i] = x.replace("a","z")
print(lst) # ['zzz', 'zbc', 'def']
Or
for i in range(len(lst)):
lst[i] = lst[i].replace("a","z")
print(lst) # ['zzz', 'zbc', 'def']
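Applied to the data in the question, the same idea might look like the sketch below. The concert dict here is a shortened stand-in for the scraped record, and both the strip() call and the dropping of empty strings are assumptions about the desired result, not something the question asked for.

```python
# Shortened stand-in for the scraped record from the question (assumption).
concert = {
    "data": [" \r\n ", "\r\n\t", "8:00 PM", "\r\n\t\r\n\tFEE: $2 \u00a0\u00a0 ", "\r\n"],
}

# Map every ASCII control character (code points 1-31) to None so translate() drops it.
table = {code: None for code in range(1, 32)}

# Build new strings; strip() also trims leading/trailing spaces and non-breaking spaces.
cleaned = [item.translate(table).strip() for item in concert["data"]]

# Optional: drop entries that are now empty, then assign the list back to the dict.
concert["data"] = [item for item in cleaned if item]

print(concert["data"])  # ['8:00 PM', 'FEE: $2']
```

The key point is the final assignment back into concert["data"]; without it, the cleaned strings would be discarded just as in the original loops.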
Edit
Since you're using assignment (x = ...), you have to assign back to the original list, using something like lst[i] = ... .
With immutable types (which include strings), this is really your only option. x.replace("a","z") doesn't change x; it returns a new string with the specified replacements.
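A small check makes the "returns a new string" behaviour visible. The names x and y are only illustrative.

```python
x = "aaa"
y = x.replace("a", "z")  # replace() builds and returns a new string

print(x)  # aaa  (the original string is untouched)
print(y)  # zzz
```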
With mutable types (e.g. lists), you can modify in place the object that the loop variable refers to (the x in for x in lst:).
So something like the following will see the changes to x propagated to lst:
lst = [[1],[2],[3]]
for x in lst:
    x.append('added') # Example of in-place modification
print(lst) # [[1, 'added'], [2, 'added'], [3, 'added']]
This works because x.append() (unlike str.replace()) does change the object that x refers to.
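If other names refer to the same list and should see the update as well, one possible variation is slice assignment, which replaces the contents of the existing list object rather than creating a new one. The alias name below is hypothetical and only there to show the effect.

```python
lst = ["aaa", "abc", "def"]
alias = lst  # second name for the same list object

# Slice assignment overwrites the contents of the existing list in place.
lst[:] = [x.replace("a", "z") for x in lst]

print(alias)         # ['zzz', 'zbc', 'def']
print(alias is lst)  # True
```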