My data are in the following format:
data = [['@datumbox', '#machinelearning'],
['@datumbox', '#textanalysis'],
['@things_internet', '#iot'],
['@things_internet', '#h...'],
['@custmrcom', '#analytics123'],
['@custmrcom', '#strategy...123'],
['@custmrcom', '#1knowledgetweet'],
['@tamaradull', '#@bigbrother']]
I would like to check whether each hashtag contains any non-alphabetic characters. If it does, the corresponding row should be removed.
The desired output is:
data = [['@datumbox', '#machinelearning'],
['@datumbox', '#textanalysis'],
['@things_internet', '#iot']]
I think I need to use re.sub (e.g., with re.compile('[^a-zA-Z]')). This is what I have so far:
newdata = []
for item in data:
    regex = re.compile('[^a-zA-Z]')
    if regex.match(item[1]):
        newdata.append([item[0], item[1]])
Any suggestions?
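Use a list comprehension with a condition. Slicing with x[1][1:] drops the leading '#', and isalpha() returns True only if every remaining character is a letter: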
newdata = [x for x in data if x[1][1:].isalpha()]
print(newdata)
This gives the output:
[['@datumbox', '#machinelearning'], ['@datumbox', '#textanalysis'], ['@things_internet', '#iot']]
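If you would rather stay with the re module, here is a rough sketch of a regex version. The attempt in the question keeps every row because regex.match only tests the start of the string, and the leading '#' always matches [^a-zA-Z]. The sketch below instead assumes a valid hashtag is a '#' followed by one or more ASCII letters and nothing else, and it relies on re.fullmatch, which needs Python 3.4 or later. The name HASHTAG_RE is just an illustrative choice.

```python
import re

data = [['@datumbox', '#machinelearning'],
        ['@datumbox', '#textanalysis'],
        ['@things_internet', '#iot'],
        ['@things_internet', '#h...'],
        ['@custmrcom', '#analytics123'],
        ['@custmrcom', '#strategy...123'],
        ['@custmrcom', '#1knowledgetweet'],
        ['@tamaradull', '#@bigbrother']]

# Assumption: a valid hashtag is '#' followed only by ASCII letters.
# HASHTAG_RE is a hypothetical name, compiled once outside the loop.
HASHTAG_RE = re.compile(r'#[a-zA-Z]+')

# fullmatch succeeds only if the whole string fits the pattern,
# so any digit, dot, or '@' after the '#' rejects the row.
newdata = [row for row in data if HASHTAG_RE.fullmatch(row[1])]
print(newdata)
```

One difference worth knowing: in Python 3, isalpha() also accepts non-ASCII letters such as 'é', while the [a-zA-Z] class does not, so pick whichever matches what you mean by "alphabet".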