I have the following string:
str1 = "I/TAG1 like/TAG2 red/TAG3 apples/TAG3 ./TAG4"
And I have two lists in Python:
tokens = []
tags = []
My desired output would be:
tokens = ['I', 'like', 'red', 'apples', '.']
tags = ['TAG1', 'TAG2', 'TAG3', 'TAG3', 'TAG4']
I am trying to use a regexp like this one:
r"\w*\/"
But that extracts each word together with the slash, e.g. I/. How can I get the desired output, at least for tokens (everything before the /)?
You can use:
>>> import re
>>> re.findall(r'([\w.]+)/([\w.]+)', str1)
[('I', 'TAG1'), ('like', 'TAG2'), ('red', 'TAG3'), ('apples', 'TAG3'), ('.', 'TAG4')]
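One caveat about the character class: [\w.]+ only matches word characters and dots, so a token such as a comma or a word with an apostrophe would be skipped or cut short. If your tokens never contain whitespace and the tag always follows the last slash (both are assumptions, and the sample string below is made up for illustration), a broader pattern along these lines may be more forgiving:

```python
import re

# Hypothetical input, not from the question: it includes an apostrophe,
# a bare comma, and a token that itself contains a slash.
sample = "don't/VB stop/VB ,/PUNCT and/or/CC"

# The pattern from the answer: only word characters and dots are allowed.
narrow = re.findall(r'([\w.]+)/([\w.]+)', sample)

# Broader sketch: any run of non-whitespace on either side of the slash.
# The first group is greedy, so the split happens at the last slash.
broad = re.findall(r'(\S+)/(\S+)', sample)

print(narrow)
print(broad)
```

With this sample the narrow pattern loses the comma and truncates don't, while the broad one keeps all four pairs intact.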
Code:
>>> tokens = []
>>> tags = []
>>> for token, tag in re.findall(r'([\w.]+)/([\w.]+)', str1):
...     tokens.append(token)
...     tags.append(tag)
...
>>> print(tokens)
['I', 'like', 'red', 'apples', '.']
>>> print(tags)
['TAG1', 'TAG2', 'TAG3', 'TAG3', 'TAG4']
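If you would rather avoid a regex altogether, here is a minimal sketch using only string methods. It assumes every whitespace-separated item contains at least one slash and that the string is not empty (otherwise the unpacking fails):

```python
str1 = "I/TAG1 like/TAG2 red/TAG3 apples/TAG3 ./TAG4"

# Split on whitespace, then split each item once at its last slash.
pairs = [item.rsplit("/", 1) for item in str1.split()]

# zip(*pairs) transposes the [token, tag] pairs into two sequences.
tokens, tags = (list(seq) for seq in zip(*pairs))

print(tokens)  # ['I', 'like', 'red', 'apples', '.']
print(tags)    # ['TAG1', 'TAG2', 'TAG3', 'TAG3', 'TAG4']
```

Using rsplit with a limit of 1 means a token that itself contains a slash still ends up whole in tokens.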