string = "The is a better :: sentence as :: compared to that"
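Desired output: the word just before and the word just after each " :: ", for example ['better sentence', 'as compared']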
I've tried the following, but neither gives me the specific words I want:
string.split(" :: ")
re.sub(r"[\<].*?[\>]", "", string)
>>> string = "The is a better :: sentence as :: compared to that"
>>> x = [' '.join(x) for x in map(lambda x: (x[0].split()[-1], x[1].split()[0]), zip(string.split('::')[:-1], string.split('::')[1:]))]
>>> x
['better sentence', 'as compared']
Dissection:
First, split on :: and use zip to pair each piece with the piece that follows it:
pairs = zip(string.split('::')[:-1], string.split('::')[1:])
If you wrap that expression in list(), you get:
[('The is a better ', ' sentence as '), (' sentence as ', ' compared to that')]
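The step above calls split('::') twice. A minimal variant, assuming you only need the same pairs, splits once and relies on zip stopping at its shortest argument, so the [:-1] slice is not needed. On Python 3.10 and later, itertools.pairwise(parts) yields the same pairs.

```python
string = "The is a better :: sentence as :: compared to that"

# Split once and reuse the resulting list.
parts = string.split('::')

# zip stops at the shorter argument, so zip(parts, parts[1:]) pairs each
# piece with its successor without needing parts[:-1].
pairs = list(zip(parts, parts[1:]))
print(pairs)
# [('The is a better ', ' sentence as '), (' sentence as ', ' compared to that')]
```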
Next, apply a function to each tuple that extracts the last word of the first element and the first word of the second element:
new_pairs = map(lambda x: (x[0].split()[-1], x[1].split()[0]), pairs)
If you wrap that expression in list(), you get:
[('better', 'sentence'), ('as', 'compared')]
Lastly, join each tuple in a list comprehension:
result = [' '.join(x) for x in new_pairs]
Output:
['better sentence', 'as compared']
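The three steps can also be written as one readable function. This is a sketch rather than the answer's exact code: the name words_around and its sep parameter are hypothetical, and tuple unpacking replaces the x[0]/x[1] indexing. Note that str.split() with no argument also strips the spaces left around '::'. If a piece is empty or whitespace only (for example, text that starts with '::'), split() returns an empty list and the indexing raises IndexError.

```python
def words_around(text, sep='::'):
    """Join the word before and the word after each occurrence of sep (hypothetical helper)."""
    parts = text.split(sep)
    return [
        # Last word of the left piece, first word of the right piece.
        f"{left.split()[-1]} {right.split()[0]}"
        for left, right in zip(parts, parts[1:])
    ]

print(words_around("The is a better :: sentence as :: compared to that"))
# ['better sentence', 'as compared']
```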
timeit results:
The slowest run took 4.92 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.74 µs per loop
Here's another way, using re:
import re
string = "The is a better :: sentence as :: compared to that"
result = [' '.join(x) for x in re.findall(r'(\w+) :: (\w+)', string)]
Output:
['better sentence', 'as compared']
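One difference from the split-based version: re.findall does not return overlapping matches, so when a single word sits between two separators (as in the hypothetical input "a :: b :: c"), the first match consumes it and the second pair is lost. A sketch of one fix, assuming you want the same pairs as the split approach, puts the second word inside a lookahead so it is captured but not consumed. Both patterns also assume exactly one space on each side of '::'.

```python
import re

text = "a :: b :: c"

# The plain pattern consumes 'b' in the first match, so the second
# separator has no unconsumed word before it.
print(re.findall(r'(\w+) :: (\w+)', text))      # [('a', 'b')]

# (?=(\w+)) captures the next word without consuming it, so 'b' can also
# serve as the left word of the following match.
print(re.findall(r'(\w+) :: (?=(\w+))', text))  # [('a', 'b'), ('b', 'c')]
```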
timeit results:
The slowest run took 4.60 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.49 µs per loop
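The timings above look like IPython %timeit output. To compare the two approaches outside IPython, a rough sketch with the standard timeit module follows; the function names with_split and with_regex are hypothetical wrappers around the two answers' expressions, and the numbers you get will depend on your machine and Python version.

```python
import re
import timeit

string = "The is a better :: sentence as :: compared to that"

def with_split():
    # The split/zip/map one-liner from the first approach.
    return [' '.join(x) for x in map(lambda x: (x[0].split()[-1], x[1].split()[0]),
                                     zip(string.split('::')[:-1], string.split('::')[1:]))]

def with_regex():
    # The re.findall approach.
    return [' '.join(x) for x in re.findall(r'(\w+) :: (\w+)', string)]

# Make sure both produce the same result before comparing speed.
assert with_split() == with_regex()

NUMBER = 10_000
for func in (with_split, with_regex):
    # repeat() returns the total time of each run; keep the best run.
    best = min(timeit.repeat(func, number=NUMBER, repeat=3))
    print(f"{func.__name__}: {best / NUMBER * 1e6:.2f} usec per loop")
```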