
Find word before and after delimiter


string = "The is a better :: sentence as :: compared to that" 

Output:

  1. better sentence
  2. as compared

I've tried the following:

string.split(" :: ")
re.sub("[\<].*?[\>]", "", string)

Neither of these gives me the specific words on either side of the delimiter.


Solution

  • >>> string = "The is a better :: sentence as :: compared to that" 
    >>> x = [' '.join(x) for x in map(lambda x: (x[0].split()[-1], x[1].split()[0]), zip(string.split('::')[:-1], string.split('::')[1:]))]
    >>> x
    

    Output:

    ['better sentence', 'as compared']
    

    Dissection:

    First, split on :: and zip each piece with the piece that follows it:

    pairs = zip(string.split('::')[:-1], string.split('::')[1:])
    

    If you list() that expression, you get:

    [('The is a better ', ' sentence as '), (' sentence as ', ' compared to that')]
    

    Next, apply a function that extracts the last word of the first element and the first word of the second element of each tuple:

    new_pairs = map(lambda x: (x[0].split()[-1], x[1].split()[0]), pairs)
    

    If you list() that expression, you get:

    [('better', 'sentence'), ('as', 'compared')]
    

    Lastly, join each tuple in a list comprehension:

    result = [' '.join(x) for x in new_pairs]
    

    Output:

    ['better sentence', 'as compared']
    

    timeit results:

    The slowest run took 4.92 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 5.74 µs per loop
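
The split-and-zip steps above can also be wrapped in a small helper, which is easier to read and reuse than the one-liner. This is only a sketch: the name words_around, the delimiter parameter, and the choice to skip a delimiter that has no word on one side are assumptions for illustration, not part of the original answer.

```python
def words_around(text, delimiter="::"):
    """Return 'before after' word pairs for each delimiter occurrence."""
    parts = text.split(delimiter)
    result = []
    # Pair each chunk with the chunk that follows it.
    for left, right in zip(parts[:-1], parts[1:]):
        left_words = left.split()
        right_words = right.split()
        # Assumption: skip a delimiter that has no word on one of its sides.
        if left_words and right_words:
            result.append(f"{left_words[-1]} {right_words[0]}")
    return result


string = "The is a better :: sentence as :: compared to that"
print(words_around(string))  # ['better sentence', 'as compared']
```

Unlike the one-liner, this version does not raise an IndexError when the string starts or ends with the delimiter; it simply leaves that pair out.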
    

    Here's another way with re.

    import re
    string = "The is a better :: sentence as :: compared to that" 
    result = [' '.join(x) for x in re.findall(r'(\w+) :: (\w+)', string)]
    

    Output:

    ['better sentence', 'as compared']
    

    timeit results:

    The slowest run took 4.60 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 4.49 µs per loop
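
One caveat with the re.findall pattern above: matches cannot overlap, so when only a single word sits between two delimiters (for example "a :: b :: c"), that word is consumed by the first match and the second pair is missed. A possible workaround, sketched here under the assumption that whitespace around :: is optional, is to capture the following word inside a lookahead so it is not consumed. The function name words_around_re is hypothetical.

```python
import re


def words_around_re(text):
    # (\w+) captures the word before the delimiter. The lookahead (?=(\w+))
    # captures the word after it without consuming it, so the same word can
    # still serve as the "before" word of the next delimiter.
    pattern = r"(\w+)\s*::\s*(?=(\w+))"
    return [" ".join(pair) for pair in re.findall(pattern, text)]


string = "The is a better :: sentence as :: compared to that"
print(words_around_re(string))         # ['better sentence', 'as compared']
print(words_around_re("a :: b :: c"))  # ['a b', 'b c']
```

For the example string in the question both patterns give the same result; the lookahead only matters when delimiters are separated by a single word.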