
Find first match backwards in a list of lists


I have the following list:

[
['the', 'the +Det'],
['dog', 'dog +N +A-right'],
['ran', 'run +V +past'],
['at', 'at +P'], 
['me', 'I +N +G-left'],
['and', 'and +Cnj'],
['the', 'the +Det'],
['ball', 'ball +N +G-right'],
['was', 'was +C'],
['kicked', 'kick +V +past'],
['by', 'by +P'],
['me', 'I +N +A-left']
]

Basically, what I'm looking to do is:

  1. iterate through the list of lists
  2. find all instances of +G-left, +A-left, +G-right, and +A-right
  3. if +G-left or +A-left is seen, look backward to the first list with an element containing +V, and add the first element of the list containing +G-left or +A-left, tagged with +G-left or +A-left, to the end of that +V list; then move on and repeat
  4. if +G-right or +A-right is seen, look forward to the first list with an element containing +V, and add the first element of the list containing +G-right or +A-right, tagged with +G-right or +A-right, to the end of that +V list; then move on and repeat
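Steps 3 and 4 both come down to finding the nearest list in one direction whose analysis string contains +V. A minimal sketch of that search, using next() over a generator of indices; the helper name nearest_verb_index and its step parameter are hypothetical, and it assumes every inner list keeps its analysis string at index 1:

```python
# The list of lists from the question
rows = [
    ['the', 'the +Det'],
    ['dog', 'dog +N +A-right'],
    ['ran', 'run +V +past'],
    ['at', 'at +P'],
    ['me', 'I +N +G-left'],
    ['and', 'and +Cnj'],
    ['the', 'the +Det'],
    ['ball', 'ball +N +G-right'],
    ['was', 'was +C'],
    ['kicked', 'kick +V +past'],
    ['by', 'by +P'],
    ['me', 'I +N +A-left'],
]

def nearest_verb_index(rows, i, step):
    """Hypothetical helper: return the index of the first row after row i
    (step=1) or before it (step=-1) whose analysis string rows[j][1]
    contains '+V', or None if there is no such row in that direction.
    step must be 1 or -1."""
    stop = len(rows) if step > 0 else -1
    return next((j for j in range(i + step, stop, step) if '+V' in rows[j][1]), None)

# 'me' at index 4 carries +G-left, so search backward
print(nearest_verb_index(rows, 4, -1))   # 2 ('ran')
# 'ball' at index 7 carries +G-right, so search forward
print(nearest_verb_index(rows, 7, 1))    # 9 ('kicked')
```

next() with a default of None returns the first hit and stops scanning, which is exactly "first match backwards" (or forwards) without a counter variable.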

So in the case of my example above, the desired result would be:

[
['the', 'the +Det'],
['dog', 'dog +N +A-right'],
['ran', 'run +V +past', 'dog+A-right', 'me+G-left'],
['at', 'at +P'], 
['me', 'I +N +G-left'],
['and', 'and +Cnj'],
['the', 'the +Det'],
['ball', 'ball +N +G-right'],
['was', 'was +C'],
['kicked', 'kick +V +past', 'ball+G-right', 'me+A-left'],
['by', 'by +P'],
['me', 'I +N +A-left']
]

I think the proper way to approach this is with re, so:

import re

gleft = re.compile(r"G-left")
gright = re.compile(r"G-right")
aleft = re.compile(r"A-left")
aright = re.compile(r"A-right")
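One caveat if you go the re route: Pattern.match() only succeeds when the pattern matches at the very start of the string, so a pattern like G-left never matches 'I +N +G-left'; Pattern.search() scans the whole string. A small sketch, where the combined pattern tag_re is an illustrative assumption rather than something from the question:

```python
import re

gleft = re.compile(r"G-left")
analysis = 'I +N +G-left'

# match() anchors at the beginning of the string, so this finds nothing
print(gleft.match(analysis))                 # None
# search() looks anywhere in the string
print(gleft.search(analysis) is not None)    # True

# Hypothetical combined pattern: one regex for all four tags,
# capturing the letter (G or A) and the direction (left or right)
tag_re = re.compile(r"\+([GA])-(left|right)\b")
m = tag_re.search('dog +N +A-right')
print(m.group(0), m.groups())                # +A-right ('A', 'right')
print(tag_re.search('run +V +past'))         # None
```

With the captured direction, one branch can decide whether to look backward or forward instead of testing four separate patterns.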

then something like

for item in words:  # words is the list of lists above
    if aleft.search(item[1]):  # search(), since the tag is not at the start of the string
        # somehow work backwards to find the list with the +V tag, then:
        # whatever.append(item[0])  # can you concatenate a string here to add +A-left?
        pass

    if aright.search(item[1]):
        # somehow work forwards to find the list with the +V tag, then:
        # whatever.append(item[0])  # can you concatenate a string here to add +A-right?
        pass

And the same thing but with the G tags.
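On the concatenation question: Python strings join with +, and the tag itself can be taken from the analysis string with split(). Also note that list.insert(-1, x) puts x in front of the current last element; append() is what adds to the end, as the desired output requires. A short illustration (the variable names are only for this example):

```python
row = ['ran', 'run +V +past']          # a +V list
tagged = ['dog', 'dog +N +A-right']    # a list carrying a direction tag

# The tag is the last space-separated token of the analysis string
tag = tagged[1].split(' ')[-1]         # '+A-right'
# Strings concatenate with +
entry = tagged[0] + tag                # 'dog+A-right'

# insert(-1, x) places x before the last element...
demo = list(row)
demo.insert(-1, entry)
print(demo)    # ['ran', 'dog+A-right', 'run +V +past']

# ...while append(x) places it at the end
row.append(entry)
print(row)     # ['ran', 'run +V +past', 'dog+A-right']
```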

Hopefully someone can point me in the right direction. I believe I've broken down the steps correctly; I'm just not familiar enough with Python yet to know the syntax off the top of my head.


Solution

  • This can probably be simplified somewhat by using an auxiliary function, but that aside, try this, which doesn't require regex:

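    # The list of lists from the question, with a comma after every inner list
    wls = [
        ['the', 'the +Det'],
        ['dog', 'dog +N +A-right'],
        ['ran', 'run +V +past'],
        ['at', 'at +P'],
        ['me', 'I +N +G-left'],
        ['and', 'and +Cnj'],
        ['the', 'the +Det'],
        ['ball', 'ball +N +G-right'],
        ['was', 'was +C'],
        ['kicked', 'kick +V +past'],
        ['by', 'by +P'],
        ['me', 'I +N +A-left'],
    ]

    for targ, wl in enumerate(wls):
        # Check only the original analysis string (wl[1]); tags appended to a
        # +V list earlier in the loop must not be treated as new tags.
        w = wl[1]
        if '-right' in w:
            # look forward for the first list whose analysis contains +V
            for wt in wls[targ+1:]:
                if '+V' in wt[1]:
                    wt.append(wl[0] + w.split(' ')[-1])   # e.g. 'ball+G-right'
                    break
        if '-left' in w:
            # look backward, nearest list first
            for wt in reversed(wls[:targ]):
                if '+V' in wt[1]:
                    wt.append(wl[0] + w.split(' ')[-1])   # e.g. 'me+A-left'
                    break

    print(wls)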
    

    The output should be what you are looking for.
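Following the answer's remark about an auxiliary function, here is one possible restructuring, a sketch rather than a drop-in replacement. The hypothetical helper attach_tags records the positions of all +V lists once, then uses bisect to find the nearest one on the required side of each tagged word. It returns a new list instead of modifying the input, so tags appended along the way are never re-read. It assumes each inner list starts as [word, analysis], with the tag as the last space-separated token of the analysis.

```python
from bisect import bisect_left, bisect_right

LEFT_TAGS = {'+G-left', '+A-left'}
RIGHT_TAGS = {'+G-right', '+A-right'}

def attach_tags(rows):
    """Hypothetical helper: return a copy of rows in which each word tagged
    +G-left/+A-left (or +G-right/+A-right) is appended, as word+tag, to the
    nearest earlier (or later) row whose analysis contains '+V'."""
    out = [list(row) for row in rows]      # copies; the input stays unchanged
    # Indices of the +V rows, in ascending order
    verbs = [i for i, row in enumerate(rows) if '+V' in row[1]]
    for i, row in enumerate(rows):
        tag = row[1].split(' ')[-1]
        if tag in LEFT_TAGS:
            k = bisect_left(verbs, i) - 1  # position of the last verb index < i
        elif tag in RIGHT_TAGS:
            k = bisect_right(verbs, i)     # position of the first verb index > i
        else:
            continue
        if 0 <= k < len(verbs):            # skip tags with no verb on that side
            out[verbs[k]].append(row[0] + tag)
    return out

rows = [
    ['the', 'the +Det'], ['dog', 'dog +N +A-right'], ['ran', 'run +V +past'],
    ['at', 'at +P'], ['me', 'I +N +G-left'], ['and', 'and +Cnj'],
    ['the', 'the +Det'], ['ball', 'ball +N +G-right'], ['was', 'was +C'],
    ['kicked', 'kick +V +past'], ['by', 'by +P'], ['me', 'I +N +A-left'],
]

for row in attach_tags(rows):
    print(row)
```

Collecting the verb positions first turns each lookup into a binary search instead of a rescan of the list. That only pays off for long inputs; for a sentence of a dozen words the plain loop above is just as good.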