Tags: python, pattern-matching, structural-pattern-matching, option-type

Safe indexing with pattern matching in Python


I have a giant list of words, corpus, and a particular word, w. I know the index of every occurrence of w in the corpus. I want to look at an n-sized window around every occurrence of w and build a dictionary of the other words that occur within that window. The dictionary maps int to list[str]: the key is the offset from my target word, negative to the left and positive to the right, and the value is the list of words found at that offset.

For example, if I have the corpus: ["I", "made", "burgers", "Jack", "made", "sushi"]; my word is "made" and I am looking at a window of size 1, then I ultimately want to return {-1: ["I", "Jack"], 1: ["burgers", "sushi"]}.

Two problems can occur. The window may go out of bounds (as it would with a window of size 2 in the example above), and I may encounter the target word itself again inside the window. I want to ignore both cases. The following code seems to work, but I would like to make it cleaner.

def find_neighbor(word: str, corpus: list[str], n: int = 1) -> dict[int, list[str]]:
    mapping = {k: [] for k in range(-n, n+1) if k != 0}
    idxs = [k for k, v in enumerate(corpus) if v == word]
    for idx in idxs:
        for i in [x for x in range(-n, n+1) if x != 0]:
            try:
                item = corpus[idx+i]
                if item != word:
                    mapping[i].append(item)
            except IndexError:
                continue
    return mapping

Is there a way to incorporate options and pattern matching so that I can remove the try block and have something like this...

match corpus[idx+i]:
    case None: continue  # If it doesn't exist (out of bounds), continue (I could also break)
    case word: continue  # If it is the word itself, continue
    case _: mapping[i].append(corpus[idx+i])  # Otherwise, add it to the dictionary

Solution

  • Introduce a helper function that returns corpus[i] if i is a legal index and None otherwise:

    corpus = ["foo", "bar", "baz"]
    
    def get(i):
        # Require 0 <= i: a negative index would wrap around to the end of the list
        return corpus[i] if 0 <= i < len(corpus) else None
            
    print([get(0), get(1), get(2), get(3)])
    

    The result of the above is:

    ['foo', 'bar', 'baz', None]
    

    Now you can write:

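    match get(idx+i):
        case None: continue                  # out of bounds
        case item if item == word: continue  # a bare `case word:` would capture, not compare
        case item: mapping[i].append(item)   # any other word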