Tags: python, python-3.x, dictionary, for-loop, dictionary-comprehension

Select i+4 elements from a python dictionary based on a keyword


I have a Python dictionary as follows:

ip_dict = {
  "doc_1": "ADMINISTRATION LIABILITY COVERAGE PART CG7023 1096 EXCL-ASBESTOS",
  "doc_2": "DIRECT BILL L7F6 20118 INSURED COPY ACP GLDO 7285650787 919705952 43 0001404",
  "doc_3": "What Contractor Additional Insured LIABILITY CG 20 10 04 13 THIS ENDORSEMENT CHANGES",
  "doc_4": "That portion of \"your work\" out of which the 1. Required by the contract or agreement",
  "doc_5": "LIABILITY CG 20 10 04 13 Contractor Additional Insured THIS ENDORSEMENT CHANGES THE POLICY",
  "doc_6": "That portion of \"your work\" out of which the 1. Contractor Additional Insured Required",
  "doc_7": "LIABILITY CG 20 26 04 13 THIS ENDORSEMENT CHANGES THE POLICY.",
  "doc_8": "COMMERCIAL GENERAL LIABILITY CG 21 87 0115 THIS ENDORSEMENT CHANGES THE POLICY.",
  "doc_9": "Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 21 87 01 15 B. The following definitions are added",
  "doc_10": "POLICY NUMBER: THIS ENDORSEMENT CHANGES THE POLICY. COMMERCIAL GENERAL LIABILITY CG 25 03 05 09 ",
  "doc_11": "Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 25 03 05 09 B"
}

Now I want to search for the keyword "Contractor Additional Insured" in the values and, if it is found, extract that element plus the next 4 consecutive elements after it and store them in a new dictionary. So my output would look something like this:

op_dict = {
"doc_3": "What Contractor Additional Insured LIABILITY CG 20 10 04 13 THIS ENDORSEMENT CHANGES",
"doc_4": "That portion of \"your work\" out of which the 1. Required by the contract or agreement",
"doc_5": "LIABILITY CG 20 10 04 13 Contractor Additional Insured THIS ENDORSEMENT CHANGES THE POLICY",
"doc_6": "That portion of \"your work\" out of which the 1. Contractor Additional Insured Required",
"doc_7": "LIABILITY CG 20 26 04 13 THIS ENDORSEMENT CHANGES THE POLICY.",
"doc_8": "COMMERCIAL GENERAL LIABILITY CG 21 87 0115 THIS ENDORSEMENT CHANGES THE POLICY.",
"doc_9": "Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 21 87 01 15 B. The following definitions are added",
"doc_10": "POLICY NUMBER: THIS ENDORSEMENT CHANGES THE POLICY. COMMERCIAL GENERAL LIABILITY CG 25 03 05 09 ",
}

Here the keyword first appears in the third element, doc_3, so we take the 4 elements after doc_3, i.e. doc_4, doc_5, doc_6, doc_7. Hence elements up to doc_7 are included.

The keyword next appears in doc_5, so the 4 elements after doc_5 (doc_6, doc_7, doc_8, doc_9) are included as well.

Similarly, the keyword appears again in doc_6, so the next 4 consecutive elements (doc_7, doc_8, doc_9, doc_10) are selected.

Any help is appreciated!


Solution

  • Let's convert your dict to a list of tuples with indices.

    >>> lst = list(enumerate(ip_dict.items()))
    >>> lst
    [(0, ('doc_1', 'ADMINISTRATION LIABILITY COVERAGE PART CG7023 1096 EXCL-ASBESTOS')), 
     (1, ('doc_2', 'DIRECT BILL L7F6 20118 INSURED COPY ACP GLDO 7285650787 919705952 43 0001404')), 
     (2, ('doc_3', 'What Contractor Additional Insured LIABILITY CG 20 10 04 13 THIS ENDORSEMENT CHANGES')), 
     (3, ('doc_4', 'That portion of "your work" out of which the 1. Required by the contract or agreement')), 
     (4, ('doc_5', 'LIABILITY CG 20 10 04 13 Contractor Additional Insured THIS ENDORSEMENT CHANGES THE POLICY')), 
     (5, ('doc_6', 'That portion of "your work" out of which the 1. Contractor Additional Insured Required')), 
     (6, ('doc_7', 'LIABILITY CG 20 26 04 13 THIS ENDORSEMENT CHANGES THE POLICY.')), 
     (7, ('doc_8', 'COMMERCIAL GENERAL LIABILITY CG 21 87 0115 THIS ENDORSEMENT CHANGES THE POLICY.')), 
     (8, ('doc_9', 'Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 21 87 01 15 B. The following definitions are added')), 
     (9, ('doc_10', 'POLICY NUMBER: THIS ENDORSEMENT CHANGES THE POLICY. COMMERCIAL GENERAL LIABILITY CG 25 03 05 09 ')), 
     (10, ('doc_11', 'Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 25 03 05 09 B'))]
    

    Now, get all indices where the keyword is found.

    >>> idxs = [i for i, x in lst if 'Contractor Additional Insured' in x[1]]
    >>> idxs
    [2, 4, 5]
    

    Now we can use a set comprehension to collect each matching index together with the 4 indices that follow it.

    >>> {j 
    ...  for i in idxs 
    ...  for j in range(i, i+5)}
    {2, 3, 4, 5, 6, 7, 8, 9}
    

    And then a dictionary comprehension over lst checking for membership in that set.

    >>> keep = {j for i in idxs for j in range(i, i+5)}  # i itself plus the next 4
    >>> {v[0]: v[1]
    ...  for i, v in lst
    ...  if i in keep}
    {'doc_3': 'What Contractor Additional Insured LIABILITY CG 20 10 04 13 THIS ENDORSEMENT CHANGES', 
     'doc_4': 'That portion of "your work" out of which the 1. Required by the contract or agreement',
     'doc_5': 'LIABILITY CG 20 10 04 13 Contractor Additional Insured THIS ENDORSEMENT CHANGES THE POLICY',
     'doc_6': 'That portion of "your work" out of which the 1. Contractor Additional Insured Required',
     'doc_7': 'LIABILITY CG 20 26 04 13 THIS ENDORSEMENT CHANGES THE POLICY.',
     'doc_8': 'COMMERCIAL GENERAL LIABILITY CG 21 87 0115 THIS ENDORSEMENT CHANGES THE POLICY.',
     'doc_9': 'Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 21 87 01 15 B. The following definitions are added',
     'doc_10': 'POLICY NUMBER: THIS ENDORSEMENT CHANGES THE POLICY. COMMERCIAL GENERAL LIABILITY CG 25 03 05 09 '}
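
The steps above can be folded into one helper. This is a minimal sketch, not part of the original answer: the function name select_with_following, its parameter n, and the small sample dictionary are made up for illustration, and it assumes Python 3.7+ so that the dictionary keeps insertion order.

```python
def select_with_following(d, keyword, n=4):
    """Return the entries whose value contains keyword, plus the n entries after each.

    Hypothetical helper for illustration; relies on dict insertion order (Python 3.7+).
    """
    items = list(d.items())
    keep = set()
    for i, (_, value) in enumerate(items):
        if keyword in value:
            # Keep index i and the next n indices, clamped to the end of the list.
            keep.update(range(i, min(i + n + 1, len(items))))
    # Rebuild a dictionary in the original order from the kept positions.
    return {k: v for i, (k, v) in enumerate(items) if i in keep}


# Made-up sample data, smaller than the dictionary in the question.
sample = {
    "a": "x",
    "b": "hit here",
    "c": "y",
    "d": "z",
    "e": "hit again",
    "f": "w",
}
result = select_with_following(sample, "hit", n=1)
print(result)
```

With the dictionary from the question, select_with_following(ip_dict, "Contractor Additional Insured") should select doc_3 through doc_10. Overlapping windows are merged because the positions are collected in a set before the result is built.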