How to keep only the strings that follow a specific form in a list (Python)

I have a text corpus extracted from a PDF file, stored in the list below:

lines = ["7.1 PLAN COST MANAGEMENT", 'Plan Cost Management is the process of defining how the project costs will be estimated', '7.1.1 PLAN COST MANAGEMENT: INPUTS', 'Described in Section 4.2.3.1. The project charter provides the preapproved financial ', '7.1.1.1 PROJECT CHARTER']

However, I want to extract only the titles in this list that follow a specific form, as shown in the example: (d.d.d.d + upper-case title), (d.d.d + upper-case title), or (d.d + upper-case title), and get rid of the rest. I don't really know how to approach this properly. Any help is appreciated.

Solution

This is a perfect use case for regular expressions. Here's some code to do what you're asking:

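import re

# Named "lines" rather than "list" to avoid shadowing the built-in list type.
lines = ["7.1 PLAN COST MANAGEMENT",
         'Plan Cost Management is the process of defining how the project costs will be estimated',
         '7.1.1 PLAN COST MANAGEMENT: INPUTS',
         'Described in Section 4.2.3.1. The project charter provides the preapproved financial ',
         '7.1.1.1 PROJECT CHARTER']

# A section number with 2 to 4 dot-separated parts, one or more spaces,
# then a title made only of upper-case letters, spaces, and colons.
exp = re.compile(r"(\d+(\.\d+){1,3}) +([A-Z :]+)")

for line in lines:
    # fullmatch requires the whole string to fit the pattern, so a
    # mixed-case line such as "7.1 Plan cost" is not matched partially.
    m = exp.fullmatch(line)
    if m:
        print(m.group(0))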

Result:

7.1 PLAN COST MANAGEMENT
7.1.1 PLAN COST MANAGEMENT: INPUTS
7.1.1.1 PROJECT CHARTER

You weren't clear about what constitutes a valid "upper case title". This solution assumes that, besides the letters A-Z, only the ':' character and the space character are valid in a title. You can adjust what's inside the square brackets at the end of the expression to change which characters you consider valid in titles.
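As a rough sketch of that kind of adjustment, the filter can be wrapped in a small function whose allowed title characters are a parameter. The names keep_titles and title_chars are hypothetical, and the "7.1.2 TOOLS & TECHNIQUES" and "7.1 Plan cost" entries are made-up test lines, not part of the original data. This version uses fullmatch, on the assumption that a line should be kept only when the entire string fits the form.

```python
import re

def keep_titles(lines, title_chars="A-Z :"):
    """Return the lines shaped like 'd.d[.d[.d]] TITLE'.

    title_chars (hypothetical parameter) is the body of a regex
    character class listing what may appear in a title.
    """
    # {{1,3}} is an escaped {1,3} inside the f-string.
    pattern = re.compile(rf"\d+(?:\.\d+){{1,3}} +[{title_chars}]+")
    # fullmatch: the whole line must fit, not just a prefix of it.
    return [line for line in lines if pattern.fullmatch(line)]

lines = [
    "7.1 PLAN COST MANAGEMENT",
    "Plan Cost Management is the process of defining how the project costs will be estimated",
    "7.1.1 PLAN COST MANAGEMENT: INPUTS",
    "7.1.2 TOOLS & TECHNIQUES",  # made-up line containing '&'
    "7.1 Plan cost",             # made-up mixed-case line
]

# Default character class: the '&' line and the mixed-case line are dropped.
print(keep_titles(lines))

# Also allow '&' in titles: the '&' line is now kept.
print(keep_titles(lines, title_chars="A-Z :&"))
```

Because title_chars is pasted directly into the character class, characters that are special there (such as ']', '\' or '^') would need escaping before being passed in.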