Tags: python, string, substring

Best practice when substring is missing from string


I'm extracting data from an API, and one of the fields is a string from which I want to extract multiple substrings (ideally 7). To get those substrings I'm using the index() method.

string = r"""[Summary]
Reason: Not enough information
Improvements_Done: None
Improvements_Planned: Documentation
References_Improvements_Done: None
References_Improvements_Done: None
References_Improvements_Planned: www.link1.com
References_Improvements_Planned: www.link2.com
 *** DEFAULT.....""".replace("\n", "\r\n")

Ex: imp_done_start = string.index('Improvements_Done: ') + len('Improvements_Done: ')
    imp_done_end = string.index('Improvements_Planned')
    imp_done = string[imp_done_start:imp_done_end]

There could be cases where one or more of these fields (Reason, Improvements_Done, Improvements_Planned, etc.) are missing from the string. For example, if "Improvements_Planned" is missing, index() raises a ValueError and I can't get the value for imp_done.

What is the best practice for handling this kind of case?


Solution

  • The best practice depends largely on the format. In most cases, though, you can take a flexible approach and convert the text to an intermediate representation that is easier to parse and analyze:

    import re
    
    def parse(s: str) -> dict[str, str]:
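        d = {}
        lines = s.splitlines()
        # Non-greedy key, so a value that itself contains ": " stays intact.
        pattern = re.compile(r"^(.*?): (.*)$")
    
        # Skip the "[Summary]" header and the trailing "*** DEFAULT" line.
        for line in lines[1:-1]:
            m = pattern.match(line)
            if m is None:
                continue  # not a "key: value" line
            # A repeated key overwrites the earlier value.
            d[m.group(1)] = m.group(2)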
    
        return d
    

    Usage:

    >>> parse(string)
    {'Improvements_Done': 'None',
     'Improvements_Planned': 'Documentation',
     'Reason': 'Not enough information',
     'References_Improvements_Done': 'None',
     'References_Improvements_Planned': 'www.link2.com'}
    

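    A field that is missing from the input is simply absent from the dict, so look it up with d.get(key) or test it with key in d instead of relying on index(). Note that a repeated key (here References_Improvements_Planned) keeps only its last value. From there, apply whatever further rules you need to the result.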
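
A minimal sketch of that last step, assuming repeated keys should be kept rather than overwritten. The names parse_multi and LINE_RE and the shortened sample text are hypothetical, introduced here for illustration only:

```python
import re
from collections import defaultdict

# Hypothetical pattern: non-greedy key, then ": ", then the value.
LINE_RE = re.compile(r"^(.*?): (.*)$")


def parse_multi(s: str) -> dict[str, list[str]]:
    """Collect every value of each key into a list (hypothetical helper)."""
    d = defaultdict(list)
    for line in s.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # header, footer, or anything that is not "key: value"
        d[m.group(1)].append(m.group(2))
    return dict(d)


# Shortened sample in which Improvements_Planned is deliberately missing.
text = """[Summary]
Reason: Not enough information
Improvements_Done: None
References_Improvements_Planned: www.link1.com
References_Improvements_Planned: www.link2.com
 *** DEFAULT....."""

fields = parse_multi(text)

# .get() returns the default for an absent key instead of raising,
# unlike str.index() on a missing marker.
planned = fields.get("Improvements_Planned", [])
links = fields.get("References_Improvements_Planned", [])

print(planned)  # []
print(links)    # ['www.link1.com', 'www.link2.com']
```

Returning a list for every key keeps the lookup code uniform: a missing field, a single value, and a repeated field are all handled by the same .get(key, []) call.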