
Isolating parts of a string from terminal output


I'm using Python 3 on Linux Mint, with Visual Studio Code.

I have some code that reads a directory and prints the names of the XML files in it, like so:

persistence_security_dcshadow_4742.xml
Network_Service_Guest_added_to_admins_4732.xml
spoolsample_5145.xml
LM_Remote_Service02_7045.xml
DE_RDP_Tunneling_4624.xml

I'm trying to figure out how to keep only the integers from these names after my read script has run, i.e., strip all the text so that only the numbers remain. I tried regular expressions with the re module but didn't have much luck.


Solution

  • Use a regex with [0-9]+ to find every run of digits, then take the last match

    import re
    
    regex = r'[0-9]+'
    
    xmls = [
        'persistence_security_dcshadow_4742.xml',
        'Network_Service_Guest_added_to_admins_4732.xml',
        'spoolsample_5145.xml',
        'LM_Remote_Service02_7045.xml',
        'DE_RDP_Tunneling_4624.xml',
    ]
    
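    for xml in xmls:
        # findall returns every run of digits, e.g. ['02', '7045']
        matches = re.findall(regex, xml)
        # the number we want is the last run in the name
        number = matches[-1]
        print(number)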
    
    > 4742
    > 4732
    > 5145
    > 7045
    > 4624
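
Note that matches[-1] raises an IndexError when a name contains no digits at all. As a hedged alternative, the sketch below anchors the pattern to the digits that sit directly before the .xml extension and returns None when there are none. The names TRAILING_ID and trailing_number are hypothetical, introduced only for this illustration, and it assumes every file of interest ends in ".xml".

```python
import re

# Hypothetical pattern: one or more digits immediately followed by ".xml"
# at the very end of the name.
TRAILING_ID = re.compile(r'([0-9]+)\.xml$')


def trailing_number(filename):
    """Return the digits just before '.xml', or None if there are none."""
    match = TRAILING_ID.search(filename)
    return match.group(1) if match else None


print(trailing_number('LM_Remote_Service02_7045.xml'))  # 7045
print(trailing_number('notes.xml'))                     # None
```

Because the pattern is anchored to the end of the name, the "02" in LM_Remote_Service02 is never considered, so there is no need to pick the last element of a list.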
    

    UPDATE

    If you want to print the numbers only after all the files have been processed, you can write a function that takes a list of XML filenames and returns the last number found in each one:

    import re
    
    def xmls_to_numbers(xmls):
        regex = r'[0-9]+'
        numbers = []
        for xml in xmls:
            matches = re.findall(regex, xml)
            number = matches[-1]
            numbers.append(number)
        return numbers
    
    
    xmls = [
        'persistence_security_dcshadow_4742.xml',
        'Network_Service_Guest_added_to_admins_4732.xml',
        'spoolsample_5145.xml',
        'LM_Remote_Service02_7045.xml',
        'DE_RDP_Tunneling_4624.xml',
    ]
    
    print(xmls_to_numbers(xmls))
    

    > ['4742', '4732', '5145', '7045', '4624']
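
Since the question starts from a script that reads a directory, here is a minimal sketch of how the two steps might be combined. It is not the asker's actual read script: the function name numbers_in_directory is hypothetical, and it assumes the files live directly in the given directory and that names without digits should simply be skipped.

```python
import re
from pathlib import Path


def numbers_in_directory(directory):
    """Return the last run of digits from each .xml filename, as ints."""
    numbers = []
    # sorted() gives a stable order; glob() on its own makes no ordering promise
    for path in sorted(Path(directory).glob('*.xml')):
        # path.stem is the filename without its ".xml" extension
        matches = re.findall(r'[0-9]+', path.stem)
        if matches:  # skip names that contain no digits
            numbers.append(int(matches[-1]))
    return numbers
```

Calling numbers_in_directory('.') would scan the current working directory. The values come back as int rather than str, which is convenient if they are going to be compared or sorted numerically.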