python-3.x, python-re

How to match letters, digits, dashes, semicolons, dots and commas, but only if at least one letter or digit is present?


For example, this is my method:

import re
text = "ONE:;TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;SIX: ;SEVEN:,,,;EIGHT:--;"

def parser(string):
    prepare = []
    string = list(filter(None, string.split(";")))
    for i in string:
        s = i.split(":")
        j = len(list(filter(None, s)))
        if j == 2 and re.match("^[A-Za-z0-9_-]*$",s[1]):
            prepare.append(i)

    final = ";".join(prepare) + ";"
    return final
        
print(parser(text))

It only returns THREE, FOUR and EIGHT, but I also want to include TWO and FIVE and exclude EIGHT. Maybe this is not the best approach to my goal, but how can I include TWO and FIVE while excluding SEVEN and EIGHT?
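Why the original filter keeps EIGHT but drops TWO and FIVE: `re.match` with `^[A-Za-z0-9_-]*$` only succeeds when every character of the value is a letter, digit, underscore or dash. Commas, dots and spaces are outside that set, while `--` is made only of allowed characters. A small sketch comparing that test with a "contains at least one letter or digit" test on the sample values (the `values` dict and the helper names `all_allowed` and `has_alnum` are illustrative, not from the question):

```python
import re

# Values after the colon in the question's sample string. ONE is left out
# because its value is empty and the question's `j == 2` check already drops it.
values = {
    "TWO": ",,d,-",
    "THREE": "fsdfsd",
    "FOUR": "43879293847",
    "FIVE": "dsa. dsa, 56",
    "SIX": " ",
    "SEVEN": ",,,",
    "EIGHT": "--",
}


def all_allowed(value):
    # The question's test: every character must be a letter, digit, "_" or "-".
    return re.match(r"^[A-Za-z0-9_-]*$", value) is not None


def has_alnum(value):
    # The wanted test: at least one letter or digit anywhere in the value.
    return re.search(r"[A-Za-z0-9]", value) is not None


print([k for k, v in values.items() if all_allowed(v)])  # ['THREE', 'FOUR', 'EIGHT']
print([k for k, v in values.items() if has_alnum(v)])    # ['TWO', 'THREE', 'FOUR', 'FIVE']
```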

Thank you in advance.


Solution

  • For your existing code, you could check whether the second part contains at least one letter or digit, using re.search with the character class [A-Za-z0-9]:

    import re
    
    text = "ONE:;TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;SIX: ;SEVEN:,,,;EIGHT:--;"
    
    
    def parser(string):
        prepare = []
        string = list(filter(None, string.split(";")))
        for i in string:
            s = i.split(":")
            j = len(list(filter(None, s)))
            if j == 2 and re.search("[A-Za-z0-9]", s[1]):
                prepare.append(i)
    
        final = ";".join(prepare) + ";"
        return final
    
    
    print(parser(text))
    

    Output

    TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;
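One difference worth noting: the `re.search` check only requires a letter or digit somewhere in the value, so a value such as `a@b` would still be kept even though `@` is not among the characters listed in the question. If that restriction also matters, a sketch combining both checks could look like this. `parser_strict`, `ALLOWED` and `HAS_ALNUM` are hypothetical names, and the allowed set (word characters, space, dot, comma, dash) is an assumption chosen to mirror the single-regex alternative:

```python
import re

text = "ONE:;TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;SIX: ;SEVEN:,,,;EIGHT:--;"

# Assumption: a value may only contain word characters, spaces, dots,
# commas and dashes (the same set as the single-regex alternative).
ALLOWED = re.compile(r"[\w .,-]*")
# [^\W_] is a word character other than "_", i.e. a letter or digit.
HAS_ALNUM = re.compile(r"[^\W_]")


def parser_strict(string):
    kept = []
    # Skip empty pieces, e.g. the one after the trailing ";".
    for field in filter(None, string.split(";")):
        key, sep, value = field.partition(":")
        # Require a key, a colon, a value made only of allowed characters,
        # and at least one letter or digit in that value.
        if key and sep and ALLOWED.fullmatch(value) and HAS_ALNUM.search(value):
            kept.append(field)
    # Add a ";" after each kept field; returns "" when nothing is kept.
    return "".join(item + ";" for item in kept)


print(parser_strict(text))
```

With this version, `parser_strict("TEN:a@b;")` returns an empty string, whereas the `re.search` loop above would keep `TEN:a@b;`. It also returns `""` instead of a lone `";"` when nothing matches.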
    

    As an alternative, you could use a single regex:

    [\w .,-]+:[\w .,-]*[^\W_][\w .,-]*;
    

    Explanation

    • [\w .,-]+ Match 1+ times any of the listed characters (word characters, space, dot, comma or dash)
    • : Match a colon
    • [\w .,-]* Match 0+ times any of the listed characters
    • [^\W_] Match a single word character excluding an underscore, i.e. a letter or digit
    • [\w .,-]*; Match 0+ times any of the listed characters, followed by a semicolon
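The `[^\W_]` part is a negated class: "not a non-word character and not an underscore", which leaves the word characters other than `_`. A quick check (the `LETTER_OR_DIGIT` name is only illustrative); note that `str` patterns in Python 3 are Unicode-aware by default, so accented letters count too unless `re.ASCII` is passed:

```python
import re

# [^\W_]: a word character that is not "_", i.e. a letter or digit.
LETTER_OR_DIGIT = re.compile(r"[^\W_]")

sample = "a_1-., Z9"
print([ch for ch in sample if LETTER_OR_DIGIT.fullmatch(ch)])  # ['a', '1', 'Z', '9']

# Unicode-aware by default: "é" is a word character.
print(bool(LETTER_OR_DIGIT.fullmatch("é")))                 # True
# With re.ASCII, \w (and so [^\W_]) is limited to [A-Za-z0-9_].
print(bool(re.fullmatch(r"[^\W_]", "é", flags=re.ASCII)))  # False
```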


    Example:

    import re
    
    text = "ONE:;TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;SIX: ;SEVEN:,,,;EIGHT:--;"
    regex = re.compile(r"[\w .,-]+:[\w .,-]*[^\W_][\w .,-]*;")
    
    
    def parser(string):
        return "".join(regex.findall(string))
    
    
    print(parser(text))
    

    Output

    TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;
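A caveat with `findall`: the pattern is not anchored to field boundaries, so for a field such as `X@Y:abc` it can match just the tail `Y:abc;`. One possible tweak, assuming fields are always separated by `;`, is the lookbehind `(?<![^;])`, which only lets a match start at the beginning of the string or directly after a semicolon. The names `loose`, `anchored` and `bad` are illustrative:

```python
import re

text = "ONE:;TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;SIX: ;SEVEN:,,,;EIGHT:--;"

# The answer's pattern, unchanged.
loose = re.compile(r"[\w .,-]+:[\w .,-]*[^\W_][\w .,-]*;")
# Same pattern, but a match may only start at the beginning of the string
# or right after a ";" (i.e. at the start of a field).
anchored = re.compile(r"(?<![^;])[\w .,-]+:[\w .,-]*[^\W_][\w .,-]*;")

print("".join(loose.findall(text)))
print("".join(anchored.findall(text)))  # same result for the sample text

bad = "X@Y:abc;"
print(loose.findall(bad))     # ['Y:abc;']  (tail of a field whose key contains "@")
print(anchored.findall(bad))  # []
```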