For example, this is my method:
import re
text = "ONE:;TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;SIX: ;SEVEN:,,,;EIGHT:--;"
def parser(string):
    prepare = []
    string = list(filter(None, string.split(";")))
    for i in string:
        s = i.split(":")
        j = len(list(filter(None, s)))
        if j == 2 and re.match("^[A-Za-z0-9_-]*$", s[1]):
            prepare.append(i)
    final = ";".join(prepare) + ";"
    return final
print(parser(text))
It only returns THREE, FOUR and EIGHT, but I also want to include TWO and FIVE and exclude EIGHT.
Maybe this is not the best way to approach my goal, but how can I include TWO and FIVE while excluding SEVEN and EIGHT?
Thank you in advance.
For your existing code, you could check whether the second part contains at least one letter or digit, using re.search with the character class [A-Za-z0-9]:
import re
text = "ONE:;TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;SIX: ;SEVEN:,,,;EIGHT:--;"
def parser(string):
    prepare = []
    string = list(filter(None, string.split(";")))
    for i in string:
        s = i.split(":")
        j = len(list(filter(None, s)))
        if j == 2 and re.search("[A-Za-z0-9]", s[1]):
            prepare.append(i)
    final = ";".join(prepare) + ";"
    return final
print(parser(text))
Output
TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;
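To see why the original condition behaves the way it does, it may help to run both checks side by side on the values alone. This is only a small sketch: the names strict, relaxed and compare_checks are made up for illustration and are not part of the question's code.

```python
import re

# Original check: the WHOLE value must consist only of letters, digits, "_" or "-".
strict = re.compile(r"^[A-Za-z0-9_-]*$")
# Suggested check: the value must contain at least one letter or digit anywhere.
relaxed = re.compile(r"[A-Za-z0-9]")

def compare_checks(values):
    """Return {value: (passes_strict, passes_relaxed)} for each value."""
    return {v: (bool(strict.match(v)), bool(relaxed.search(v))) for v in values}

values = [",,d,-", "fsdfsd", "dsa. dsa, 56", ",,,", "--"]
for value, (strict_ok, relaxed_ok) in compare_checks(values).items():
    print(f"{value!r}: strict={strict_ok}, relaxed={relaxed_ok}")
# ',,d,-': strict=False, relaxed=True
# 'fsdfsd': strict=True, relaxed=True
# 'dsa. dsa, 56': strict=False, relaxed=True
# ',,,': strict=False, relaxed=False
# '--': strict=True, relaxed=False
```

The strict pattern rejects TWO and FIVE because of the commas, dot and space, and accepts EIGHT because "-" is in the allowed set. The relaxed pattern only asks for one letter or digit somewhere in the value.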
As an alternative with a single regex:
[\w .,-]+:[\w .,-]*[^\W_][\w .,-]*;
Explanation
[\w .,-]+
    Match 1+ times any of the listed characters
:
    Match a colon
[\w .,-]*
    Match 0+ times any of the listed characters
[^\W_]
    Match a single word character excluding an underscore
[\w .,-]*;
    Match 0+ times any of the listed characters followed by a semicolon
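If the pattern is hard to read on one line, the same pieces can be written with re.VERBOSE so that each part carries its own comment. This is just a sketch of the same pattern; the name verbose_pattern is an illustrative choice. In verbose mode, whitespace inside a character class is still significant, so the space in [\w .,-] keeps its meaning.

```python
import re

# Same pattern as above, split into commented pieces.
verbose_pattern = re.compile(
    r"""
    [\w .,-]+    # key: 1+ of the listed characters
    :            # a literal colon
    [\w .,-]*    # 0+ of the listed characters before the required one
    [^\W_]       # one word character that is not an underscore
    [\w .,-]*    # 0+ of the listed characters after it
    ;            # the terminating semicolon
    """,
    re.VERBOSE,
)

print(verbose_pattern.findall("ONE:;TWO:,,d,-;EIGHT:--;"))
# ['TWO:,,d,-;']
```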
Example:
import re
text = "ONE:;TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;SIX: ;SEVEN:,,,;EIGHT:--;"
regex = re.compile(r"[\w .,-]+:[\w .,-]*[^\W_][\w .,-]*;")
def parser(string):
    return "".join(re.findall(regex, string))
print(parser(text))
Output
TWO:,,d,-;THREE:fsdfsd;FOUR:43879293847;FIVE:dsa. dsa, 56;
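Note that the two approaches are not strictly equivalent. The single regex only accepts the characters listed in its classes, whereas the split-based version accepts any value that contains a letter or digit. The sketch below illustrates this with a made-up field NINE:a/b. The helper names keep_by_search and keep_by_pattern are hypothetical, and keep_by_search is a condensed re-implementation of the loop above rather than the exact code.

```python
import re

pattern = re.compile(r"[\w .,-]+:[\w .,-]*[^\W_][\w .,-]*;")

def keep_by_search(text):
    """Split-based filter: keep fields whose value has a letter or digit."""
    kept = []
    for field in filter(None, text.split(";")):
        parts = field.split(":")
        non_empty = len(list(filter(None, parts)))
        if non_empty == 2 and re.search("[A-Za-z0-9]", parts[1]):
            kept.append(field + ";")
    return "".join(kept)

def keep_by_pattern(text):
    """Single-regex filter: keep only fields made of the listed characters."""
    return "".join(pattern.findall(text))

sample = "TWO:,,d,-;NINE:a/b;EIGHT:--;"
print(keep_by_search(sample))   # TWO:,,d,-;NINE:a/b;
print(keep_by_pattern(sample))  # TWO:,,d,-;
```

If values may contain other punctuation such as "/", either extend the character classes in the regex or prefer the split-based version.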