I'm learning programming as best as I can, and I've been starting with Python. I am currently writing an IRC statistics generator (as if there weren't enough of those already), and I am trying to come up with a regex that matches the username (and only the username) in a particular log format. However, the one I have doesn't match anything with re.search.
Here is an example of the log format:
may 01 14:04:54 <FishCream> Wahoo!
may 01 14:05:01 <LpSamuelm> Oh, if only talking was this fun in real life.
jan 01 00:00:00 <Username> Message goes here.
jan 01 00:00:00 * Username Action goes here.
Here are the compile statements:
findusername = re.compile("^[a-zA-Z]+\s[0-9]+\s[0-9:]\s<([A-Za-z]+)>")
finduseraction = re.compile("^[a-zA-Z]+\s[0-9]+\s[0-9:]\s\*\s+([A-Za-z]+)\s")
As you can see, I have made two separate statements for finding the username when the user talks and when they use /me commands - making one super-regex for these two is probably possible, but I've got enough headache as it is.
Can anyone help me identify the problem?
Your [0-9:] class matches only one character, not the 8 that are in the timestamp; add a quantifier:
findusername = re.compile(r"^[a-zA-Z]+\s[0-9]+\s[0-9:]{8}\s<([A-Za-z]+)>")
finduseraction = re.compile(r"^[a-zA-Z]+\s[0-9]+\s[0-9:]{8}\s\*\s+([A-Za-z]+)\s")
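To see the difference for yourself, here is a small sketch that runs both versions of the pattern against the first line of your sample log. The names line, broken and fixed are just for illustration, and I've used raw strings so the backslashes reach the regex engine untouched.

```python
import re

# First line of the sample log from the question
line = "may 01 14:04:54 <FishCream> Wahoo!"

# Original pattern: [0-9:] consumes exactly one character of the timestamp
broken = re.compile(r"^[a-zA-Z]+\s[0-9]+\s[0-9:]\s<([A-Za-z]+)>")
# Fixed pattern: {8} consumes all eight characters of hh:mm:ss
fixed = re.compile(r"^[a-zA-Z]+\s[0-9]+\s[0-9:]{8}\s<([A-Za-z]+)>")

broken_match = broken.search(line)
fixed_match = fixed.search(line)

print(broken_match)          # None
print(fixed_match.group(1))  # FishCream
```

The original fails because after matching the single character "1" of the timestamp, the pattern expects whitespace but finds "4".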
This presumes that you match each log entry as a separate line; add the re.MULTILINE flag if the text you search contains multiple lines at a time.
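If you do process the log one line at a time, no flag is needed. A minimal sketch, assuming the lines are already in a list (log_lines and speakers are names I made up for the example; in practice you would probably iterate over an open file instead):

```python
import re

findusername = re.compile(r"^[a-zA-Z]+\s[0-9]+\s[0-9:]{8}\s<([A-Za-z]+)>")

# Hypothetical input: two lines taken from the sample log in the question
log_lines = [
    "may 01 14:04:54 <FishCream> Wahoo!",
    "jan 01 00:00:00 * Username Action goes here.",
]

speakers = []
for line in log_lines:
    match = findusername.search(line)
    if match:  # action lines do not match this pattern and are skipped
        speakers.append(match.group(1))

print(speakers)  # ['FishCream']
```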
A demo using the re.MULTILINE flag with .findall() on your input example:
>>> findusername = re.compile(r"^[a-zA-Z]+\s[0-9]+\s[0-9:]{8}\s<([A-Za-z]+)>", re.MULTILINE)
>>> finduseraction = re.compile(r"^[a-zA-Z]+\s[0-9]+\s[0-9:]{8}\s\*\s+([A-Za-z]+)\s", re.MULTILINE)
>>> findusername.findall(logs)
['FishCream', 'LpSamuelm', 'Username']
>>> finduseraction.findall(logs)
['Username']
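As for the single "super-regex" you mention, one possible way is to put the two endings into an alternation. This is only a sketch, not the one right answer: the name finduser is mine, and because each alternative has its own capture group, exactly one of the two groups is filled per match, so the code picks whichever one is set.

```python
import re

# The sample log from the question, as one multi-line string
logs = """may 01 14:04:54 <FishCream> Wahoo!
may 01 14:05:01 <LpSamuelm> Oh, if only talking was this fun in real life.
jan 01 00:00:00 <Username> Message goes here.
jan 01 00:00:00 * Username Action goes here."""

# Shared date/time prefix, then either "<name>" or "* name "
finduser = re.compile(
    r"^[a-zA-Z]+\s[0-9]+\s[0-9:]{8}\s(?:<([A-Za-z]+)>|\*\s+([A-Za-z]+)\s)",
    re.MULTILINE,
)

# Group 1 is set for messages, group 2 for /me actions
usernames = [m.group(1) or m.group(2) for m in finduser.finditer(logs)]
print(usernames)  # ['FishCream', 'LpSamuelm', 'Username', 'Username']
```

Keeping two separate patterns is perfectly fine too, especially if you want to count messages and actions separately.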