Hi, I can't work out how to extract the date and PID from a log file. I'm trying to display the date followed by the PID, as shown below, but my code only shows the date and never the PID.
Please see my code:
import re

def show_time_of_pid(line):
    pattern = r"^([\w+]*[\s\d\:]+.[\[(\d+)\]])"
    result = re.search(pattern, line)
    return result

print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))  # expected: Jul 6 14:01:23 pid:29440
<re.Match object; span=(0, 14), match='Jul 6 14:01:23'>
I was expecting Jul 6 14:01:23 pid:29440, but I only get the Match object for the date shown above, with no PID displayed.
I would probably write things like this:
import re

def show_time_of_pid(line):
    pattern = r"^(\w{3}) \s (\d+) \s ([\d:]+) \s .[^[]+\[(\d+)]:.*"
    result = re.search(pattern, line, flags=re.VERBOSE)
    return result.groups()

print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
Using re.VERBOSE lets us split things up to be a little easier to read, because literal whitespace inside the pattern is ignored. Here we have several distinct match groups:
- (\w{3}) matches the month name
- (\d+) matches the day of the month
- ([\d:]+) matches the time
- [^[]+\[(\d+)] matches the PID ("a bunch of characters that are not [, followed by [, then a string of digits, then ]")
Each group is separated by a whitespace character (\s).
Running the above code produces:
('Jul', '6', '14:01:23', '29440')
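Since re.VERBOSE also allows comments, the same pattern can be spread over several lines. This is only a sketch of that layout: the name LOG_PATTERN and the choice to precompile with re.compile are my own assumptions, not something the question requires.

```python
import re

# LOG_PATTERN is a hypothetical name; the pattern is the same one used above,
# laid out one piece per line with a comment for each piece.
LOG_PATTERN = re.compile(
    r"""
    ^(\w{3})      # month name, e.g. Jul
    \s (\d+)      # day of the month
    \s ([\d:]+)   # time, e.g. 14:01:23
    \s .[^[]+     # host and process name, up to the opening square bracket
    \[ (\d+) ]    # PID between square brackets
    :.*           # the rest of the message
    """,
    re.VERBOSE,
)

line = "Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"
print(LOG_PATTERN.search(line).groups())
# ('Jul', '6', '14:01:23', '29440')
```

Note that each \s matches exactly one whitespace character, so a line that pads the day with two spaces would need \s+ instead.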
You could get fancier with an outer capture group, by writing:
import re

def show_time_of_pid(line):
    pattern = r"^((\w{3}) \s (\d+) \s ([\d:]+)) \s .[^[]+\[(\d+)]:.*"
    result = re.search(pattern, line, flags=re.VERBOSE)
    return result.groups()

print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
We get the entire date string in the first capture group:
('Jul 6 14:01:23', 'Jul', '6', '14:01:23', '29440')
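With the whole date in group 1 and the PID in group 5, the output format asked for in the question can be put together directly. The following is a rough sketch: the function name format_time_and_pid and the decision to return None for lines that do not match are assumptions of mine.

```python
import re

def format_time_and_pid(line):
    """Return 'DATE pid:PID' for a matching log line, or None otherwise.

    The function name and the None fallback are illustrative choices.
    """
    pattern = r"^((\w{3}) \s (\d+) \s ([\d:]+)) \s .[^[]+\[(\d+)]:.*"
    result = re.search(pattern, line, flags=re.VERBOSE)
    if result is None:
        # re.search returns None when nothing matches; calling .group()
        # on None would raise AttributeError.
        return None
    # Group 1 is the outer date group, group 5 is the PID.
    return f"{result.group(1)} pid:{result.group(5)}"

print(format_time_and_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
# Jul 6 14:01:23 pid:29440
```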
And of course we can get back a labeled dictionary instead of just a tuple by using named capture groups:
import re

def show_time_of_pid(line):
    pattern = r"^(?P<timestamp>(?P<month>\w{3}) \s (?P<day>\d+) \s ([\d:]+)) \s .[^[]+\[(?P<pid>\d+)]:.*"
    result = re.search(pattern, line, flags=re.VERBOSE)
    return result.groupdict()

print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
Which produces the following (the time group has no name here, so it is left out of the dictionary):
{'timestamp': 'Jul 6 14:01:23', 'month': 'Jul', 'day': '6', 'pid': '29440'}
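As for why the pattern in the question never reaches the PID: as far as I can tell, the trailing [\[(\d+)\]] is a character class, so the parentheses and + inside it are literal characters and it matches a single character without creating a group. A small sketch to check this, where the variable names are just illustrative:

```python
import re

line = "Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"

# The pattern from the question, unchanged: it has only one (outer) group.
original = r"^([\w+]*[\s\d\:]+.[\[(\d+)\]])"
m = re.search(original, line)
print(m.groups())
# ('Jul 6 14:01:23',)

# Inside [...] the brackets, parentheses and + are literals, so this
# character class can only ever match ONE character.
char_class = r"[\[(\d+)\]]"
print(re.fullmatch(char_class, "[29440]"))
# None

# Outside a character class, (\d+) is a real capture group.
group_pattern = r"\[(\d+)\]"
print(re.fullmatch(group_pattern, "[29440]").group(1))
# 29440
```

Printing the return value of re.search also shows the Match object itself; the matched text has to be pulled out with .group(), .groups() or .groupdict().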