I have the following log and need to extract the timestamp and the hostname using a regex (PCRE):
2017-05-05T13:03:10.004595+00:00 Section for VMware ESX, abc.hostname.co.uk Vpxa: [fcec63d0] info 'commonvpxLro' opID=host@127454-101-20] [VpxLRO] -- FINISH task-internal-3548957 --- -- vmod1.query.PropertyCollector.Filter.destroy --
2017-05-05T13:04:10.7568945+00:00 abc.hostname.co.uk, Vpxa: [fcec63d0] info 'commonvpxLro' opID=host@89459-13-20] [VpxLRO] -- FINISH task-internal-3548957 --- -- vmod1.query.PropertyCollector.Filter.destroy --
2017-05-05T13:05:10.785895+00:00 Section for VMware ESX, abc.hostname.co.uk Vpxa: [fcec63d0] info 'commonvpxLro' opID=host@12748-101-20] [VpxLRO] -- FINISH task-internal-3548957 --- -- vmod1.query.PropertyCollector.Filter.destroy --
2017-05-05T13:13:11.986532+00:00 Section for VMware ESX, abc.hostname.co.uk Vpxa: [fcec63d0] info 'commonvpxLro' opID=host@12748-101-20] [VpxLRO] -- FINISH task-internal-3548957 --- -- vmod1.query.PropertyCollector.Filter.destroy --
For example, from the last line I need Timestamp=2017-05-05T13:13:11.986532+00:00 and hostname=abc.hostname.co.uk, and I need to extract both values from all 4 log lines above using a single regex. The tricky part is that in some of the lines "Section for VMware ESX," is added after the timestamp, and in others it is not. Someone told me that I can use groups, with the timestamp as one capturing group and the hostname as the next. I was able to write a regex that captures the timestamp, but how can I create a capturing group for the hostname?
The following works for your example. It captures the timestamp in group 1 and the hostname in group 2:
(\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d\.\d{1,7}\+\d\d:\d\d)[^\.]*(\s[\w]*\.[\w]*[\.[\w]*]*)
What it means:
\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d
matches something like 0000-00-00T00:00:00
\.\d{1,7}\+\d\d:\d\d
matches something like .0000000+00:00
where the fractional part has to be between 1 and 7 digits long (the dot and the plus sign are escaped so that they match literally)
[^\.]*
means "a string consisting of any combination of characters that are not a dot". Note that I'm assuming here that the first dot that appears after the timestamp is in the hostname. If you cannot be sure that it will be the first dot, this regex will become more complicated.
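(\s[\w]*\.[\w]*[\.[\w]*]*)
means a whitespace character (space, tab or newline) followed by something like text.text.text.text.text, with at least two dot-separated parts (the first two [\w]*), but there might be as many parts as needed. Note that group 2 includes the leading whitespace character, so trim it if you need the bare hostname.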
Try it here: https://regex101.com/r/we04e6/2
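If you want to use the two groups from code, here is a minimal sketch in Python. It uses Python's re module rather than PCRE itself, and a slightly tidied variant of the pattern (my assumption, not the regex above): the whitespace is kept outside group 2 so that the hostname comes back without a leading space, and the repeated ".text" part is written as (?:\.\w+)+. The function name extract and the shortened sample lines are made up for illustration.

```python
import re

# Tidied variant of the pattern (an assumption, not the original regex):
# - \s sits outside group 2, so the hostname has no leading whitespace
# - (?:\.\w+)+ requires at least one ".text" part after the first one
LOG_RE = re.compile(
    r"(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d{1,7}\+\d\d:\d\d)"  # group 1: timestamp
    r"[^.]*\s"            # anything without a dot, e.g. "Section for VMware ESX,"
    r"(\w+(?:\.\w+)+)"    # group 2: hostname such as abc.hostname.co.uk
)


def extract(line):
    """Return (timestamp, hostname) for one log line, or None if it does not match."""
    m = LOG_RE.search(line)
    return m.groups() if m else None


# Shortened copies of two of the log lines, one with and one without the extra text.
lines = [
    "2017-05-05T13:03:10.004595+00:00 Section for VMware ESX, abc.hostname.co.uk Vpxa: [fcec63d0] info",
    "2017-05-05T13:04:10.7568945+00:00 abc.hostname.co.uk, Vpxa: [fcec63d0] info",
]

for line in lines:
    print(extract(line))
```

As in the original regex, this still assumes that no dot appears between the timestamp and the hostname, and \w does not cover hyphens, so hostnames containing "-" would need a wider character class.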