I have two log files with different date/time formats that I would like to merge.
The first file is a standard Apache access_log file like so:
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/version?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 567
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/node/workspace/SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0/workflow-instances HTTP/1.1" 200 40
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/cisco/appId?userId=abcdefg&requestType=get HTTP/1.1" 200 45
173.37.239.93 - abcdefg [29/Feb/2016:16:57:52 -0600] "GET /share/page/site/nextgen-edcs/document-details?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 124492
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/service/messages_69bcdfdb058bb873ff49cc2a10c958b7.js?locale=en_US HTTP/1.1" 200 81698
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781
. . . where 'abcdedfg' = userid.
The second log file is formatted like so:
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but get(key) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache], key=AclEntity[ ID=1893033, version=55, aclId=16cf5bc3-27d0-4d50-a93d-3bee1ddd112e, isLatest=true, aclVersion=1, inherits=true, inheritsFrom=1889292, type=1, inheritedAcl=1893034, isVersioned=false, requiresVersion=false, aclChangeSet=1451473]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
My goals are:
Here is what I have so far --
$LOGFILE1 = "catalina.out";
$LOGFILE2 = "access_log";
open(LOGFILE1) or die("Could not open log file.");
foreach $line (<LOGFILE1>) {
chomp($line);
if ($line =~ /^2016.+$/) {
print $line . "\n";
}
}
open(LOGFILE2) or die("Could not open log file.");
foreach $line (<LOGFILE2>) {
chomp($line);
if ($line =~ /\d{2}\/\S{3}\/\d{4}:\d{2}:\d{2}:\d{2} -\d{3}/) {
print $line . "\n";
}
# format of file 1
# DD/MMM/YYYY:HH:MM:SS -NNNN
# 29/Feb/2016:20:03:07 -600
# format of file 2
# YYYY-MM-DD HH:MM:SS,NNN
# 2016-02-12 08:16:03,631
}
So I'm basically only interested in lines with date/time info, so the above code is discarding the other lines.
Where I'm stuck is:
1) how do I convert the date/time format in file 1 into the data/time format of file 2?
2) I'm not interested in the IP addresses but I do want to keep the userids. Since file 1 does not start with date/time info like file 2, after converting, how will I sort on date after merging the two?
Any help would be appreciated!
Here is a solution using Time::Piece. I used Inline::Files to simulate the 2 files. You would need to open you logfiles like
my $logfile1 = "catalina.out";
my $logfile2 = "access_log";
open my $log1_fh, '<', $logfile1 or die $1;
open my $log2_fh, '<', $logfile2 or die $1;
The program would look like this one, which gave me the results I think you want.
#!/usr/bin/perl
use strict;
use warnings;
use Inline::Files;
use Time::Piece;
my %data;
while (<FILE2>) {
# get date_time
my ($dt) = /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),/ or next;
push @{ $data{$dt} }, $_;
}
my $format = '%d/%b/%Y:%H:%M:%S';
while (<FILE1>) {
/\[(\S+)/;
my $t = Time::Piece->strptime($1, $format)
or die "Cannot parse $1. $!";
my $dt = $t->strftime('%Y-%m-%d %H:%M:%S');
s/^\S+ (?:- )+//;
s/(?<=\[)[^\]]+/$dt/;
push @{ $data{$dt} }, $_;
}
for my $dt (sort keys %data) {
my $aref = $data{$dt};
print for @$aref;
}
__FILE1__
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/version?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 567
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/node/workspace/SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0/workflow-instances HTTP/1.1" 200 40
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/cisco/appId?userId=abcdefg&requestType=get HTTP/1.1" 200 45
173.37.239.93 - abcdefg [29/Feb/2016:16:57:52 -0600] "GET /share/page/site/nextgen-edcs/document-details?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 124492
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/service/messages_69bcdfdb058bb873ff49cc2a10c958b7.js?locale=en_US HTTP/1.1" 200 81698
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781
__FILE2__
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but get(key) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache], key=AclEntity[ ID=1893033, version=55, aclId=16cf5bc3-27d0-4d50-a93d-3bee1ddd112e, isLatest=true, aclVersion=1, inherits=true, inheritsFrom=1889292, type=1, inheritedAcl=1893034, isVersioned=false, requiresVersion=false, aclChangeSet=1451473]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
I used a hash %data
to store the lines. The key is the transformed date so later in the program, you can print them in sorted order.
The output from this program is:
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but get(key) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache], key=AclEntity[ ID=1893033, version=55, aclId=16cf5bc3-27d0-4d50-a93d-3bee1ddd112e, isLatest=true, aclVersion=1, inherits=true, inheritsFrom=1889292, type=1, inheritedAcl=1893034, isVersioned=false, requiresVersion=false, aclChangeSet=1451473]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
[2016-02-29 16:57:52] "GET /application/wcs/api/version?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 567
[2016-02-29 16:57:52] "GET /application/wcs/api/node/workspace/SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0/workflow-instances HTTP/1.1" 200 40
[2016-02-29 16:57:52] "GET /application/wcs/cisco/appId?userId=abcdefg&requestType=get HTTP/1.1" 200 45
abcdefg [2016-02-29 16:57:52] "GET /share/page/site/nextgen-edcs/document-details?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 124492
abcdefg [2016-02-29 16:57:53] "GET /share/service/messages_69bcdfdb058bb873ff49cc2a10c958b7.js?locale=en_US HTTP/1.1" 200 81698
abcdefg [2016-02-29 16:57:53] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781