I have a directory (/home/myuser/logs) that contains the following log files for the last 5 days:
applogs_20130402.txt
applogs_20130401.txt
applogs_20130331.txt
applogs_20130330.txt
Each line of every "applog" has the same structure, just different data:
<timestamp> | <fruit> | <color> | <cost>
So, for example, applogs_20130402.txt might look like:
23:41:25 | apple | red | 53
23:41:26 | kiwi | brown | 12
23:41:29 | banana | yellow | 1023
... (etc., every line is pipe delimited like this)
I want to create one "master log" that combines all the log entries (structured, pipe-delimited lines) from all 5 log files into a single file where all timestamps are chronologically ordered. Further, I need the date reflected in the timestamps as well.
So, for instance, if applogs_20130402.txt and applogs_20130401.txt were the only 2 applogs in the directory, and they looked like this, respectively:
applogs_20130402.txt:
=====================
23:41:25 | apple | red | 53
23:41:26 | kiwi | brown | 12
23:41:29 | banana | yellow | 1023
applogs_20130401.txt:
=====================
23:40:33 | blueberry | blue | 4
23:41:28 | apple | green | 81
23:45:49 | plumb | purple | 284
Then I would want a masterlog.txt file that looks like:
2013-04-01 23:40:33 | blueberry | blue | 4
2013-04-01 23:41:28 | apple | green | 81
2013-04-01 23:45:49 | plumb | purple | 284
2013-04-02 23:41:25 | apple | red | 53
2013-04-02 23:41:26 | kiwi | brown | 12
2013-04-02 23:41:29 | banana | yellow | 1023
I'm on Ubuntu and have access to Bash, Python, and Perl, and have no preference as to which one is used. Ordinarily I would try a "best attempt" and post it, but I've never dealt with aggregating data like this on Linux. Obviously, the logs are thousands of lines long, unlike my example above, so doing everything manually isn't an option ;-) Thanks in advance!
You can use Perl from the command line together with sort like this:
perl -n -e 'printf "%d-%02d-%02d %s", $ARGV =~ m/_(\d{4})(\d\d)(\d\d)/, $_;' *.txt | sort -n
Calling perl with -n wraps a while (<>) { } loop around your program, which in this case is the code passed to -e ''. In it, we printf the current line ($_), and in front of that we put the date from the file name, which is stored in $ARGV. We use a regex to grab the year, month, and day, which are conveniently returned by m// because of the list context from printf.
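For readers more at home in Python, the per-line step can be sketched roughly as follows. This is only an illustration of what the one-liner does to each line, not a replacement for it; prefix_with_date and DATE_IN_NAME are hypothetical names, and the regex is copied from the Perl version.

```python
import re

# Same pattern as the Perl one-liner: an underscore followed by YYYYMMDD.
DATE_IN_NAME = re.compile(r"_(\d{4})(\d\d)(\d\d)")


def prefix_with_date(filename, line):
    """Return line with the date taken from filename put in front of it.

    Hypothetical helper; it mirrors the printf call in the Perl one-liner.
    """
    match = DATE_IN_NAME.search(filename)
    if match is None:
        raise ValueError(f"no _YYYYMMDD date in file name: {filename!r}")
    year, month, day = match.groups()  # the three capture groups, in order
    return f"{year}-{month}-{day} {line}"


print(prefix_with_date("applogs_20130402.txt", "23:41:25 | apple | red | 53"))
```

The three groups come back in order, just as m// returns them in list context, so they drop straight into the year-month-day slots.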
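To this program, we pass all .txt files in the folder (narrow the glob to applogs_*.txt if other .txt files, such as an earlier masterlog.txt, live there). The result is piped to the command-line tool sort. Note that with the -n flag, sort only compares the leading number of each line, which here is the year; lines that tie on it are then ordered by sort's last-resort byte-wise comparison of the whole line, and that is what actually puts the zero-padded timestamps in chronological order. Plain sort without -n gives the same result more directly. To write the result to a file, append > masterlog.txt to the sort command.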
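Since Python was also mentioned as an option, here is a minimal sketch of the whole pipeline as a script. It assumes the files are named applogs_YYYYMMDD.txt, that every timestamp is zero-padded HH:MM:SS, and that the combined logs fit in memory; build_master_log and its parameters are hypothetical names chosen for illustration.

```python
import glob
import os
import re


def build_master_log(log_dir, out_name="masterlog.txt"):
    """Combine applogs_YYYYMMDD.txt files in log_dir into one sorted file.

    Hypothetical helper: returns the path of the file it wrote.
    """
    date_in_name = re.compile(r"_(\d{4})(\d\d)(\d\d)\.txt$")
    entries = []
    for path in glob.glob(os.path.join(log_dir, "applogs_*.txt")):
        match = date_in_name.search(os.path.basename(path))
        if match is None:
            continue  # file name carries no date, so skip it
        date = "-".join(match.groups())  # e.g. "2013-04-02"
        with open(path) as handle:
            for line in handle:
                if line.strip():  # ignore blank lines
                    entries.append(f"{date} {line.rstrip()}")
    # Zero-padded "YYYY-MM-DD HH:MM:SS" prefixes sort chronologically
    # under a plain string comparison, so no date parsing is needed.
    entries.sort()
    out_path = os.path.join(log_dir, out_name)
    with open(out_path, "w") as out:
        out.writelines(entry + "\n" for entry in entries)
    return out_path
```

Calling build_master_log("/home/myuser/logs") would write masterlog.txt into that directory. Because the glob only matches applogs_*.txt, an existing masterlog.txt is not read back in on a second run.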