linux bash syslog

Arrange Log Entries into Dated Files


I'm trying to split a large log file, which contains entries spanning several months, into separate log files by date. It has thousands of lines like these:

Sep 4 11:45 kernel: Entry
Sep 5 08:44 syslog: Entry

I want the entries to end up in one file per day, for example logfile.20090904 and logfile.20090905.

I've created a program to read each line and send it to the appropriate file, but it runs pretty slowly (especially since I have to turn a month name into a number). I've thought about doing a grep for every day, which would require finding the first date in the file, but that seems slow as well.

Is there a faster solution? Maybe I'm missing a command-line program that would work better.

Here is my current solution:

#!/bin/bash
# FILE is expected to hold the path of the log file to split.
while IFS= read -r line; do
  dts="${line:0:6}"
  dt="$(date -d "$dts" +'%Y%m%d')"
  # Note that I could do some caching here of the date, assuming
  # that dates are together.
  echo "$line" >> "$FILE.$dt" 2> /dev/null
done < "$FILE"
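
The caching idea mentioned in the comment could look roughly like the sketch below. It is only an illustration: the function name split_cached is made up, it assumes bash and GNU date, and it assumes that entries for the same day sit next to each other, so date is called once per change of day rather than once per line.

```shell
#!/bin/bash
# split_cached FILE  (hypothetical helper name)
# Same idea as the loop above, but date runs only when the date prefix changes.
split_cached() {
  local file=$1 line dts last="" dt=""
  while IFS= read -r line; do
    dts="${line:0:6}"
    if [[ -z "$dt" || "$dts" != "$last" ]]; then
      # GNU date fills in the current year, as in the original loop
      dt="$(date -d "$dts" +'%Y%m%d')"
      last="$dts"
    fi
    printf '%s\n' "$line" >> "$file.$dt"
  done < "$file"
}
```

Calling split_cached logfile would append to logfile.YYYYMMDD files next to the input. The loop itself is still a bash read loop, so this removes only the per-line cost of the date call.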

Solution

  • @OP, try not to use bash's while read loop to iterate over a big file. It's tried and proven to be slow, and furthermore, you are calling the external date command for every line you read. Here's a more efficient way, using only gawk:

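    gawk 'BEGIN{
     # Map each month name to its number (Jan=1 ... Dec=12)
     m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",mth,"|")
     for(i=1;i<=m;i++){ num[mth[i]]=i }
    }
    {
     # The log lines carry no year, so 2009 is hard-coded here
     tt="2009 "num[$1]" "$2" 00 00 00"
     date=strftime("%Y%m%d",mktime(tt))
     # Parenthesize the target so the whole concatenation is the file name
     print $0 > (FILENAME"."date)
    }
    ' logfile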
    

    output

    $ more logfile
    Sep 4 11:45 kernel: Entry
    Sep 5 08:44 syslog: Entry
    
    $ ./shell.sh
    
    $ ls -1 logfile.*
    logfile.20090904
    logfile.20090905
    
    $ more logfile.20090904
    Sep 4 11:45 kernel: Entry
    
    $ more logfile.20090905
    Sep 5 08:44 syslog: Entry
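
The gawk script above relies on mktime and strftime, which are gawk extensions. If only a plain awk is available, the same split can probably be done with sprintf alone. The following is a minimal sketch, not a drop-in replacement: the function name split_by_date and the optional year argument are made up for illustration, the year still defaults to 2009 because the log lines do not contain one, and entries are assumed to be grouped by day.

```shell
#!/bin/sh
# split_by_date FILE [YEAR]  (hypothetical helper name)
# Writes each line of FILE to FILE.YYYYMMDD; YEAR defaults to 2009.
split_by_date() {
  awk -v year="${2:-2009}" '
    BEGIN {
      # Build a lookup table: num["Jan"]=1 ... num["Dec"]=12
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= n; i++) num[names[i]] = i
    }
    ($1 in num) {
      out = sprintf("%s.%04d%02d%02d", FILENAME, year, num[$1], $2)
      # Assumption: entries are grouped by day, so close the previous
      # file when the date changes to keep only one output file open.
      if (out != cur) { if (cur != "") close(cur); cur = out }
    }
    # Lines that do not start with a month name follow the previous entry.
    cur != "" { print >> cur }
  ' "$1"
}
```

Running split_by_date logfile should produce the same logfile.20090904 and logfile.20090905 as shown above. Because the sketch appends with >>, running it twice on the same input duplicates the entries, so remove old output files before re-running.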