bash sorting logfile

Can I sort with context in bash?


When I want to merge log files, I often use cat logA.log logB.log | sort. As long as the log lines start with some timestamp-like string in a common format, that's fine.

But can I somehow sort the lines and keep lines that do(n't) follow a certain rule glued to their original leading line? Just think of a log file where somebody logged something with linebreaks in it (without me knowing that)!

(berta.log)
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?

(caesar.log)
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
    at Conversation.parseStatement
    at Conversation.considerReplyToStatement
    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!

These two log files of course would become unusable if merged with cat berta.log caesar.log | sort.

I also am really unsure if I should post this question to StackOverflow or to Superuser or even to Unix or ServerFault...

Edit for clarity

The merged logs should look e.g. like this:

2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
    at Conversation.parseStatement
    at Conversation.considerReplyToStatement
    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!

Solution

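  • This is the classic mismatch between lines and records: sort works line by line, but one log record can span several lines.

    One solution: put each multiline log record on a single line, sort, and then split the records again.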

    1. Executable script: ./onelinelog.awk
    #! /usr/bin/awk -f
    
    # Timestamp line
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
        if (log_line != "") { print log_line }
        log_line = $0
        next
    }
    # Other line
    {
        # Use '§' to separate the original lines of one record
        log_line = log_line "§" $0
    }
    # End of file
    END {
        if (log_line != "") { print log_line }
    }
    

    Test on caesar.log file:

    $ ./onelinelog.awk caesar.log 
    2021-10-01 00:00:00 Hey Berta
    2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.§    at Conversation.parseStatement§    at Conversation.considerReplyToStatement§    at Conversation.doConversation
    2021-10-01 00:00:40 I am not Adam, I am Caesar!
    
    2. Sort:
    cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort
    

    or

    sort <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log)
    

    Output:

    2021-10-01 00:00:00 Hey Berta
    2021-10-01 00:00:10 Hey!
    2021-10-01 00:00:11 How are you doing, Adam?
    2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.§    at Conversation.parseStatement§    at Conversation.considerReplyToStatement§    at Conversation.doConversation
    2021-10-01 00:00:40 I am not Adam, I am Caesar!
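
    One caveat: plain sort compares whole lines, so records that share a timestamp end up ordered by their message text rather than by their original order. If that matters, a keyed stable sort may help. This is only a sketch: it assumes the timestamp occupies the first two whitespace-separated fields and that your sort supports -s (GNU and BSD sort do). The sample lines are made up for illustration.

```bash
#!/usr/bin/env bash
# Made-up, already-flattened sample records; two of them share a timestamp.
a=$'2021-10-01 00:00:10 zebra first\n2021-10-01 00:00:10 apple second'
b='2021-10-01 00:00:05 from the other file'

# Plain sort compares whole lines, so "apple" moves ahead of "zebra".
plain=$(printf '%s\n' "$a" "$b" | LC_ALL=C sort)

# Compare only date and time (fields 1-2); -s keeps input order for ties.
keyed=$(printf '%s\n' "$a" "$b" | LC_ALL=C sort -s -k1,2)

printf '%s\n' "$keyed"
```

    With the keyed sort, the "zebra" record stays ahead of the "apple" record, as in the input.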
    

    Fun?

    You may want to recover your original lines...

    Use sed (GNU sed treats \n in the replacement as a newline; other sed implementations may not):

    $ cat and/or sort ... | sed -e 's/§/\n/g'
    

    or another executable awk script: ./tomultilinelog.awk

    #! /usr/bin/awk -f
    BEGIN {
        FS="§"
    }
    {
        for (i = 1; i <= NF; i += 1) { print $i }
    }
    

    So execute:

    $ cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort | ./tomultilinelog.awk 
    2021-10-01 00:00:00 Hey Berta
    2021-10-01 00:00:10 Hey!
    2021-10-01 00:00:11 How are you doing, Adam?
    2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
        at Conversation.parseStatement
        at Conversation.considerReplyToStatement
        at Conversation.doConversation
    2021-10-01 00:00:40 I am not Adam, I am Caesar!
    

    Of course, you can adapt the code and replace the '§' character with another token, as long as that token never appears in your logs.
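
    As an illustration of swapping the token, here is a minimal sketch that folds the flatten, sort, and restore steps into one shell function. The name merge_logs is hypothetical, and the choice of the ASCII unit separator (byte 0x1F) as the token is an assumption: it only works if that byte never occurs in your logs. Because the token is a single byte, tr can restore the line breaks.

```bash
#!/usr/bin/env bash
# merge_logs is a hypothetical helper, not one of the scripts above.
# Assumption: the byte 0x1F (ASCII unit separator) never occurs in the logs.
merge_logs() {
    local sep=$'\037'
    local f
    for f in "$@"; do
        awk -v sep="$sep" '
            # A timestamp line starts a new record: flush the previous one.
            /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
                if (have) print rec
                rec = $0; have = 1
                next
            }
            # Any other line is glued onto the current record.
            { rec = (have ? rec sep $0 : $0); have = 1 }
            # Flush the last record of the file.
            END { if (have) print rec }
        ' "$f"
    done | LC_ALL=C sort | tr "$sep" '\n'   # sort records, then restore line breaks
}
```

    Usage would look like: merge_logs berta.log caesar.log > merged.log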