Tags: json, shell, unix, jq, control-characters

Remove escape sequence characters like newline, tab and carriage return from JSON file


I have a JSON file with 80+ fields. When I extract the message field from the JSON file below using jq, the output contains newline and tab characters. I want to remove these escape sequence characters. I tried doing this with sed, but it did not work.

Sample JSON file:

{
"HOSTNAME":"server1.example",
"level":"WARN",
"level_value":30000,
"logger_name":"server1.example.adapter",
"content":{"message":"ERROR LALALLA\nERROR INFO NANANAN\tSOME MORE ERROR INFO\nBABABABABABBA\n BABABABA\t ABABBABAA\n\n BABABABAB\n\n"}
}

Can anyone help me with this?


Solution

  • A pure jq solution:

    $ jq -r '.content.message | gsub("[\\n\\t]"; "")' file.json
    ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB
    

    If you want to keep the enclosing " characters, omit -r.

    Note: peak's helpful answer contains a generalized regular expression that matches all control characters in the ASCII and Latin-1 Unicode range by way of a Unicode category specifier, \p{Cc}. jq uses the Oniguruma regex engine.
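As a rough sketch of the generalized approach mentioned in the note, the command below swaps the explicit [\n\t] set for the \p{Cc} category. It assumes a jq build with Oniguruma regex support, and the one-line input is a made-up example (not the question's file) that also contains a carriage return:

```shell
# Hypothetical input: the message contains \r, \n and \t escape sequences.
json='{"content":{"message":"line one\r\nline two\tend\n"}}'

# \p{Cc} matches any control character, so \r is removed along with \n and \t.
# The backslash is doubled because the regex is written as a jq string literal.
cleaned=$(printf '%s\n' "$json" | jq -r '.content.message | gsub("\\p{Cc}"; "")')

printf '%s\n' "$cleaned"
```

With this input the command should print `line oneline twoend`. Compared with [\n\t], the category-based pattern does not need to be extended for each additional control character.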


    Other solutions use an additional utility, such as sed or tr.

    Using sed to unconditionally remove the escape sequences \n and \t:

    $ jq '.content.message' file.json | sed 's/\\[tn]//g'
    "ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB"
    

    Note that the enclosing " are still there, however. To remove them, add another substitution to the sed command:

    $ jq '.content.message' file.json | sed 's/\\[tn]//g; s/"\(.*\)"/\1/'
    ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB
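To illustrate what "unconditionally" means for the sed approach, here is a small sketch with a made-up input (not from the question) whose message contains escaped backslashes. sed works on the JSON-encoded text, so it cannot tell a real \t escape from an escaped backslash that happens to be followed by t, whereas the jq filter works on the decoded string:

```shell
# Hypothetical input: the decoded message is C:\temp\new and contains
# no control characters at all.
json='{"content":{"message":"C:\\temp\\new"}}'

# sed sees the encoded text "C:\\temp\\new" and strips every backslash
# that is directly followed by t or n, which corrupts the value.
via_sed=$(printf '%s\n' "$json" | jq '.content.message' | sed 's/\\[tn]//g')

# jq's gsub operates on the decoded string, so nothing is removed here.
via_jq=$(printf '%s\n' "$json" | jq -r '.content.message | gsub("[\\n\\t]"; "")')

printf '%s\n' "$via_sed" "$via_jq"
```

Here the sed pipeline should yield `"C:\emp\ew"`, while the jq filter leaves `C:\temp\new` intact. If the messages can contain backslashes, the pure jq solution is the safer choice.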
    

    A simpler option that also removes the enclosing " (note: output has no trailing \n):

    $ jq -r '.content.message' file.json | tr -d '\n\t'
    ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB
    

    Note how -r makes jq output the raw string rather than a JSON-encoded one, so the \n and \t escape sequences become actual newline and tab characters, which tr then removes as literals.
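The effect of -r on the tr pipeline can be seen in a minimal sketch like the following, which uses a short made-up input instead of the question's file:

```shell
# Hypothetical input with one \n and one \t escape sequence.
json='{"content":{"message":"a\nb\tc"}}'

# Without -r, jq prints a JSON-encoded string: \n and \t remain two-character
# escape sequences, so tr only deletes the trailing newline that jq appends.
without_r=$(printf '%s\n' "$json" | jq '.content.message' | tr -d '\n\t')

# With -r, jq prints the decoded string, so tr sees real newline and tab
# characters and deletes them.
with_r=$(printf '%s\n' "$json" | jq -r '.content.message' | tr -d '\n\t')

printf '%s\n' "$without_r" "$with_r"
```

The first result is `"a\nb\tc"` (quotes and escape sequences untouched) and the second is `abc`. To strip carriage returns as well, `\r` can be added to the set passed to tr.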