I'm working on parsing some log data and tried to implement a grok parser for Spark logs.
Here is one line of output from the Spark logs:
14/04/14 18:51:52 INFO Client: Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster --class SimpleApp --jar ./spark-example-1.0.0.jar --args 'yarn-standalone' --worker-memory 1024 --worker-cores 1 --num-workers 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
And this is the grok filter I tried:
(?<logtime>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})%{SPACE}%{LOGLEVEL:level}%{SPACE}%{WORD:srcclass}:%{SPACE}%{GREEDYDATA:data}"
This does not work for me. Can someone help me?
Many thanks!
You're almost there. The only issue is the stray double quote at the end of your grok pattern; remove it and you'll be fine. You also don't need the %{SPACE} patterns unless you need to tolerate variable amounts of whitespace (%{SPACE} matches \s*); a literal space works here.
This is what worked for me:
(?<logtime>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) %{LOGLEVEL:level} %{WORD:srcclass}: %{GREEDYDATA:data}
Against your sample line it will produce this output:
{
  "logtime": [
    [
      "14/04/14 18:51:52"
    ]
  ],
  "level": [
    [
      "INFO"
    ]
  ],
  "srcclass": [
    [
      "Client"
    ]
  ],
  "data": [
    [
      "Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster --class SimpleApp --jar ./spark-example-1.0.0.jar --args 'yarn-standalone' --worker-memory 1024 --worker-cores 1 --num-workers 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr"
    ]
  ]
}
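If you want to use this inside a Logstash pipeline, here's a minimal filter sketch that wires the corrected pattern into a grok filter and, optionally, parses the timestamp into @timestamp with a date filter. It assumes the raw log line arrives in the default "message" field:

filter {
  grok {
    # Same pattern as above, applied to the raw log line in "message"
    match => { "message" => "(?<logtime>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) %{LOGLEVEL:level} %{WORD:srcclass}: %{GREEDYDATA:data}" }
  }
  date {
    # Joda-Time format matching "14/04/14 18:51:52"; sets @timestamp on success
    match => [ "logtime", "yy/MM/dd HH:mm:ss" ]
  }
}

The date filter is optional but convenient: without it, @timestamp reflects when Logstash ingested the event rather than when Spark wrote the line.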