I have an S3 bucket that I structured using the format s3://<bucket-name>/year=<yearno>/month=<monthno>/day=<dayno>/<filename>.log. The lines in the .log files are structured like:
2020-01-06 09:05:14,450 INFO [Asterisk-Java DaemonPool-1-thread-3] handler.CallHandler (CallHandler.java:849) - Original name : harris changed to : haris . Exist? true
The Grok pattern that I'm using for the classifier is:
[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]{12} INFO \[Asterisk-Java DaemonPool-1-thread-[0-9]{1,3}] handler.CallHandler \(CallHandler.java:849\) - Original name : %{WORD:original_name} changed to : %{WORD:transformed_name} . Exist\? %{WORD:exist_prior}
I checked my Grok pattern using this debugger web app, and it was confirmed to be correct. I expected the resulting table to be:
+------+-------+-----+---------------+------------------+-------------+
| year | month | day | original_name | transformed_name | exist_prior |
+------+-------+-----+---------------+------------------+-------------+
| - | - | - | - | - | - |
+------+-------+-----+---------------+------------------+-------------+
However, the table that I've gotten is:
+------+-------+-----+------+------+------+------+
| year | month | day | col0 | col1 | col2 | col3 |
+------+-------+-----+------+------+------+------+
| - | - | - | - | - | - | - |
+------+-------+-----+------+------+------+------+
Where did I go wrong?
I changed my capture regex from %{WORD:variable_name} to %{DATA:variable_name}. It then worked as expected.
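I can't say for certain why WORD failed inside Glue, but here is a minimal sketch of one plausible cause, written in plain Python rather than against Glue itself. It assumes the standard Grok definitions WORD = \b\w+\b and DATA = .*?, translates the pattern from the question into a Python regex, and anchors it to the whole line with fullmatch. The helper names (grok_to_regex, parse) and the hyphenated log line are hypothetical, made up for illustration; Glue's own Grok engine may behave differently from Python's re module.

```python
import re

# Assumed regex equivalents of the two Grok primitives involved.
GROK = {"WORD": r"\b\w+\b", "DATA": r".*?"}

# The classifier pattern from the question, copied verbatim.
PATTERN_WORD = (
    r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]{12} INFO "
    r"\[Asterisk-Java DaemonPool-1-thread-[0-9]{1,3}] handler.CallHandler "
    r"\(CallHandler.java:849\) - Original name : %{WORD:original_name} "
    r"changed to : %{WORD:transformed_name} . Exist\? %{WORD:exist_prior}"
)
# The same pattern with every WORD capture swapped for DATA.
PATTERN_DATA = PATTERN_WORD.replace("%{WORD:", "%{DATA:")


def grok_to_regex(pattern):
    """Hypothetical helper: expand %{NAME:field} into a named regex group."""
    return re.sub(
        r"%\{(\w+):(\w+)\}",
        lambda m: f"(?P<{m.group(2)}>{GROK[m.group(1)]})",
        pattern,
    )


def parse(pattern, line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = re.fullmatch(grok_to_regex(pattern), line)
    return match.groupdict() if match else None


PREFIX = (
    "2020-01-06 09:05:14,450 INFO [Asterisk-Java DaemonPool-1-thread-3] "
    "handler.CallHandler (CallHandler.java:849) - "
)
# The sample line from the question.
SAMPLE = PREFIX + "Original name : harris changed to : haris . Exist? true"
# A made-up line whose name contains a non-word character (the hyphen).
HYPHENATED = PREFIX + "Original name : mary-ann changed to : maryann . Exist? false"

if __name__ == "__main__":
    for label, line in [("sample", SAMPLE), ("hyphenated", HYPHENATED)]:
        print(label, "WORD:", parse(PATTERN_WORD, line))
        print(label, "DATA:", parse(PATTERN_DATA, line))
```

In this sketch the sample line from the question parses with both variants, while the hyphenated name parses only with DATA. If some of the real log lines contain names with hyphens, spaces, or other non-word characters, that could explain why the WORD version fell back to generic columns and the DATA version did not.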