A question was asked earlier about parsing the following tab-delimited dataset in Hive:
03-24-2014 fm506 TOTAL-PROCESS OK;HARD;1;PROCS OK: 717 processes
03-24-2014 fm504 CHECK-LOAD OK;SOFT;2;OK - load average: 54.61, 56.95
The input regex provided in that thread does not work at all, so I wrote two input regexes of my own and tested the first one at http://www.regexplanet.com/advanced/java/index.html. The groups looked correct there, but when I use it in Hive, the table loads only NULL values.
My first input regex is
([^ ]*)\t+([^ ]*)\t+([^ ]*)\t+([^ ]*)
My second input regex is
^(\\S+)\\t+(\\S+)\\t+(\\S+)\\t+(\\S+)$
I expected this one to work, but it also loads only NULL values.
Could you please tell me what is wrong with these two input regexes?
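Your first pattern does not match the entire string, and Hive's RegexSerDe only accepts a row when the pattern matches the whole line; otherwise every column of that row comes back NULL. The field-matching parts are [^ ]*, that is, zero or more chars other than a space, so the last field cannot be matched (it contains spaces).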
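The difference between the online tester and Hive can be reproduced with a minimal Java sketch. It assumes the sample row uses single tab characters between fields (the post shows spaces, so this is an assumption), and the class and method names are illustrative only. find() reports a match anywhere in the row, which is what a tester's group table may show, while matches() requires the whole row to match, which is the check Hive's RegexSerDe applies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstPatternCheck {
    // The asker's first pattern as a Java string literal (regex backslashes doubled).
    static final Pattern FIRST =
        Pattern.compile("([^ ]*)\\t+([^ ]*)\\t+([^ ]*)\\t+([^ ]*)");

    // Assumption: the sample fields are separated by single tabs.
    static final String ROW =
        "03-24-2014\tfm506\tTOTAL-PROCESS\tOK;HARD;1;PROCS OK: 717 processes";

    // find(): true if the pattern matches somewhere inside the row.
    static boolean foundSomewhere(String row) {
        return FIRST.matcher(row).find();
    }

    // matches(): true only if the pattern covers the entire row.
    static boolean matchesWholeRow(String row) {
        return FIRST.matcher(row).matches();
    }

    // Group 4 of the first find() hit; [^ ]* stops at the first space.
    static String fourthGroupViaFind(String row) {
        Matcher m = FIRST.matcher(row);
        return m.find() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        System.out.println("find():    " + foundSomewhere(ROW));
        System.out.println("matches(): " + matchesWholeRow(ROW));
        System.out.println("group 4:   " + fourthGroupViaFind(ROW));
    }
}
```

With the assumed tab-separated row, find() succeeds and group 4 is only OK;HARD;1;PROCS, while matches() returns false, so Hive rejects the row.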
The second regex also contains \S+ patterns, which match one or more chars other than whitespace, so the last \S+ stops at the first space in the last field and the $ anchor can never be reached.
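The same kind of check for the second pattern is a short Java sketch, again assuming single tabs between fields. The second row below is a hypothetical variant whose last field has no spaces, included only to show that the spaces are what break the match.

```java
import java.util.regex.Pattern;

public class SecondPatternCheck {
    // The asker's second pattern as a Java string literal (regex backslashes doubled).
    static final Pattern SECOND =
        Pattern.compile("^(\\S+)\\t+(\\S+)\\t+(\\S+)\\t+(\\S+)$");

    // Assumption: fields in the sample row are separated by single tabs.
    static final String ROW =
        "03-24-2014\tfm506\tTOTAL-PROCESS\tOK;HARD;1;PROCS OK: 717 processes";

    // Hypothetical row whose last field contains no spaces, for contrast only.
    static final String NO_SPACE_ROW =
        "03-24-2014\tfm506\tTOTAL-PROCESS\tOK;HARD;1";

    static boolean matchesWholeRow(String row) {
        return SECOND.matcher(row).matches();
    }

    public static void main(String[] args) {
        // The last \S+ stops at the first space, so $ is never reached.
        System.out.println("sample row:          " + matchesWholeRow(ROW));
        // Without spaces in the last field the same pattern succeeds.
        System.out.println("no-space last field: " + matchesWholeRow(NO_SPACE_ROW));
    }
}
```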
You may use either of these:
^(\S+)\t+(\S+)\t+(\S+)\t+(.+)
^([^\t]*)\t+([^\t]*)\t+([^\t]*)\t+(.*)
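Both suggested patterns can be checked against the two sample rows with a small Java sketch. It assumes the rows are tab-separated, and the helper fields() is a hypothetical name; it uses matches() so that, like Hive, a row only yields fields when the whole line matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedPatternsCheck {
    // The two suggested patterns; the doubled backslashes are Java literal escaping only.
    static final Pattern NON_WS =
        Pattern.compile("^(\\S+)\\t+(\\S+)\\t+(\\S+)\\t+(.+)");
    static final Pattern NON_TAB =
        Pattern.compile("^([^\\t]*)\\t+([^\\t]*)\\t+([^\\t]*)\\t+(.*)");

    // Assumption: the sample rows use single tabs between fields.
    static final String ROW1 =
        "03-24-2014\tfm506\tTOTAL-PROCESS\tOK;HARD;1;PROCS OK: 717 processes";
    static final String ROW2 =
        "03-24-2014\tfm504\tCHECK-LOAD\tOK;SOFT;2;OK - load average: 54.61, 56.95";

    // Returns the captured fields, or null when the whole row does not match.
    static List<String> fields(Pattern p, String row) {
        Matcher m = p.matcher(row);
        if (!m.matches()) {
            return null;
        }
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            out.add(m.group(i));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String row : new String[] {ROW1, ROW2}) {
            System.out.println(fields(NON_WS, row));
            System.out.println(fields(NON_TAB, row));
        }
    }
}
```

In both patterns the last group (.+ or .*) is what lets the final field keep its spaces.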
The [^\t]* pattern matches any field in tab-delimited text, since it matches zero or more chars other than a tab.
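Because [^\t]* may match zero chars, it also accepts an empty field, which \S+ cannot. A minimal Java sketch, using a hypothetical row whose second field is empty (two tabs in a row):

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyFieldCheck {
    static final Pattern NON_WS =
        Pattern.compile("^(\\S+)\\t+(\\S+)\\t+(\\S+)\\t+(.+)");
    static final Pattern NON_TAB =
        Pattern.compile("^([^\\t]*)\\t+([^\\t]*)\\t+([^\\t]*)\\t+(.*)");

    // Hypothetical row: the second field is empty.
    static final String EMPTY_SECOND = "03-24-2014\t\tTOTAL-PROCESS\tOK";

    // Captured fields, or null when the whole row does not match.
    static String[] fields(Pattern p, String row) {
        Matcher m = p.matcher(row);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // \S+ needs at least one char per field, so this row is rejected.
        System.out.println(Arrays.toString(fields(NON_WS, EMPTY_SECOND)));
        // [^\t]* lets the second group be empty after backtracking.
        System.out.println(Arrays.toString(fields(NON_TAB, EMPTY_SECOND)));
    }
}
```

Here the NON_TAB pattern only finds the empty field after backtracking out of the greedy \t+; using a single \t as the separator would make empty fields unambiguous, though that is a variation on the suggested pattern, not part of it.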