I am trying to parse a string which is :
"297","298","Y","","299"
using Regexp serder but i am unable to do so. The Table definition i have created is :
create external table test.test1
(a string,
b string,
c string,
d string)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ("input.regex" = "\"\"|\"([^\"]+)\"")
the regex used in the serde properties looks promising in the regexp test websites but i am getting exception while trying to read the table kindly help me out in this. I know that this can be easily done using csv serde but i am trying to figure out a bigger part of the problem for which i have to use the regexp serde Thanks
In the regex it should be capturing group per column. Your data contains 5 columns and table 4, you want to skip one column, right?
For example this regex will work: with serdeproperties ('input.regex' = '^"(.*?)","(.*?)","(.*?)",.*?,"(.*?)"$')
You can easily check without creating table, like this:
select regexp_replace('"297","298","Y","","299"','^"(.*?)","(.*?)","(.*?)",.*?,"(.*?)"$','$1|$2|$3|$4');
OK
_c0
297|298|Y|299
select regexp_replace('"297","298","Y","this column is skipped","299"','^"(.*?)","(.*?)","(.*?)",.*?,"(.*?)"$','$1|$2|$3|$4');
OK
_c0
297|298|Y|299