I have a table in Hive structured with 3 columns as follows;
timestamp UserID OtherId
2016-09-01 123 "101","222","321","987","393.1","090","467","863"
2016-09-01 124 "188","389","673","972","193","100","143","210"
2016-09-01 125 "888","120","482","594","393.2"
2016-09-01 126 "441","501","322","671","008","899"
2016-09-01 127 "004","700","393.4","761","467","356","643","578"
2016-09-01 128 "322","582","348"
2016-09-01 129 "029","393.8","126","187"
Where OtherID is an array.
I need to parse OtherID so that the resultant dataset is the following, since I am only interested in values which contain '393%'
timestamp UserID OtherId
2016-09-01 123 393.1
2016-09-01 125 393.2
2016-09-01 127 393.4
2016-09-01 129 393.8
I have researched a ton of parse functions but it seems they're all intended to return the position of the value, or you need to specify the position of the value to return it. Both of these options do not work here because '3309%' can occur at any point in the array for any given row. There's also the fact that I need to incorporate the wildcard to allow for variations of my desired value.
Another option is explode but my table is simply too large for that option.
I'm thinking a UDF might be the only way to go but would welcome some guidance there.
Grateful for any assistance.
It's easy do what you need using the lateral view option available in hive.
0: jdbc:hive2://quickstart:10000/default> select * from test_5;
+-----------+------------+----------------------------------------------+
| test_5.t | test_5.id | test_5.oid |
+-----------+------------+----------------------------------------------+
| 123 | 123 | "222","321","987","393.1","090","467","863" |
+-----------+------------+----------------------------------------------+
And this is the trick:
SELECT id, ooid
FROM test_5
LATERAL VIEW EXPLODE(SPLIT(oid,",")) temp AS ooid;
+------+----------+
| id | ooid |
+------+----------+
| 123 | "222" |
| 123 | "321" |
| 123 | "987" |
| 123 | "393.1" |
| 123 | "090" |
| 123 | "467" |
| 123 | "863" |
+------+----------+
Ergo:
SELECT id, regexp_replace(ooid,'"','')
FROM test_5
LATERAL VIEW EXPLODE(SPLIT(oid,",")) temp AS ooid;
WHERE ooid LIKE '"393%';
+------+----------+
| id | ooid |
+------+----------+
| 123 | 393.1 |
+------+----------+