I have 2 tables tabl1
:
+-------+--------+--------+----------+
| att1 | att2 | att3 | att4 |
+-------+--------+--------+----------+
| abcd | ava012 | df012f | afsdaldf |
.......
and tabl2
:
+----+
| val|
+----+
| 012|
...
tabl2
contains numbers that can be substring in one or more of the 4 columns of tabl1
.
The both tables are big tables contains millions of records.
I tried to concatenate tabl1
columns and search in it but the query never ends.
Is there an efficient way to do it. Maybe to transform the whole table into one txt
file and search in it?
Following also this question
Here are some examples of my trials (both in Hive):
SELECT a.*, b.*
from tabl1 a, tabl2 b
where
instr (
concat ( (cast (a.att1 as string), (cast (a.att2 as string),
(cast (a.att3 as string), (cast (a.att4 as string) ) , (cast (b.val as string) ) ) > 0
or
SELECT a.*, b.*
from tabl1 a, tabl2 b
where
concat ( (cast (a.att1 as string), (cast (a.att2 as string),
(cast (a.att3 as string), (cast (a.att4 as string) )
like concat ('%',(cast (b.val as string),'%')
There are some more with REGEX
but endless runtime...
select *
from (select *
from tabl1 t1
lateral view explode(split(regexp_replace(trim(regexp_replace(concat_ws(',',att1,att2,att3,att4),'\\D+',' ')),'(?<=^| )(?<token>.*?) (?=.*(?<= )\\k<token>(?= |$))',''),' ')) e as val
) t1
join tabl2 t2
on t2.val =
t1.val