I am trying to execute this query:
add FILE /home/user1/test/test_udf.py;
SELECT a.hash_code, col2
FROM (SELECT transform (col2, col3) using 'python test_udf.py' as hash_code, col2
FROM sample_table) a ;
I am able to successfully generate the hash_code using the udf but the other column (col2) is getting populated as NULL
.
Sample Output:
sjhfshhalksjlkfj128798172jasjhas NULL
ajsdlkja982988290819189089089889 NULL
jhsad817982mnsandkjsahj982398290 NULL
I know what's wrong with your HiveSql.
In transform (col2, col3) using 'python test_udf.py' as hash_code, col2 FROM sample_table
, the hash_code, col2
's value is parsed from transform (col2, col3)
's return value.
The clo2
is parsed from transform (col2, col3)
, which is NULL
.
I read the Transform doc, picked up the related info as below.
Transform/Map-Reduce Syntax
SELECT TRANSFORM '(' expression (',' expression)* ')'
(inRowFormat)?
USING 'my_reduce_script'
( AS colName (',' colName)* )?
(outRowFormat)? (outRecordReader)?
You'd better not mix transform
with other select
, as the syntax won't support.
Update:
There is a hack to do what you want: let the test_udf.py
return hash_code\t col2
.
So you can parse hash_code, col2
from it. This would solve your problem.