I am probably hoping against hope again, but when I use STRTOK_SPLIT_TO_TABLE on an in-list longer than 32K characters, I get this error:
EXPLAIN Failed. 3738: String is longer than 31000 characters
As a workaround, I split the string and used two STRTOK_SPLIT_TO_TABLE calls to create two derived tables, which I then joined to my main table.
The actual error translation in the TD error documentation advises using "USING followed by data parcel". As I understand it, you can't do that inside a UDF call. So, other than splitting the string, is there another approach? I have 400K characters of in-list strings that I want to avoid loading into a volatile table, because that seems to carry its own overhead cost; that is why I am using this function, but it has never worked.
EXPLAIN
SELECT COL1
FROM DB.TB1,
     ( SELECT TOKEN2
       FROM TABLE ( STRTOK_SPLIT_TO_TABLE( 1, '123456,123456,43456,......20K Long string', ',' )
            RETURNS ( OUTKEY INT, TOKENNUM INT, TOKEN2 VARCHAR(20) CHARACTER SET UNICODE ) ) DD ) D2,
     ( SELECT TOKEN1
       FROM TABLE ( STRTOK_SPLIT_TO_TABLE( 1, '123456,123456,43456.....20K long string', ',' )
            RETURNS ( OUTKEY INT, TOKENNUM INT, TOKEN1 VARCHAR(20) CHARACTER SET UNICODE ) ) DD ) D1
WHERE SUBSTR( TB1.COL2, 21, 9 ) = D2.TOKEN2
   OR SUBSTR( TB1.COL2, 21, 9 ) = D1.TOKEN1
GROUP BY 1;
/* the same logic can be re-written
with a UNION or WHERE EXISTS,
but the focus here is on the UDF */
So to summarize...
I have to split the string into even smaller pieces; two is not enough. How do you harness this UDF when the string is big enough that running a split-string UDF is actually worth it?
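For reference, the "further split" approach I am describing looks roughly like this. This is only a sketch: the chunk strings, token column names, and derived-table names are placeholders, and each chunk must be kept safely under the length limit.

```sql
-- Sketch: split the in-list into N client-side chunks, run
-- STRTOK_SPLIT_TO_TABLE once per chunk, and UNION ALL the results
-- into one token list to join against.
SELECT TOKEN1 AS TOKEN
FROM TABLE ( STRTOK_SPLIT_TO_TABLE( 1, '...first chunk of the in-list...', ',' )
     RETURNS ( OUTKEY INT, TOKENNUM INT, TOKEN1 VARCHAR(20) CHARACTER SET UNICODE ) ) DD1
UNION ALL
SELECT TOKEN2 AS TOKEN
FROM TABLE ( STRTOK_SPLIT_TO_TABLE( 1, '...second chunk of the in-list...', ',' )
     RETURNS ( OUTKEY INT, TOKENNUM INT, TOKEN2 VARCHAR(20) CHARACTER SET UNICODE ) ) DD2;
```

The UNION ALL variant keeps a single join column instead of the two OR'd predicates in the query above, which may also plan better.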
This is the error I get when I use smaller strings. The EXPLAIN runs fine; it is the UDF that throws this error at run time:
Query Failed. 9134: STRTOK: Input string exceeded the max. allowed string length.
It will NOT let me use LATIN. My tables are LATIN, so there is translation overhead from the character-set conversion on the join column. Is there a workaround here? (Maybe translate the UDF's output column yourself so the optimizer doesn't do it on the join column. But that probably wouldn't improve things much, since the optimizer would choose to translate the character set on the side with fewer rows anyway.)
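If it helps, the manual translation I have in mind would look something like this. This is an untested sketch, assuming the standard UNICODE_TO_LATIN translation name and a short placeholder in-list:

```sql
-- Sketch: translate the UNICODE token back to LATIN ourselves, so the
-- join column's character set matches the LATIN table and the optimizer
-- does not have to convert the table side.
SELECT CAST( TRANSLATE( TOKEN2 USING UNICODE_TO_LATIN )
             AS VARCHAR(20) CHARACTER SET LATIN ) AS TOKEN2_LATIN
FROM TABLE ( STRTOK_SPLIT_TO_TABLE( 1, '123456,123456,43456', ',' )
     RETURNS ( OUTKEY INT, TOKENNUM INT, TOKEN2 VARCHAR(20) CHARACTER SET UNICODE ) ) DD;
```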
BTW: I am on 14.1, and per the documentation below:
instring
a character or CLOB argument.
If instring is NULL, the STRTOK_SPLIT_TO_TABLE returns NULL.
Per the documentation, I can pass a CLOB as instring. But even AFTER my > 32K string is cast to CLOB, it throws:
EXPLAIN Failed. 3738: String is longer than 31000 characters
Finally... using CHARACTER_LENGTH on the strings in D1 and D2, I get about 19K each. The documentation says the instring limit is VARCHAR(32000), so why the rejection?
Update: I could cast D1 and D2 to CLOB, stay with the two-string approach, and sneak past the 9134 error. But two caveats still exist:
I don't quite follow how the 32K logic works here. When I test with
SEL CHARACTER_LENGTH( 'string here' )
it returns a number < 32K, so my string should not be subject to the 32K cap.
So using this in the smaller component of the query works fine: I can convert the string lists to tables, join them, and pull out the ID list I was interested in. But when I plug that component into the bigger report, it freaks out with "segmentation fault in Local AMP, don't re-submit", so I had to adopt another approach.
This is probably a bug in the Teradata UDF: in spite of the instring length being < 32K, it rejects it.
SEL CHAR_LENGTH( 'string 32K or less' )  -- output: 31890
When I use that same string here:
SELECT COL1
FROM DB.TB1,
     ( SELECT TOKEN2
       FROM TABLE ( STRTOK_SPLIT_TO_TABLE( 1, '123456,123456,43456,......32K Long string', ',' )
            RETURNS ( OUTKEY INT, TOKENNUM INT, TOKEN2 VARCHAR(20) CHARACTER SET UNICODE ) ) DD ) D2
it fails, and the failure point is right around half of 32K: it fails for anything over roughly 16K characters. Casting to CLOB makes it work. Other than a bug, I can't offer another explanation.
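One guess, which is not from the documentation and may be wrong: if the limit is enforced in bytes rather than characters, a UNICODE string at 2 bytes per character would hit a 32,000-byte cap at about 16,000 characters, which lines up with the observed failure point. A CLOB argument would sidestep that because it is not bound by the VARCHAR limit.

```sql
-- Guess only: 16,000 UNICODE characters x 2 bytes per character
-- is already 32,000 bytes, i.e. right at the suspected cap.
SELECT 16000 * 2 AS approx_bytes;  -- 32000
```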
SELECT COL1
FROM DB.TB1,
     ( SELECT TOKEN2
       FROM TABLE ( STRTOK_SPLIT_TO_TABLE( 1, '123456,123456,43456,......32K Long string' (CLOB), ',' )
            RETURNS ( OUTKEY INT, TOKENNUM INT, TOKEN2 VARCHAR(20) CHARACTER SET UNICODE ) ) DD ) D2