hadoop hive teradata sqoop

Can the Teradata connector for Sqoop overcome delimiter issues by using the SequenceFile format?


If the source table contains characters such as "," and "\n" inside field values, is there a way to Sqoop the data into Hive without having to fix those delimiters, possibly by using an alternate format instead of the standard textfile? I have been working with a few workarounds (e.g., replacing delimiters, OREPLACE, etc.).


Solution

  • The solution I've found handles newline characters on a per-column basis:

    SELECT 
      COL_A,
      -- '0A'XC is a Teradata hex character literal for the line feed (newline)
      OREPLACE(COL_B, '0A'XC, '_replace_char_') AS COL_B,
      ...,
      COL_N
    FROM
      TABLE_NAME
    

    Presumably this will work for commas as well. I have not yet tested whether these replace statements can be nested, and I have no estimate yet of the effect on spool-space usage.
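
    For the commas question, a minimal, untested sketch of nesting the calls. It assumes OREPLACE accepts another OREPLACE's result as its source string (OREPLACE returns a character string, so this should hold) and reuses the placeholder names COL_A, COL_B and TABLE_NAME. The replacement characters (a space for line breaks, a semicolon for commas) are arbitrary choices for illustration. The carriage return ('0D'XC) is included because Hadoop's line reader also treats a lone CR as a record terminator in text files.

    SELECT
      COL_A,
      -- Innermost: line feed (hex 0A) -> space
      -- Middle:    carriage return (hex 0D) -> space
      -- Outer:     comma -> semicolon (assumed not to be the Hive field delimiter)
      OREPLACE(
        OREPLACE(
          OREPLACE(COL_B, '0A'XC, ' '),
          '0D'XC, ' '),
        ',', ';') AS COL_B
    FROM
      TABLE_NAME;

    Whatever replacement you pick must not itself be the field or record delimiter of the target Hive table, or the same problem comes back.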

    The solution comes from the original question in this thread, not from its answers:

    https://community.teradata.com/t5/Database/Removing-a-line-break-character-in-a-column/td-p/52431