Search code examples
csvhadoophivebigdataflat-file

Removing single quotes from a flat file when loading to Hive


Hey im creating an Hive external table over my flat file data.

The data in my flat file is something like this :

'abc',3,'xyz'

When I load it into the Hive table it shows me the result with the single quotes. But I want it to be something like this :

abc,3,xyz

Is there any way to do this?


Solution

  • I can think of two ways to get desired result.

    1. Use existing String functions available in hive - SUBSTR and LENGTH.
    select SUBSTR("\'abc\'",2,length("\'abc\'")-2) , SUBSTR("\'3\'",2,length("\'3\'")-2) ,  SUBSTR("\'xyz\'",2,length("\'xyz\'")-2)
    

    Generalized query

     select SUBSTR(col1,2,length(col1)-2) , SUBSTR(col2,2,length(col2)-2) ,  SUBSTR(col3,2,length(col3)-2)
    

    NOTE: Hive SUBSTR method expect string index to start from "1" not "0"

    1. Write your own UDF to chop first and last letter of every string.

    How to convert million rows? Lets assume you have a table (named "staging") with 3 columns and 1million record. if you run below query, you will have new table "final" which will not have any single quotes at the start or end.

    INSERT INTO final SELECT SUBSTR(col1,2,length(col1)-2) , SUBSTR(col2,2,length(col2)-2) ,  SUBSTR(col3,2,length(col3)-2) from staging
    

    Once the above query finish job , you will have your desired result in "final" table