Search code examples
stringperformancehivehiveqlvarchar

String vs Varchar Hive Query Performance


I have this table with 5 mill records, around 25 columns and most of them are String type. When I make a query, it lasts around 47 sec to fetch the results. I've 2 GB of space for each String column (because I dont know how to reduce that value), the max length record is just around 32k characters for one column, the other ones have way more less than that (7,18,50).

To get a better query performance, I copied that table, but instead of String, I used Varchar(1000) and varchar(50000) for that long record mentioned above, in all STring columns. I thought this would get me a faster fetch, but it takes almost the double of the time.

As my understanding, im using way more less space using varchar, but somehow this is not happening. Under the same conditions, should I get a better response using varchar instead of string?


Solution

  • There should not be any performance difference between string and varchar but best option is used as string, varchar is also stored internally as string.

    Here are some excellent thread on detail comparison for both:

    https://community.hortonworks.com/questions/48260/hive-string-vs-varchar-performance.html

    Hive - Varchar vs String , Is there any advantage if the storage format is Parquet file format