
Issue: Hive AvroSerDe TBLPROPERTIES max length


I am trying to create a table with the AvroSerDe. I used the following command to create the table:

CREATE EXTERNAL TABLE gaSession
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS
 INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES ('avro.schema.url'='hdfs://<<url>>:<<port>>/<<path>>/<<file>>.avsc');

The creation appears to succeed, but the following table is generated:

hive> show create table gaSession;
OK
CREATE EXTERNAL TABLE `gaSession`(
  `error_error_error_error_error_error_error` string COMMENT 'from deserializer',
  `cannot_determine_schema` string COMMENT 'from deserializer',
  `check` string COMMENT 'from deserializer',
  `schema` string COMMENT 'from deserializer',
  `url` string COMMENT 'from deserializer',
  `and` string COMMENT 'from deserializer',
  `literal` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
...

After that, I copied the schema definition into the statement and replaced 'avro.schema.url' with 'avro.schema.literal', but the table still doesn't work.

However, when I delete some (arbitrary) fields, it works, e.g. with the following definition:

CREATE TABLE gaSession
     ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
     STORED AS
     INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
     OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
     TBLPROPERTIES ('avro.schema.literal'='{"type": "record",
"name": "root",
"fields": [
    {
        "name": "visitorId",
        "type": [
            "long",
            "null"
        ]
    },
    {
        "name": "visitNumber",
        "type": [
            "long",
            "null"
        ]
    },
    {
        "name": "visitId",
        "type": [
            "long",
            "null"
        ]
    },
    {
        "name": "visitStartTime",
        "type": [
            "long",
            "null"
        ]
    },
    {
        "name": "date",
        "type": [
            "string",
            "null"
        ]
    },
    {
        "name": "totals",
        "type": [
            {
                "type": "record",
                "name": "totals",
                "fields": [
                    {
                        "name": "visits",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "hits",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "pageviews",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "timeOnSite",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "bounces",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "transactions",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "transactionRevenue",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "newVisits",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "screenviews",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "uniqueScreenviews",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "timeOnScreen",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "totalTransactionRevenue",
                        "type": [
                            "long",
                            "null"
                        ]
                    }
                ]
            },
            "null"
        ]
    }
]
 }');

Does TBLPROPERTIES / avro.schema.literal have a maximum length or other limitations?

Hive-Version: 0.14.0


Solution

  • The Hortonworks support team confirmed that there is a 4000-character limit for TBLPROPERTIES. So, by removing whitespace from the schema literal, you can define a larger table. Otherwise, you have to use 'avro.schema.url'.
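The whitespace-stripping workaround can be automated. Below is a minimal sketch, assuming the schema is available as JSON text (e.g. read from the .avsc file) and that the 4000-character limit applies to the avro.schema.literal value. The function names minify_schema and build_create_table are hypothetical helpers, not part of Hive or Avro. The script re-serializes the schema without whitespace, refuses to emit the statement if the result is still too long (in which case 'avro.schema.url' is the way to go), and does not attempt to escape single quotes.

```python
import json

# Limit reported by Hortonworks support (assumed here to apply to the
# length of the avro.schema.literal value).
MAX_TBLPROPERTY_LENGTH = 4000


def minify_schema(schema_text: str) -> str:
    """Parse an Avro schema (JSON) and re-serialize it without whitespace."""
    schema = json.loads(schema_text)
    # separators=(",", ":") drops the spaces json.dumps inserts by default
    return json.dumps(schema, separators=(",", ":"))


def build_create_table(table: str, schema_text: str) -> str:
    """Build a CREATE TABLE statement with an inline, minified schema literal."""
    literal = minify_schema(schema_text)
    if len(literal) > MAX_TBLPROPERTY_LENGTH:
        raise ValueError(
            f"minified schema is {len(literal)} characters; "
            "use 'avro.schema.url' instead"
        )
    if "'" in literal:
        # Quote escaping is deliberately not handled in this sketch.
        raise ValueError("schema contains a single quote; escape it manually")
    return (
        f"CREATE TABLE {table}\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'\n"
        "STORED AS\n"
        "INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'\n"
        f"TBLPROPERTIES ('avro.schema.literal'='{literal}');"
    )
```

For example, read the .avsc file into a string with open(path, encoding="utf-8").read() and pass it as build_create_table("gaSession", text); the printed statement can then be pasted into the Hive CLI.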