Tags: amazon-s3, hive, trino, hive-metastore, metastore

How to connect Hive Metastore + Trino + S3


Hive-Standalone-metastore = v3.1.3
Hadoop jars               = v3.3.4

I have set up Hive Metastore with the eventual goal of connecting it to Trino so I can query my Parquet files in S3. I am in the Trino CLI now and can see my hive.<schema_name> schema. Now I want to create a simple table so I can query it, but I am getting an exception:

trino:<MY_SCHEMA>> CREATE TABLE IF NOT EXISTS hive.<MY_SCHEMA>.<MY_TABLE> (
              ->   column_one       VARCHAR,
              ->   column_two       VARCHAR,
              ->   column_three     VARCHAR,
              ->   column_four      DOUBLE,
              ->   column_five      VARCHAR,
              ->   column_six       VARCHAR,
              ->   query_start_time TIMESTAMP)
              -> WITH (
              ->   external_location = 's3a://<MY_S3_BUCKET_NAME>/dir_one/dir_two',
              ->   format = 'PARQUET'
              -> );
CREATE TABLE
Query 20220924_181001_00019_bvs42 failed: Got exception: java.io.FileNotFoundException PUT 0-byte object  on dir_one/dir_two: com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: IDNUM123; S3 Extended Request ID: soMeLongID123=; Proxy: null), S3 Extended Request ID: soMeLongID123:404 Not Found

I have tested my AWS credentials manually: I can connect to the bucket and read it, and it does contain Parquet files.

What should I check, or what could I be doing wrong? Thanks.

EDIT: adding my hive.properties

connector.name=hive
hive.metastore.uri=thrift://$HIVE_IP_ADDR:9083
hive.s3.path-style-access=true
hive.s3.endpoint=$AWS_S3_ENDPOINT
hive.s3.aws-access-key=$AWS_ACCESS_ID
hive.s3.aws-secret-key=$AWS_SECRET
hive.s3.ssl.enabled=false

Solution

  • I ended up deleting the endpoint entry altogether and it started working.

    Lesson learned: most tutorials covering "S3 integration" are not actually using Amazon S3 but an S3-compatible alternative such as MinIO. If you are using real Amazon S3, do not set hive.s3.endpoint at all.

    Do this instead (a quick sanity check follows after the config):

    connector.name=hive
    hive.metastore.uri=thrift://$HIVE_IP_ADDR:9083
    hive.s3.path-style-access=true
    hive.s3.aws-access-key=$AWS_ACCESS_ID
    hive.s3.aws-secret-key=$AWS_SECRET
    hive.s3.ssl.enabled=false
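
    With the endpoint line removed (and Trino restarted so the catalog change is picked up), the CREATE TABLE from the question went through for me. A minimal sanity check from the Trino CLI, assuming the same placeholder schema, table, and column names as in the question:

    -- <MY_SCHEMA> and <MY_TABLE> are the placeholders from the question
    SHOW TABLES FROM hive.<MY_SCHEMA>;

    SELECT column_one, column_four, query_start_time
    FROM hive.<MY_SCHEMA>.<MY_TABLE>
    LIMIT 10;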