apache-spark, aws-glue, apache-hudi

Change the location of a Hudi table in AWS?


How can I change the location of a Hudi table to a new location? I have a customer table saved at s3://aws-amazon-com/Customer/ that I want to point to s3://aws-amazon-com/CustomerUpdated/ instead. I'm working on Glue 4.

Using these jars: hudi-spark3-bundle_2.12-0.12.1.jar, calcite-core-1.16.0.jar, libfb303-0.9.3.jar

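import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode
import spark.implicits._ // needed for toDF; assumes a SparkSession named `spark`

val partitionColumnName: String = "year"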
val hudiTableName: String = "Customer"
val preCombineKey: String = "id"
val recordKey = "id"
val tablePath = "s3://aws-amazon-com/Customer/"
val databaseName = "consumer_bureau"

val hudiCommonOptions: Map[String, String] = Map(
    "hoodie.table.name" -> hudiTableName,
    "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.precombine.field" -> preCombineKey,
    "hoodie.datasource.write.recordkey.field" -> recordKey,
    "hoodie.datasource.write.operation" -> "bulk_insert",
    //"hoodie.datasource.write.operation" -> "upsert",
    "hoodie.datasource.write.row.writer.enable" -> "true",
    "hoodie.datasource.write.reconcile.schema" -> "true",
    "hoodie.datasource.write.partitionpath.field" -> partitionColumnName,
    "hoodie.datasource.write.hive_style_partitioning" -> "true",
    // "hoodie.bulkinsert.shuffle.parallelism" -> "2000",
    //  "hoodie.upsert.shuffle.parallelism" -> "400",
    "hoodie.datasource.hive_sync.enable" -> "true",
    "hoodie.datasource.hive_sync.table" -> hudiTableName,
    "hoodie.datasource.hive_sync.database" -> databaseName,
    "hoodie.datasource.hive_sync.partition_fields" -> partitionColumnName,
    "hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "hoodie.datasource.hive_sync.use_jdbc" -> "false",
    "hoodie.combine.before.upsert" -> "true",
    "hoodie.index.type" -> "BLOOM",
    "spark.hadoop.parquet.avro.write-old-list-structure" -> "false",
    DataSourceWriteOptions.TABLE_TYPE.key() -> "COPY_ON_WRITE"
)

// Sample data; `year` is the partition column.
val df = Seq((1, "Mark", 1990), (2, "Martin", 2009)).toDF("id", "name", "year")

// Write to the original location.
df.write.format("org.apache.hudi")
  .options(hudiCommonOptions)
  .mode(SaveMode.Append)
  .save(tablePath)
    
val tablelocationUpdated = "s3://aws-amazon-com/CustomerUpdated/"

// Write the same data to the new location.
df.write.format("org.apache.hudi")
  .options(hudiCommonOptions)
  .mode(SaveMode.Append)
  .save(tablelocationUpdated)


When I query Athena, the table customer still points to s3://aws-amazon-com/Customer/, not to the updated location s3://aws-amazon-com/CustomerUpdated/ as I expected. Can the table location be changed using AWS Glue or AWS Lambda?

Please help


Solution

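  • spark.sql(s"""alter table customer set location 's3://aws-amazon-com/CustomerUpdated/'""")

    This changes the location registered for the Hudi table in the catalog. It does not move any data, so the table files must already exist at the new path. Make sure there is no trailing space inside the quoted path.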
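As a minimal sketch of the same idea, the statement can be built by a small helper that trims the path first, since a stray space inside the quotes would otherwise become part of the stored location. The helper name `alterLocationSql` is hypothetical, and qualifying the table with the `consumer_bureau` database (taken from `databaseName` in the question) is an assumption about how the table is registered.

```scala
// Hypothetical helper: builds an ALTER TABLE ... SET LOCATION statement.
// The location is trimmed so that whitespace does not end up in the stored path.
def alterLocationSql(database: String, table: String, location: String): String = {
  val cleaned = location.trim
  require(cleaned.nonEmpty, "location must not be empty")
  require(!cleaned.contains("'"), "location must not contain a single quote")
  s"ALTER TABLE $database.$table SET LOCATION '$cleaned'"
}

// Database and table names are assumptions based on the question's settings.
val stmt = alterLocationSql(
  "consumer_bureau",
  "customer",
  "s3://aws-amazon-com/CustomerUpdated/ " // trailing space is removed by trim
)
println(stmt)

// In a Glue job with a SparkSession named `spark` (assumed), run the statement
// and then inspect the table metadata to confirm the new location:
//   spark.sql(stmt)
//   spark.sql("DESCRIBE FORMATTED consumer_bureau.customer").show(100, truncate = false)
```

The statement only updates table metadata, so it is meant to be run after the data has been written to the new path, as in the second write in the question.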