Describe the problem you faced How can we change the location of a hudi table to new location. I've customer table that is saved at s3://aws-amazon-com/Customer/ which I want to change to s3://aws-amazon-com/CustomerUpdated/ . I'm working on Glue 4
Using these jars: hudi-spark3-bundle_2.12-0.12.1.jar calcite-core-1.16.0.jar libfb303-0.9.3.jar
val partitionColumnName: String = "year"
val hudiTableName: String = "Customer"
val preCombineKey: String = "id"
val recordKey = "id"
val tablePath = "s3://aws-amazon-com/Customer/"
val databaseName="consumer_bureau"
val hudiCommonOptions: Map[String, String] = Map(
"hoodie.table.name" -> hudiTableName,
"hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
"hoodie.datasource.write.precombine.field" -> preCombineKey,
"hoodie.datasource.write.recordkey.field" -> recordKey,
"hoodie.datasource.write.operation" -> "bulk_insert",
//"hoodie.datasource.write.operation" -> "upsert",
"hoodie.datasource.write.row.writer.enable" -> "true",
"hoodie.datasource.write.reconcile.schema" -> "true",
"hoodie.datasource.write.partitionpath.field" -> partitionColumnName,
"hoodie.datasource.write.hive_style_partitioning" -> "true",
// "hoodie.bulkinsert.shuffle.parallelism" -> "2000",
// "hoodie.upsert.shuffle.parallelism" -> "400",
"hoodie.datasource.hive_sync.enable" -> "true",
"hoodie.datasource.hive_sync.table" -> hudiTableName,
"hoodie.datasource.hive_sync.database" -> databaseName,
"hoodie.datasource.hive_sync.partition_fields" -> partitionColumnName,
"hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
"hoodie.datasource.hive_sync.use_jdbc" -> "false",
"hoodie.combine.before.upsert" -> "true",
"hoodie.index.type" -> "BLOOM",
"spark.hadoop.parquet.avro.write-old-list-structure" -> "false",
DataSourceWriteOptions.TABLE_TYPE.key() -> "COPY_ON_WRITE"
)
val df=Seq((1,"Mark",1990),(2,"Martin",2009)).toDF("id","name","year")
df.write.format("org.apache.hudi")
.options(hudiCommonOptions)
.mode(SaveMode.Append)
.save(tablelocation)
val tablelocationUpdated="s3://eec-aws-uk-ukidcibatchanalytics-prod-hudi-replication/consumer_bureau/production/CustomerUpdated/"
df.write.format("org.apache.hudi") //writng to new location
.options(hudiCommonOptions)
.mode(SaveMode.Append)
.save(tablelocationUpdated)
strong text
When I query Athena the table customer points to s3://aws-amazon-com/Customer/ not the updated location s3://aws-amazon-com/CustomerUpdated/ as expected . Is the table location change can be achieved using AWS glue or aws lambda.
Please help
spark.sql(s"""alter table customer set location 's3://aws-amazon-com/CustomerUpdated/ '""")
Will change the table location of the Hudi table.