Why does a Hive table need to be bucketed to support ACID transactions?

I'd like to know why a Hive table needs to be bucketed to support ACID transactions. Is it just a Hive quirk, or is there a reason behind it?

Solution

Here is a description of Hive's compactor:

The compactor runs background MapReduce jobs to compact the delta and base files. There are two types of compaction: minor and major. A minor compaction merges many small delta files into one larger delta file. A major compaction is more expensive: it takes the delta files and merges them with the base file. All merging happens by writing a new file and then removing the old ones, and a separate cleaner process handles the removal. Compaction is done for each bucket separately, because base and delta files are created per bucket.

More here: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

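So the bucket is the unit of work for compaction. Since each bucket has its own base and delta files and is compacted on its own, more buckets means smaller pieces that can be compacted independently, which generally makes compaction faster.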
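
To make the per-bucket behaviour concrete, here is a minimal Python sketch of minor and major compaction. It is only a toy model and not Hive's implementation: the real compactor runs MapReduce jobs over files on disk, while this uses in-memory dicts. The names `minor_compact`, `major_compact` and `compact_table`, the dict layout, and the use of `None` to mark a deleted row are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory model (not Hive's file format):
# a base maps row id -> value; a delta maps row id -> new value,
# with None standing in for a delete.

def minor_compact(deltas):
    """Merge many small deltas into one delta; newer deltas win."""
    merged = {}
    for delta in deltas:  # deltas are ordered oldest to newest
        merged.update(delta)
    return [merged] if merged else []

def major_compact(base, deltas):
    """Fold all deltas into a new base; deleted rows are dropped."""
    new_base = dict(base)  # write a new "file" instead of editing in place
    for delta in deltas:
        for row_id, value in delta.items():
            if value is None:
                new_base.pop(row_id, None)
            else:
                new_base[row_id] = value
    return new_base, []

def compact_table(buckets):
    """Major-compact every bucket independently of the others."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(
            lambda b: major_compact(b["base"], b["deltas"]), buckets
        )
        return [{"base": base, "deltas": deltas} for base, deltas in results]

# Two buckets, each with its own base and its own deltas.
buckets = [
    {"base": {1: "a", 3: "c"}, "deltas": [{1: "a2"}, {3: None}, {5: "e"}]},
    {"base": {2: "b"}, "deltas": [{4: "d"}, {2: "b2"}]},
]

minor = minor_compact(buckets[0]["deltas"])
compacted = compact_table(buckets)
print(minor)
print(compacted)
```

No bucket needs to look at another bucket's files, which is why the work splits cleanly: each call to `major_compact` only touches one bucket's base and deltas.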
