Tags: hive, orc, bloom-filter, hiveddl

Is it possible to add a bloom filter on an existing table with data?


I have a table stored in ORC format with a bloom filter defined on one column. Is it possible to add a bloom filter for another column (without reinserting the data) after the table has been created and populated?


Solution

  • No, it is not possible without rewriting the data. ALTER TABLE does not touch existing files, and ORC indexes and bloom filters are stored inside the data files, not in the metastore. If you alter the table without rewriting the data, the new filters are created only going forward, for newly inserted or updated data. So you need to reinsert the data, and it is much better to sort it by the filter columns: each value then ends up concentrated in fewer row groups, so the bloom filters let readers skip more of the file. Read about ORC indexes here.
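To illustrate why sorting by the filter column makes bloom filters more effective, here is a toy Python simulation. It is a sketch under stated assumptions, not ORC's implementation. It assumes one bloom filter per row group, built when that group is written. The `BloomFilter` class, its SHA-256-based hashing, the sizes, and the synthetic column are all hypothetical and do not reflect ORC's on-disk format or hash scheme. For a point lookup, the simulation counts how many row groups cannot be skipped.

```python
import hashlib
import random


class BloomFilter:
    """Toy Bloom filter: a bit array of num_bits with num_hashes probe positions.

    Positions come from a SHA-256 digest of the value. This is NOT ORC's
    on-disk format or hash scheme (hypothetical, for illustration only).
    """

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, value):
        digest = hashlib.sha256(repr(value).encode()).digest()
        for i in range(self.num_hashes):
            # Use a different 4-byte slice of the digest for each probe.
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, value) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value) -> bool:
        # False means "definitely absent"; True means "maybe present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_row_group_filters(column, row_group_size):
    """One Bloom filter per row group, built at write time from that group's values."""
    filters = []
    for start in range(0, len(column), row_group_size):
        bf = BloomFilter()
        for value in column[start:start + row_group_size]:
            bf.add(value)
        filters.append(bf)
    return filters


def row_groups_read(filters, value):
    """Row groups a point lookup cannot skip (true hits plus false positives)."""
    return sum(1 for bf in filters if bf.might_contain(value))


def average_row_groups_read(sort_column: bool, seed: int = 42) -> float:
    # Hypothetical column: 1,000 distinct keys, each repeated 10 times,
    # written as 10 row groups of 1,000 rows.
    column = [key for key in range(1000) for _ in range(10)]
    if sort_column:
        column.sort()
    else:
        random.Random(seed).shuffle(column)
    filters = build_row_group_filters(column, row_group_size=1000)
    lookups = range(1000)
    return sum(row_groups_read(filters, key) for key in lookups) / len(lookups)


if __name__ == "__main__":
    print("sorted:  ", average_row_groups_read(sort_column=True))
    print("unsorted:", average_row_groups_read(sort_column=False))
```

When the keys are sorted, all copies of a key fall into a single row group, so a lookup typically reads about one group. When the same keys are shuffled, their copies spread across most of the ten groups. Each group's filter also holds many more distinct values, which raises the false-positive rate, so far fewer groups can be skipped.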