Search code examples
hadoophivedelete-row

How to delete rows in hive hadoop database


I'm a newbie with hadoop & hive. I want to delete certain rows in my database - which is on hive-hadoop. I know its not supported out of the box, and that hadoop is a read only file system. I'm curious about what are the best approaches for accomplishing this. If anyone has done this before, can they share their learnings/procedures?

Thanks!


Solution

  • The best approach is to partition your data such that the rows you want to drop are in a partition unto themselves. You can then drop the partition without impacting the rest of your table. This is a fairly sustainable model, even if your dataset grows quite large.