I have a Hive external table X in HDFS. Files exported from an RDBMS keep arriving in the folder location of table X.
Last week a new column was added on the RDBMS side, and the files arrived in the external table's folder with the new column's data.
I know that I should add a new column to the Hive external table in this case.
But how do I prevent files with new columns from arriving in my external folder?
Or at least, how do I recognize that a new column is coming?
Ideally, the source team should communicate the changes they are making to you. If your enterprise has change control and a review board, you need to be part of the change control review for the applications you get data from.
If change control is not possible, your data integration process needs to check the source database tables for changes and notify you when it finds any.
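A minimal sketch of such a check, in Python, assuming the integration job can get the current list of source columns from somewhere (for example the database catalog or the header of the incoming file). The function name `diff_columns` and the column names are hypothetical; how the column lists are obtained and how the notification is sent are left out.

```python
def diff_columns(expected, actual):
    """Return (added, removed) column names, preserving their original order."""
    expected_set = set(expected)
    actual_set = set(actual)
    added = [c for c in actual if c not in expected_set]
    removed = [c for c in expected if c not in actual_set]
    return added, removed


# Hypothetical column lists: what the Hive table expects vs. what the source has now.
expected_columns = ["id", "name", "created_at"]
source_columns = ["id", "name", "created_at", "email"]

added, removed = diff_columns(expected_columns, source_columns)
if added or removed:
    # Replace the print with whatever alerting your organization uses.
    print(f"Schema change detected: added={added}, removed={removed}")
```

Running this before each load lets the job stop or raise an alert instead of silently writing files with a different layout into the table's folder.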
If you do not need the new columns, fetch the data from the source with an explicit column list ("select <columns> from <table>") rather than "select *"; this will not fetch the data for new columns.
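As a rough illustration of the explicit column list, here is a small Python sketch that builds the extraction query from a fixed list of columns. The helper `build_select`, the table name `source_table`, and the column names are all hypothetical, and the sketch assumes the identifiers come from trusted configuration (it does no quoting or escaping).

```python
def build_select(table, columns):
    """Build a SELECT statement that names every column explicitly."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {table}"


# Hypothetical table and columns; only these columns reach the extract,
# even if new columns are added to the source table later.
query = build_select("source_table", ["id", "name", "created_at"])
print(query)
```

Because the column list is fixed on your side, a column added at the source does not change the layout of the exported files until you decide to include it.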
Finally, it depends on how you want to tackle it. There is no out-of-the-box solution; this is a typical data integration problem that needs a custom solution in line with your organization's practices.