Existing process - Raw structured data is copied into a staging layer in Redshift. An ETL tool such as Informatica or Talend then performs incremental loads into the fact and dimension tables of the data mart / data warehouse. All joins happen within the database layer (the ETL tool pushes the queries down into the database).
Can Spark replace the ETL tool, do the same processing, and load the data into Redshift?
What are the advantages and disadvantages of this architecture?
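For context, here is a minimal sketch of what the Spark-based version of the incremental load could look like, assuming the staged data lands as Parquet on S3 and Redshift is reachable over plain JDBC. The table names (`dim_customer`, `fact_orders`), paths, and connection details are all hypothetical, not from the original setup:

```python
# Minimal sketch: Spark takes over the incremental join + load that the
# ETL tool currently pushes into Redshift. Assumes the Redshift JDBC
# driver jar is on the Spark classpath.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# Read the staging extract (hypothetical S3 path) and the current dimension.
staged = spark.read.parquet("s3://bucket/staging/orders/")
dim_customer = (spark.read.format("jdbc")
    .option("url", "jdbc:redshift://host:5439/dw")   # hypothetical endpoint
    .option("dbtable", "dim_customer")
    .option("user", "etl").option("password", "...")
    .load())

# The join now runs in Spark instead of being pushed down into Redshift.
fact_delta = (staged
    .filter(F.col("load_date") == F.current_date())  # incremental slice
    .join(dim_customer, "customer_id")
    .select("order_id", "customer_sk", "order_amount", "load_date"))

# Append only the new rows to the fact table.
(fact_delta.write.format("jdbc")
    .option("url", "jdbc:redshift://host:5439/dw")
    .option("dbtable", "fact_orders")
    .option("user", "etl").option("password", "...")
    .mode("append")
    .save())
```

One caveat: row-by-row JDBC writes into Redshift are slow. Production pipelines usually have Spark write the delta to S3 and then issue a Redshift COPY, which is what connectors such as spark-redshift do under the hood.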
We use an ETL tool to do these things:
1. Transfer data into the database.
2. Get data out of the database and move it somewhere else.
3. Schedule when the jobs run.
4. Check job dependencies (see the scheduler sketch after this list).
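Spark on its own covers only items 1 and 2; for items 3 and 4 you would have to pair it with an external orchestrator. A minimal sketch using Apache Airflow (2.4+), where the DAG id, schedule, and spark-submit commands are hypothetical:

```python
# A hypothetical Airflow DAG chaining two spark-submit jobs, so the fact
# load runs only after the dimension load succeeds (job dependency).
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="spark_incremental_load",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # item 3: when the jobs run
    catchup=False,
) as dag:
    load_dimensions = BashOperator(
        task_id="load_dimensions",
        bash_command="spark-submit load_dimensions.py",
    )
    load_facts = BashOperator(
        task_id="load_facts",
        bash_command="spark-submit load_facts.py",
    )
    # Item 4: the fact load depends on the dimension load.
    load_dimensions >> load_facts
```

This scheduling and dependency management is exactly what Informatica or Talend bundle in; with Spark you have to bring it separately.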
The Hadoop ecosystem is about storing and processing data at scale, while the ETL tooling described above is built around relational databases and also handles scheduling and dependencies. They do different things, so Spark will not simply replace the ETL tool.