I need to move data from on-premises to AWS Redshift (region1). What is the fastest way?
1) Use AWS Snowball to move the on-premises data to S3 (region1), and then use Redshift's SQL COPY command to copy the data from S3 to Redshift.
2) Use AWS Data Pipeline (note there is no AWS Data Pipeline in region1 yet, so I will set up a Data Pipeline in region2, which is closest to region1) to move the on-premises data to S3 (region1), and another AWS Data Pipeline (region2) to copy the data from S3 (region1) to Redshift (region1) using the AWS-provided template (this template uses RedshiftCopyActivity to copy data from S3 to Redshift)?
Which of the above solutions is faster? Or is there another solution? Also, will RedshiftCopyActivity be faster than running Redshift's COPY command directly?
Note that this is a one-time move, so I do not need AWS Data Pipeline's scheduling function.
Here is the AWS Data Pipeline link: AWS Data Pipeline. It says: "AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources..."
It comes down to network bandwidth versus the quantity of data.
The data needs to move from the current on-premises location to Amazon S3.
This can be done either over your network connection or by using AWS Snowball.
You can use an online network calculator to calculate how long it would take to copy via your network connection.
Then, compare that to using AWS Snowball to copy the data.
Pick whichever one is cheaper/easier/faster.
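For a rough feel of that comparison, the online transfer time is just data size divided by sustained throughput. Here is a back-of-the-envelope sketch; the 10 TB dataset, 100 Mbps uplink, and 80% efficiency figure are made-up numbers for illustration, not anything from your setup:

```python
# Rough estimate of how long an online transfer to S3 would take,
# to compare against the multi-day turnaround of a Snowball job.
def transfer_days(data_tb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    data_bits = data_tb * 1e12 * 8                   # terabytes -> bits
    effective_bps = uplink_mbps * 1e6 * efficiency   # usable bits per second
    return data_bits / effective_bps / 86400         # seconds -> days

# Example: 10 TB over a 100 Mbps uplink (hypothetical figures)
print(f"{transfer_days(10, 100):.1f} days")          # roughly 11-12 days
```

If the network estimate comes out in weeks, Snowball usually wins; if it is a day or two, the network copy is simpler.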
Once the data is in Amazon S3, use the Amazon Redshift COPY command to load it.
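As a concrete sketch, COPY can be issued from any SQL client or driver connected to the cluster; the cluster endpoint, bucket, table, and IAM role below are placeholders, not values from your environment:

```python
import os
import psycopg2

# Connect to the Redshift cluster (endpoint and credentials are placeholders).
conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.region1.redshift.amazonaws.com",
    port=5439,
    dbname="mydb",
    user="admin",
    password=os.environ["REDSHIFT_PASSWORD"],
)
conn.autocommit = True

# COPY pulls the files from S3 in parallel across the cluster's slices;
# the schema, table, bucket, and IAM role names are hypothetical.
copy_sql = """
    COPY my_schema.my_table
    FROM 's3://my-bucket/exported-data/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV
    GZIP;
"""
with conn.cursor() as cur:
    cur.execute(copy_sql)
conn.close()
```

Splitting the export into many compressed files lets COPY load them in parallel, which is where most of the speed comes from.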
If data is being continually added, you'll need to find a way to send continuous updates to Redshift. This might be easier via network copy.
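If you do go the network route for those ongoing updates, a small uploader along these lines is usually enough; the bucket name, prefix, and local export directory are assumptions for the sketch:

```python
import boto3
from pathlib import Path

# Push newly exported files to the S3 staging bucket; names are placeholders.
s3 = boto3.client("s3")
export_dir = Path("/data/exports")

for path in export_dir.glob("*.csv.gz"):
    # Keys mirror the local layout so each periodic COPY can target one prefix.
    s3.upload_file(str(path), "my-bucket", f"exported-data/{path.name}")
```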
There is no benefit in using Data Pipeline.