I am designing an application that should read a txt file from S3 every 15 minutes, parse the pipe-delimited (|) data, and load it into Aerospike clusters in 3 different AWS regions. The file size can range from 0 to 32 GB and the number of records it may contain is between 5 and 130 million.
I am planning to deploy a custom Java process in every AWS region that downloads the file from S3 and loads it into Aerospike using multiple threads.
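Roughly what I have in mind per region (a minimal sketch using the Aerospike and AWS Java clients; the bucket, object key, namespace, and set names are placeholders, and I'm assuming the first pipe-delimited field is the record key):

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Bin;
    import com.aerospike.client.Key;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class S3ToAerospikeLoader {

        // Placeholder names -- substitute the real bucket/key/namespace/set.
        private static final String BUCKET = "my-bucket";
        private static final String OBJECT_KEY = "feeds/latest.txt";
        private static final String NAMESPACE = "test";
        private static final String SET = "records";
        private static final int THREADS = 16;

        public static void main(String[] args) throws Exception {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

            // Bounded queue + CallerRunsPolicy gives backpressure, so a
            // 32 GB file never piles up in memory while writers catch up.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    THREADS, THREADS, 0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<>(10_000),
                    new ThreadPoolExecutor.CallerRunsPolicy());

            // One process per region, each talking to its local cluster.
            try (AerospikeClient aero = new AerospikeClient("127.0.0.1", 3000);
                 BufferedReader reader = new BufferedReader(new InputStreamReader(
                         s3.getObject(BUCKET, OBJECT_KEY).getObjectContent(),
                         StandardCharsets.UTF_8))) {

                String line;
                while ((line = reader.readLine()) != null) {
                    final String row = line;
                    pool.submit(() -> {
                        String[] fields = row.split("\\|", -1);
                        // Assumption: field 0 is the record key; the raw row
                        // is stored in a single bin for now.
                        Key key = new Key(NAMESPACE, SET, fields[0]);
                        aero.put(null, key, new Bin("payload", row));
                    });
                }
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.HOURS);
            }
        }
    }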
I just came across AWS Glue. Can anybody tell me if I can use AWS Glue to load this big chunk of data into Aerospike? Or is there any other recommendation for setting up an efficient and performant application?
Thanks in advance!
AWS Glue does an extract and transform, then loads into Redshift, EMR, or Athena. You should take a look at AWS Data Pipeline instead, using the ShellCommandActivity to run your S3 data through extraction and transformation and write the transformed data to Aerospike.
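The transform step that the ShellCommandActivity invokes can be an ordinary program; in Java it might look something like this (a rough sketch only -- the id|name|amount field layout is invented, so map it to your real schema):

    import com.aerospike.client.Bin;

    // Hypothetical per-record transform: one pipe-delimited line -> Aerospike bins.
    // The id|name|amount field order is an assumption, not part of the question.
    public class RecordTransform {

        public static Bin[] toBins(String line) {
            String[] f = line.split("\\|", -1);
            return new Bin[] {
                    new Bin("name", f[1]),
                    new Bin("amount", Double.parseDouble(f[2]))
            };
        }

        public static void main(String[] args) {
            for (Bin bin : toBins("42|widget|19.99")) {
                System.out.println(bin.name + " = " + bin.value);
            }
        }
    }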