At my company, Pentaho Kettle handles our daily ETL jobs (one machine is sufficient for all the data we have), which means:
We are planning to redo it with a more advanced tool, one that would enable us to:
Options that come to mind are Luigi or Airflow as workflow managers, with the ETL code itself written in Python (a rough sketch of what I imagine is below). Since our entire infrastructure is on AWS, I see that AWS Glue is also now appearing as an option (I do not know whether it is only for ETL, or whether it can also be used for the other processes we plan to include).
Are there any other solutions? Does anyone have experience using these tools, especially with how they work with Redshift and S3, and possibly with triggering Spark/Kinesis workloads in the future?
If so, which libraries should we use, and where is a good place to start learning?
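For concreteness, this is a minimal sketch of the kind of daily pipeline I imagine with Airflow: stage data in S3, then COPY it into Redshift. Every name here (DAG ID, bucket, table, IAM role) is a placeholder for our setup, the task bodies are stubs, and import paths vary across Airflow versions:

```python
# Minimal sketch of a daily Airflow DAG: stage source data in S3, then
# COPY it into Redshift. All names below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_to_s3(**context):
    # Pull the day's data from the source system and write it to
    # s3://our-etl-bucket/staged/<date>/ (placeholder location).
    ...


def load_into_redshift(**context):
    # Open a Redshift connection and run something like:
    #   COPY reporting.daily_facts
    #   FROM 's3://our-etl-bucket/staged/<date>/'
    #   IAM_ROLE '<role-arn>' FORMAT AS CSV;
    ...


with DAG(
    dag_id="daily_etl",
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
    load = PythonOperator(task_id="load_into_redshift", python_callable=load_into_redshift)
    extract >> load  # the load runs only after the extract succeeds
```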
Sorry, but it is almost impossible to answer questions like these. Each company and team is different; what works for us won't necessarily work for you.
I can, however, provide some general advice:
Play to your strengths. If your team is full of strong C# coders, don't choose Python. If you know SQL Server inside and out, choose its ETL tool (SSIS).
Planning. This is the most important step. Make sure you have fully analysed and documented how the new ETL solution will work. Identifying and solving all the complex problems up front will lead to shorter development times and a tidier solution. A greater understanding of the nitty-gritty will also help you to evaluate the various tools and frameworks on offer. At the end of this process you should know:
If you plan properly it won't matter what technologies you use.
Prototype and test. Especially important if you are using a tool or framework for the first time. At the very least test major functionality before committing to an approach. A company I once worked for spent tens of thousands of pounds on an ETL solution. The day after it was installed we discovered it did not support our CRM tool. No workaround could be found and we were forced to buy a second ETL tool. Very expensive mistake.
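For what it's worth, a prototype does not have to be elaborate. A throwaway script that proves the riskiest integration actually works end to end would have caught our CRM problem early. A sketch of the kind of cheap smoke test I mean, using an S3 round trip as the example (the bucket name is hypothetical; substitute whatever integration worries you most):

```python
# A cheap end-to-end smoke test: prove the candidate stack can actually
# move data through the riskiest integration before committing to it.
# Here the example is an S3 round trip; the bucket name is hypothetical.
import boto3


def test_round_trip_through_s3():
    s3 = boto3.client("s3")
    bucket, key = "etl-evaluation-scratch", "smoke/test.csv"
    payload = b"id,amount\n1,9.99\n"
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    returned = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    assert returned == payload, "S3 round trip failed"


if __name__ == "__main__":
    test_round_trip_through_s3()
    print("Smoke test passed.")
```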
Don't aim at a moving target. Ideally, the new and old ETL systems should use exactly the same source data and populate exactly the same reporting tables. This dramatically simplifies testing, allows you to dual run, and allows you to fall back to the old solution should you need to. Save the new fancy stuff for release two.
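To illustrate what dual running buys you: with both pipelines populating equivalent reporting tables, reconciliation can be as simple as comparing a few aggregates. A rough sketch, assuming both tables live in the same warehouse (all connection details, schemas, and table names are hypothetical):

```python
# Rough sketch of a dual-run reconciliation: run the same aggregate checks
# against the old and new pipelines' reporting tables and compare. All
# connection details, schemas, and table names below are hypothetical.
import psycopg2

CHECKS = [
    # (label, SQL returning a single comparable value)
    ("row count", "SELECT COUNT(*) FROM {schema}.daily_facts"),
    ("total amount", "SELECT SUM(amount) FROM {schema}.daily_facts"),
]


def run_checks(conn, schema):
    results = {}
    with conn.cursor() as cur:
        for label, sql in CHECKS:
            cur.execute(sql.format(schema=schema))
            results[label] = cur.fetchone()[0]
    return results


conn = psycopg2.connect(host="warehouse-host", dbname="dwh", user="etl", password="...")
try:
    old = run_checks(conn, "reporting_old")
    new = run_checks(conn, "reporting_new")
    for label in old:
        status = "OK" if old[label] == new[label] else "MISMATCH"
        print(f"{label}: old={old[label]} new={new[label]} -> {status}")
finally:
    conn.close()
```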
Code. Don't write any code until all the other steps have been completed (prototypes/tests aside). When you fully understand a problem, the code (almost) writes itself.
For context: I manage a 3-billion-record data warehouse for a large multinational. I have made every single mistake I am warning you against.