I have an EMR cluster with a number of steps. I am trying to analyze log data that arrives every week, and I want to run the same steps each week on the appended data.
Long-running cluster:
data source (load or copy records from the log file on each subsequent run)

How can I run the same steps every week on the cluster? Or do I need to spin up a new cluster every week?
It would also be great to get some guidance on the type of data source that can handle huge amounts of data in such a scenario.
You can submit new steps to a running cluster by calling add-steps (see the AWS CLI Command Reference).
Thus, you would need a cron job somewhere that calls add-steps against the cluster. You could create the cron job on the master node, or use one of the myriad Hadoop tools that can schedule and orchestrate jobs.
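As a minimal sketch of the cron approach: a small script that submits the weekly step via the AWS CLI, scheduled by a crontab entry. The cluster ID, script name, and S3 paths below are all placeholders, not values from your setup.

```shell
#!/bin/bash
# run_weekly_steps.sh -- submit the weekly analysis step to the existing
# EMR cluster. All IDs and S3 paths are placeholders; substitute your own.
#
# Example crontab entry (runs every Monday at 02:00):
#   0 2 * * 1 /home/hadoop/run_weekly_steps.sh >> /tmp/weekly_steps.log 2>&1

aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=Spark,Name=WeeklyLogAnalysis,ActionOnFailure=CONTINUE,Args=[s3://my-bucket/scripts/analyze_logs.py,--input,s3://my-bucket/logs/]'
```

Because each run reads from the same input location, simply appending the new week's log files to that S3 prefix before the job fires is enough for the step to pick them up.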
You certainly do not need to spin up a new cluster each week, since you already have a cluster operating.