amazon-web-services apache-spark hadoop amazon-s3 hdfs

Spark - can "spark.deploy.spreadOut = false" give a performance benefit on S3?

I understand that setting "spark.deploy.spreadOut" to true can benefit HDFS, but for S3, can setting it to false have a benefit over true?

Solution

If you're running Hadoop and HDFS, you would gain nothing from using the Spark Standalone scheduler, which is the cluster manager that property applies to. Instead, you should be running YARN, where the ResourceManager determines how executors are spread.

If you are running the Standalone scheduler on EC2, then setting that property will help, and its default is already true.

In other words, where you read the data from is not the deciding factor here; the deploy mode of the master is.
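To illustrate what the property controls, here is a simplified Python sketch of how a Standalone master might hand out an application's cores across workers, modeled loosely on the scheduling loop in Spark's `Master`. The function name `assign_cores` and its parameters are hypothetical, and the sketch ignores memory limits and other constraints the real scheduler checks.

```python
from typing import List


def assign_cores(free_cores: List[int], cores_wanted: int,
                 cores_per_executor: int, spread_out: bool) -> List[int]:
    """Hypothetical, simplified sketch: return cores assigned per worker.

    spread_out=True  -> place one executor per worker per pass (round-robin).
    spread_out=False -> keep filling the same worker before moving on.
    Memory and other real-scheduler constraints are deliberately ignored.
    """
    assigned = [0] * len(free_cores)
    remaining = cores_wanted

    def can_launch(i: int) -> bool:
        # An executor fits if the app still wants cores and the worker has room.
        return (remaining >= cores_per_executor
                and free_cores[i] - assigned[i] >= cores_per_executor)

    usable = [i for i in range(len(free_cores)) if can_launch(i)]
    while usable:
        for i in usable:
            keep_scheduling = True
            while keep_scheduling and can_launch(i):
                assigned[i] += cores_per_executor
                remaining -= cores_per_executor
                if spread_out:
                    # Spread out: move to the next worker after one executor.
                    keep_scheduling = False
        # Drop workers that can no longer take an executor.
        usable = [i for i in usable if can_launch(i)]
    return assigned


print(assign_cores([4, 4, 4], 6, 2, spread_out=True))
print(assign_cores([4, 4, 4], 6, 2, spread_out=False))
```

With three 4-core workers and 6 cores requested at 2 cores per executor, spreading out yields `[2, 2, 2]` while consolidating yields `[4, 2, 0]`. Note that nothing in this loop considers where the input data lives, whether on HDFS or S3.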

Larger performance gains would come from the number of files you're trying to read and the format you store the data in.
