AWS Redshift cluster performance: ra3.4xlarge vs ra3.16xlarge

I want to figure out which Redshift cluster would perform better, or whether there would be no difference: 8 nodes of ra3.4xlarge or 2 nodes of ra3.16xlarge. In general, ra3.16xlarge is 4 times more powerful than ra3.4xlarge, and it also costs 4 times more. So on paper both options look very similar. Are there any pros and cons to choosing either configuration?

There are some pros which I could think of for each config (please correct me if I am not getting something right):

8x ra3.4xlarge - it is possible to scale the cluster up in smaller increments based on workload.
2x ra3.16xlarge - more powerful leader node, which might benefit some workloads.
2x ra3.16xlarge - when running queries where data must be redistributed, it only moves between 2 nodes, which might be faster?
8x ra3.4xlarge - larger total storage capacity, but currently that is not a concern as we are using less than 1% of the available storage space.

What would be the best option?

Solution

You have a good list of pros and cons. Since these two node types are almost exactly scaled versions of each other, the differences will come down to compute-system details that AWS doesn't specify well. So the way to really know is to do some testing.
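As a rough idea of what such testing could look like, here is a minimal Python sketch of a timing harness. It is only an illustration under assumptions: `time_query` and `compare_medians` are hypothetical helper names, and `run_query` stands for whatever callable you write to execute one of your representative queries against a given cluster (the connection code is deliberately left out). Keep in mind that Redshift's result cache and compile cache can distort repeated timings, so treat the warm-up handling here as a starting point rather than a complete methodology.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], None],
               repeats: int = 5,
               warmups: int = 1) -> Dict[str, float]:
    """Run a query callable several times and summarize wall-clock time.

    run_query is a hypothetical callable that executes one query against
    one cluster and blocks until the result is back.
    """
    # Warm-up runs absorb one-off costs such as query compilation.
    for _ in range(warmups):
        run_query()

    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)

    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def compare_medians(results_a: Dict[str, Dict[str, float]],
                    results_b: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return median(A) / median(B) for each query measured on both clusters.

    A ratio above 1 means cluster B was faster for that query.
    """
    return {
        name: results_a[name]["median"] / results_b[name]["median"]
        for name in results_a
        if name in results_b
    }


if __name__ == "__main__":
    # Stand-in for a real query so the sketch runs without a cluster.
    stats = time_query(lambda: time.sleep(0.01), repeats=3)
    print(stats)
```

The point of comparing per-query medians rather than one total is that the two configurations may win on different queries, which is exactly the workload dependence discussed in this answer.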

A few additional things to consider:

Even a small difference in data redistribution performance can have a large effect, depending on your query workload. In many real-world cases a perfect DIST KEY architecture doesn't exist. The query patterns are too many and too varied, and some will redistribute large amounts of data. Moving data within a box is much faster than moving it over an actual network. If your workload has performance-critical queries that fit this pattern, then having fewer boxes will matter to you and the 2-node configuration will be faster.
Fewer boxes also means lower parallelism for many aspects of the computer system. Computer systems very rarely scale up perfectly. While the specs for IO bandwidth may look perfectly scaled up, it is likely that real-world IO bandwidth will be better with more nodes. Again, the importance of this comes down to your workload. In aggregate, how disk-intensive is your workload? Peak bandwidth for a single query is different from the aggregate bandwidth of many high-IO queries running in parallel.
You correctly point out the leader node's importance in some situations. Is your workload likely to stress the leader? Lots of returned data and cursors? Extremely complex queries to compile? Lots of data being inserted as VALUES? Extremely large amounts of COMMIT traffic per minute? If not, then the performance of the leader is not likely to be a major factor, but if so, it will matter.
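To put a rough number on the redistribution point above, here is a back-of-the-envelope Python sketch. It is a simplified model, not how Redshift actually accounts for data movement: it assumes rows are hashed uniformly across nodes, and it ignores slices, skew, and the real (unpublished) network bandwidth of each node type. The function names are made up for illustration.

```python
def cross_node_fraction(nodes: int) -> float:
    """Expected fraction of redistributed rows that land on a different node.

    Assumes uniform hashing: each row is equally likely to be sent to any
    of the `nodes` nodes, so it stays local with probability 1 / nodes.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return (nodes - 1) / nodes


def network_gb(total_gb: float, nodes: int) -> float:
    """Total data (GB) crossing the network for one redistribution step."""
    return total_gb * cross_node_fraction(nodes)


def outbound_gb_per_node(total_gb: float, nodes: int) -> float:
    """Data (GB) each node sends out, assuming data starts evenly spread."""
    return (total_gb / nodes) * cross_node_fraction(nodes)


if __name__ == "__main__":
    # Hypothetical query that redistributes 100 GB.
    for n in (2, 8):
        print(n, cross_node_fraction(n), network_gb(100, n),
              outbound_gb_per_node(100, n))
    # 2 nodes: 0.5 of rows cross the network   -> 50.0 GB total, 25.0 GB per node
    # 8 nodes: 0.875 of rows cross the network -> 87.5 GB total, 10.9375 GB per node
```

Under these assumptions the 2-node cluster keeps half of the redistributed data inside a box, while the 8-node cluster pushes 87.5% of it over the network, although spread across more network links. Which effect dominates depends on per-node network bandwidth, which is one more reason to test.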

As you can see, the differences between these node types will only show up when your workload is not aligned with the "ideal". For a lot of workloads this choice won't make a meaningful difference, and factors like "finer resolution for future cluster scaling" become the key ones. In my experience many (most) real-world workloads on Redshift differ from the ideal in some ways (or maybe this is because I consult with clients having Redshift issues). Redshift is a cluster of computers, and how you configure this cluster CAN very much matter.