I read the Greenplum architecture overview here: https://gpdb.docs.pivotal.io/530/admin_guide/intro/arch_overview.html It looks like there is one master node and many segment nodes?
Question 1: Isn't the master node a bottleneck, since it is a single node doing all the work for so many segments?
Question 2: Is it fair to compare the segments' work to the work done by a mapper (as in MapReduce) and the master node's work to that of a reducer? If yes, how does it handle the disproportion in the number of instances?
A1. No, the master is mostly idle. It accepts client connections, generates query plans, monitors the segments for availability, and returns results to the clients. The data is stored and processed on the segments.
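To make that division of labor concrete, here is a toy sketch in Python. It is not Greenplum code or its API: SEGMENTS, segment_scan and master_run_query are hypothetical names, and threads stand in for separate segment hosts. It only illustrates the idea that the scanning happens on the segments in parallel while the master hands out the work and gathers what comes back.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: each inner list stands for the rows stored on one segment.
SEGMENTS = [
    [3, 8, 1],
    [7, 2],
    [5, 9, 4],
]

def segment_scan(rows, threshold):
    """Work done on a segment: scan its local rows and keep the matching ones."""
    return [r for r in rows if r > threshold]

def master_run_query(segments, threshold):
    """Work done on the master: dispatch the same scan to every segment,
    then gather the (already filtered) results in segment order."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(lambda rows: segment_scan(rows, threshold), segments))
    return [row for part in partials for row in part]

result = master_run_query(SEGMENTS, threshold=4)
print(result)  # [8, 7, 5, 9]
```

In this sketch the master never touches a row that a segment has filtered out, which is roughly why adding segments does not add proportional load on the master.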
A2. No. The master is more like the HDFS NameNode, but it does even less than that. The NameNode keeps track of block locations, whereas the Greenplum master doesn't.
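One way to picture why no location map is needed, assuming a table that is hash-distributed on a key: the owning segment can be derived from the key itself. The sketch below is only an illustration. NUM_SEGMENTS and segment_for_key are hypothetical names, and crc32 is a stand-in, not the hash function Greenplum actually uses.

```python
import zlib

NUM_SEGMENTS = 4  # hypothetical cluster size

def segment_for_key(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Derive the owning segment from the distribution key alone.
    crc32 is a stand-in hash chosen for this sketch."""
    return zlib.crc32(key.encode("utf-8")) % num_segments

keys = ["alice", "bob", "carol"]
placement = {k: segment_for_key(k) for k in keys}
print(placement)
```

The same key always maps to the same segment, so placement is computed rather than looked up in a central table of locations.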