Balancer iteratively moves replicas from DataNodes with higher utilization to DataNodes with lower utilization.
Will that affect the concept of Rack awarness ?
For example I have three machines placed in two racks and data is placed by following the concept of rack awarness.
What would happen if I add a new machine to the cluster and run the balancer command?
Rack awareness & data locality is a YARN concept. The HDFS balancer only cares about leveling out the Datanode usage.
If you have 3 machines, with 3 replicas by default, then every machine could be guaranteed to have 1 replica, therefore with 2 racks, you're practically guaranteed to have rack locality.
Node locality is more performant than rack awareness, anyway.
If you have 10 GB intra cluster speeds between nodes, data locality is a moot point. This is why AWS can still reasonably process data in S3, for example, where data locality processing is not available