hadoop mapreduce cassandra hive datastax-enterprise

cassandra.input.split.size is not reflected in DSE 3.2.4 Hadoop

I am processing Cassandra tables using Hive in DSE 3.2.4. Regardless of the table size, every job runs 513 mappers. I tried changing the following settings:

cassandra.input.split.size 65536
mapred.min.split.size 1000000

These values show up in Job.xml, but the number of mappers does not change.

I also tried setting mapred.map.tasks to 4, but that does not even show up in Job.xml. I didn't expect it to work, but I gave it a try.

I still don't understand where this particular number, 513, comes from.

Solution

513 = 256 vnode splits * 2 nodes + 1

This makes me guess you have a 2-node cluster. The number of splits depends on two things: the number of token ranges in the cluster and the number of partitions in those ranges. Currently every vnode range is turned into at least one split, which is why vnodes are not recommended for analytics clusters.
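To make the arithmetic concrete, here is a rough Python model of the rule described above. It is a simplified sketch, not DSE's actual split code. It assumes that each token range yields at least one split and that a range is subdivided only when its estimated partition count exceeds cassandra.input.split.size. Treating the "+1" as one extra split for the range that wraps around the ring is a hypothetical reading, exposed here as the made-up parameter wrapping_ranges. The function name estimate_split_count is also purely illustrative.

```python
import math
from typing import List


def estimate_split_count(partitions_per_range: List[int], split_size: int,
                         wrapping_ranges: int = 0) -> int:
    """Rough, hypothetical model of how many input splits a table yields.

    Assumptions (simplified, not DSE's real implementation):
      * every token range produces at least one split, even if tiny;
      * a range with more partitions than split_size is subdivided into
        ceil(partitions / split_size) splits;
      * wrapping_ranges adds one extra split per ring-wrapping range
        (a guess at where the "+1" comes from).
    """
    splits = 0
    for partitions in partitions_per_range:
        splits += max(1, math.ceil(partitions / split_size))
    return splits + wrapping_ranges


# Hypothetical 2-node cluster, num_tokens = 256 per node, small table:
# 512 token ranges, each holding far fewer partitions than the split size.
ranges = [100] * (2 * 256)
print(estimate_split_count(ranges, split_size=65536, wrapping_ranges=1))       # 513

# A larger cassandra.input.split.size cannot go below one split per range.
print(estimate_split_count(ranges, split_size=10_000_000, wrapping_ranges=1))  # 513

# For contrast, one token per node with big ranges: now the split size matters.
print(estimate_split_count([500_000, 500_000], split_size=65536))              # 16
```

Under this model, with 512 vnode ranges the floor of one split per range dominates, so tuning the split size (or mapred.min.split.size) has no visible effect on a small table.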