I want to allocate more vertices to the extraction job. I tried using the ROWCOUNT hint, but it doesn't seem to work: no matter what value I use for ROWCOUNT, U-SQL always allocates the same number of vertices.
EXTRACT xxxx FROM @"Path" USING new RndsInDataLakeCode.PyramidExtractorMerged() OPTION(ROWCOUNT=50000000);
Is there any other way to influence vertex allocation?
Thanks.
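Basically, the number of vertices used by EXTRACT is determined by the following:
1. The number of input files, if the extractor has to see each file as a whole (AtomicFileProcessing=true, e.g., JSON, current Avro Extractor).
2. The number and size of the input files, if the extractor can process a file in parts (AtomicFileProcessing=false, e.g., Csv/Tsv extractors).
The ROWCOUNT hint only hints the resulting row count, which impacts the subsequent partitioning; it does not change the number of vertices used for the extraction itself.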
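For reference, AtomicFileProcessing is set through the SqlUserDefinedExtractor attribute on the extractor class. Below is a minimal sketch of a line-oriented extractor that allows U-SQL to split a file across vertices. The namespace, the class name LineExtractor, and the single string output column are made up for illustration, and it assumes every row can be parsed independently of the rest of the file. Whether that holds for PyramidExtractorMerged is something only you can tell; if the extractor needs to see the whole file, the attribute has to stay true.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.Analytics.Interfaces;

namespace ExampleExtractors   // hypothetical namespace
{
    // AtomicFileProcessing = false tells U-SQL that a file may be split
    // and its parts handed to different vertices.
    // Only safe if each row can be parsed without the rest of the file.
    [SqlUserDefinedExtractor(AtomicFileProcessing = false)]
    public class LineExtractor : IExtractor   // hypothetical class
    {
        private readonly Encoding _encoding = Encoding.UTF8;
        private readonly byte[] _rowDelim;

        public LineExtractor(string rowDelim = "\n")
        {
            _rowDelim = _encoding.GetBytes(rowDelim);
        }

        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
        {
            // Split the part of the file given to this vertex on the row delimiter.
            foreach (Stream row in input.Split(_rowDelim))
            {
                using (var reader = new StreamReader(row, _encoding))
                {
                    // Assumes the EXTRACT schema has one string column at index 0.
                    output.Set<string>(0, reader.ReadToEnd());
                    yield return output.AsReadOnly();
                }
            }
        }
    }
}
```

With an extractor like this, a single large file can be processed by several vertices, while an extractor marked AtomicFileProcessing=true gets each file in one piece regardless of its size.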
The Analytics Units allocation mentioned by Omid then gives you the actual degree of parallelism, that is, how many of those vertices run at the same time. Overspecifying the Analytics Units beyond the determined number of vertices will NOT make your code parallelize more.
Why do you want to increase the scale-out on the extraction?