Tags: tensorflow, bazel, remote-execution, bazel-rules

Bazel-buildfarm - specifying concurrency of worker


I am trying to build TensorFlow using bazel-buildfarm. I have a server and a single worker, set up using the example configurations available at https://github.com/bazelbuild/bazel-buildfarm (see the examples/ directory). The lone worker is on a 72-core machine.

The problem I'm having is that once I kick off a build, although the build targets are being successfully dispatched to the worker, the worker is not taking advantage of all my cores (not even close). I tried explicitly setting --jobs=100 on the client when I initiate the TensorFlow build, but to no avail.

Does anyone have an idea how I can get my single worker to fully utilize the processing power available to it? Does this need to be specified explicitly in a worker configuration file?


Solution

  • The worker configuration file has a setting called execute_stage_width, which specifies the worker's degree of concurrency, i.e. how many execution slots it makes available at once. See the example configuration:

    https://github.com/bazelbuild/bazel-buildfarm/blob/e5c8db954f98644036172f790b877513d682ac79/examples/worker.config.example#L109-L110
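
As a rough sketch of what the change might look like, assuming the worker config is the protobuf text-format file from the examples/ directory and that one slot per core suits the workload. The value 72 is an assumption chosen to match the 72-core machine in the question, not a recommendation from the buildfarm project:

```
# worker config excerpt (all other settings left unchanged)
# number of actions the worker may execute concurrently;
# 72 is an assumed value, one slot per core on the machine in question
execute_stage_width: 72
```

The worker most likely has to be restarted to pick up the new value. The client-side --jobs setting still caps how many actions Bazel dispatches at once, so it should stay at least as high as the width chosen here.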