linux deep-learning pbs yolo darknet

CNN training exceeds number of given cores in PBS


I'm training a CNN (darknet/YOLO) on a remote shared cluster with NVIDIA graphics cards. The cluster runs Linux and uses the PBS job scheduling system.

I submit a job to train the neural network on a GPU, and the training itself works well.

The problem is the huge number of processors consumed during training. I usually submit a job with 8 processors, like this:

qsub -q gpu -l select=1:ncpus=8:ngpus=1:mem=15gb:gpu_cap=cuda61

but it is always killed for exceeding the number of processors. Even when I increase the number to 20, it is still exceeded.

I don't know why darknet consumes so many processors on the server, even though I can run the same job on my notebook with an Intel i5 processor (which is slow and inefficient).

What I've tried:

1) Set cgroups=cpuacct, which should force the job NOT to use more processors than assigned, but it didn't work at all. It seems the restriction only applies when the server has no spare resources for other jobs. When there are free processors, the restriction doesn't work (https://drill.apache.org/docs/configuring-cgroups-to-control-cpu-usage/#cpu-limits)

2) Set place=exclhost, which prevents the job from being killed when it exceeds its assigned resources. On the other hand, a job with this flag waits about 7 days before it even starts, and I have to train the network every day.

Question:

I don't need these processors and I don't understand why darknet uses so many of them. How can I force the job NOT to exceed the given number of processors? Or is there some other way to solve this kind of problem?


Solution

  • The reason darknet uses so many threads on a shared cluster is that darknet does not account for the possibility that it might be run on one.

    As you can see in the darknet source code (src/detector.c, line 111), darknet uses 64 threads to prepare input for training and calculations. If you have fewer than 64 cores, it will use as many as possible.

    To decrease the number of threads, replace the thread count on the following lines. For me, 8 threads are suitable.

    • detector.c 111, 393, 602
    • classifier.c 91

    Credits to Metacentrum support.
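
Instead of hard-coding 8 in place of 64, the thread count could be taken from the job environment, so it follows whatever ncpus value was requested. The following is only a sketch: loader_threads is a hypothetical helper and not part of darknet, and it assumes the scheduler exports the allocated core count in an environment variable such as NCPUS (check with env inside a running job before relying on it).

```c
#include <stdlib.h>

/* Hypothetical helper, not part of darknet: choose the number of
 * data-loading threads from an environment variable instead of a
 * hard-coded constant.
 *   env_name    - variable to read (assumed here to be "NCPUS")
 *   fallback    - value used when the variable is missing or invalid
 *   max_threads - upper bound, e.g. the original hard-coded 64
 */
int loader_threads(const char *env_name, int fallback, int max_threads)
{
    const char *value = getenv(env_name);
    if (value == NULL || *value == '\0')
        return fallback;              /* variable not set or empty */

    char *end = NULL;
    long parsed = strtol(value, &end, 10);
    if (*end != '\0' || parsed < 1)
        return fallback;              /* not a positive integer */
    if (parsed > max_threads)
        return max_threads;           /* never exceed the original limit */
    return (int)parsed;
}
```

On the lines listed above, the constant would then be replaced by a call such as loader_threads("NCPUS", 8, 64). With ncpus=8 requested and NCPUS exported, this returns 8; outside a job it falls back to 8 as well.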