
Load average jumps to 1000+ after adding an Rscript call to a job submitted with Slurm


I submit my job using Slurm, and at first everything worked well. After I added an Rscript call to perform a simple filtering step, the system load average suddenly jumped to 1000+, which is quite abnormal. I have tried searching Google but found nothing. My code is shown below:

#!/bin/bash

#SBATCH --job-name=gtool
#SBATCH --partition=Compute
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -a 1-22

for file in output/impute2/data_chr"${SLURM_ARRAY_TASK_ID}".*impute2
do
  echo "$file" start!
  # file prefix
  foo=$(echo "$file" | awk -F "/" '{print $NF}' | awk -F . '{print $1"."$2}')
  # use R for subset ID
  Rscript src/detect.impute.snp.r "$file"
  # gtool subset
  gtool -S \
    --g "$file" \
    --s output/pre_phasing/chr"${SLURM_ARRAY_TASK_ID}".sample \
    --og output/impute2_subset/"$foo".gen \
    --inclusion output/impute2_subset/"$foo".SNPID.txt
  # gtool GEN to PED 
  gtool -G \
    --g output/impute2_subset/"$foo".gen \
    --s output/pre_phasing/chr"${SLURM_ARRAY_TASK_ID}".sample \
    --ped output/impute2_subset_2_PLINK/"$foo".impute2.ped \
    --map output/impute2_subset_2_PLINK/"$foo".impute2.map \
    --chr "${SLURM_ARRAY_TASK_ID}" \
    --snp
  echo "$file" fin!
done

The R script:

options(tidyverse.quiet = TRUE)
options(readr.show_col_types = FALSE) 
library("tidyverse")
args <- commandArgs(T)
fn <- args[1]
d <- read_delim(fn,
  col_names = F,
  delim = " ",
  col_select = c(2, 4, 5))

fn.out <- str_sub(last(str_split(fn,"/")[[1]]), 1, -9)
d %>% mutate(len1 = nchar(X4),
             len2 = nchar(X5)) %>%
  arrange(desc(X4), desc(X5)) %>% 
  filter(len1==1, len2 == 1) %>%
  select(X2) %>%
  write_tsv(file = str_c("output/impute2_subset/", fn.out,".SNPID.txt"),
            col_names = F)

scontrol also shows that my job uses only one CPU per task (CPUs/Task=1):

JobId=4873 ArrayJobId=4872 ArrayTaskId=1 JobName=gtool
   ......
   NodeList=localhost
   BatchHost=localhost
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   ......

As far as I know, R and gtool each run single-threaded and do not provide a thread parameter, and --ntasks is also set to 1. Where could the problem be?


Solution

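  • Some libraries used by R and/or gtool, such as MKL, BLIS or OpenBLAS, might be configured system-wide to use all cores of the node and may not detect that Slurm allocated only one CPU. Each array task then starts one thread per core, which oversubscribes the node and drives the load average up. You can try adding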

    export OMP_NUM_THREADS=1
    export BLIS_NUM_THREADS=1
    export MKL_NUM_THREADS=1
    export OPENBLAS_NUM_THREADS=1
    

    to your submission script, just before the for loop.
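
The same idea can be sketched so that the cap follows the Slurm allocation instead of being hard-coded. This is a minimal sketch, not a definitive fix: `set_thread_caps` is a hypothetical helper name, and the `VROOM_THREADS` line is an addition that is not part of the answer above. It rests on the assumption that readr 2.x reads files through vroom, which by default may use every core it detects and consults that variable. `SLURM_CPUS_PER_TASK` is only set when `--cpus-per-task` is requested, so the sketch falls back to 1.

```shell
#!/bin/bash
# Sketch: cap common threading runtimes at the CPU count Slurm allocated.

# set_thread_caps is a hypothetical helper, not part of Slurm or R.
set_thread_caps() {
  local n="${1:-1}"   # default to a single thread when no count is given
  export OMP_NUM_THREADS="$n"
  export BLIS_NUM_THREADS="$n"
  export MKL_NUM_THREADS="$n"
  export OPENBLAS_NUM_THREADS="$n"
  # Assumption: readr 2.x reads via vroom, which consults VROOM_THREADS.
  export VROOM_THREADS="$n"
}

# SLURM_CPUS_PER_TASK is unset unless --cpus-per-task was requested,
# so fall back to 1 in that case.
set_thread_caps "${SLURM_CPUS_PER_TASK:-1}"
```

Placed before the for loop, this keeps every child process (Rscript and gtool) within the allocation, because exported variables are inherited by the commands the script launches. If the load stays high afterwards, counting the threads of the running R process (for example with `ps -o nlwp= -p <pid>` on Linux) is one way to check whether the cap took effect.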