Tags: linux, bash, loops, if-statement, iterator

How to iteratively run a program on multi-resolution files?


.mcool files contain matrices for multiple resolutions.

For every file in ./input/*.mcool, I want to run predictSV from EagleC on the resolutions listed by cooler ls "$mcool_file" whose last path component (the part after the last /) is 5000, 10000, or 50000.

As shown in the EagleC repository, a single .mcool file is processed like this:

predictSV --hic-5k SKNAS-MboI-allReps-filtered.mcool::/resolutions/5000 \
          --hic-10k SKNAS-MboI-allReps-filtered.mcool::/resolutions/10000 \
          --hic-50k SKNAS-MboI-allReps-filtered.mcool::/resolutions/50000 \
          -O SK-N-AS -g hg38 --balance-type CNV --output-format full \
          --prob-cutoff-5k 0.8 --prob-cutoff-10k 0.8 --prob-cutoff-50k 0.99999
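Each --hic-* option in that command takes a cooler URI of the form FILE::/resolutions/N, not just the resolution number. If every input file is known to contain all three resolutions, the arguments could be built from the file name alone, without calling cooler ls. A minimal sketch under that assumption; build_predictsv_args is a hypothetical helper that only prints the arguments:

```bash
#!/usr/bin/env bash
# Assumption: the file contains the 5000, 10000 and 50000 resolutions,
# so the URIs can be derived from the file name (FILE::/resolutions/N).
# build_predictsv_args is hypothetical; it prints one argument per line.
build_predictsv_args() {
  local f=$1
  printf '%s\n' \
    --hic-5k  "${f}::/resolutions/5000" \
    --hic-10k "${f}::/resolutions/10000" \
    --hic-50k "${f}::/resolutions/50000"
}

build_predictsv_args SKNAS-MboI-allReps-filtered.mcool
```

If some files may lack one of these resolutions, the cooler ls check described above is still needed.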

However, I have a list of files, so I need a for loop that runs predictSV on each of them.

Attempt 1:

for mcool_file in ./input/*.mcool; do
  while IFS= read -r id; do
    id_suffix=${id##*/}
    case $id_suffix in 5000|10000|50000)
      num=${id_suffix:0:1}
      predictSV \
        --hic-"${num}k" "$id_suffix" \
        -g hg38 \
        -O basename "${id%%.*}" \
        --balance-type CNV \
        --output-format full \
        --prob-cutoff-5k 0.8 \
        --prob-cutoff-10k 0.8 \
        --prob-cutoff-50k 0.99999
      ;;
    esac
  done < <(cooler ls "$mcool_file")
done

Output:

usage: predictSV [-h] [-v] [--hic-5k HIC_5K] [--hic-10k HIC_10K]
                 [--hic-50k HIC_50K] [-O OUTPUT_PREFIX]
                 [-g {hg38,hg19,chm13,other}] [-C [CHROMS ...]]
                 [--balance-type {ICE,CNV,Raw}]
                 [--output-format {full,NeoLoopFinder}]
                 [--prob-cutoff-5k PROB_CUTOFF_5K]
                 [--prob-cutoff-10k PROB_CUTOFF_10K]
                 [--prob-cutoff-50k PROB_CUTOFF_50K]
predictSV: error: unrecognized arguments:
usage: predictSV [-h] [-v] [--hic-5k HIC_5K] [--hic-10k HIC_10K]
                 [--hic-50k HIC_50K] [-O OUTPUT_PREFIX]
                 [-g {hg38,hg19,chm13,other}] [-C [CHROMS ...]]
                 [--balance-type {ICE,CNV,Raw}]
                 [--output-format {full,NeoLoopFinder}]
                 [--prob-cutoff-5k PROB_CUTOFF_5K]
                 [--prob-cutoff-10k PROB_CUTOFF_10K]
                 [--prob-cutoff-50k PROB_CUTOFF_50K]
predictSV: error: unrecognized arguments: --hic-1k 10000
usage: predictSV [-h] [-v] [--hic-5k HIC_5K] [--hic-10k HIC_10K]
                 [--hic-50k HIC_50K] [-O OUTPUT_PREFIX]
                 [-g {hg38,hg19,chm13,other}] [-C [CHROMS ...]]
                 [--balance-type {ICE,CNV,Raw}]
                 [--output-format {full,NeoLoopFinder}]
                 [--prob-cutoff-5k PROB_CUTOFF_5K]
                 [--prob-cutoff-10k PROB_CUTOFF_10K]
                 [--prob-cutoff-50k PROB_CUTOFF_50K]
predictSV: error: unrecognized arguments:
usage: predictSV [-h] [-v] [--hic-5k HIC_5K] [--hic-10k HIC_10K]
                 [--hic-50k HIC_50K] [-O OUTPUT_PREFIX]
                 [-g {hg38,hg19,chm13,other}] [-C [CHROMS ...]]
                 [--balance-type {ICE,CNV,Raw}]
                 [--output-format {full,NeoLoopFinder}]
                 [--prob-cutoff-5k PROB_CUTOFF_5K]
                 [--prob-cutoff-10k PROB_CUTOFF_10K]
                 [--prob-cutoff-50k PROB_CUTOFF_50K]
predictSV: error: unrecognized arguments:
usage: predictSV [-h] [-v] [--hic-5k HIC_5K] [--hic-10k HIC_10K]
                 [--hic-50k HIC_50K] [-O OUTPUT_PREFIX]
                 [-g {hg38,hg19,chm13,other}] [-C [CHROMS ...]]
                 [--balance-type {ICE,CNV,Raw}]
                 [--output-format {full,NeoLoopFinder}]
                 [--prob-cutoff-5k PROB_CUTOFF_5K]
                 [--prob-cutoff-10k PROB_CUTOFF_10K]
                 [--prob-cutoff-50k PROB_CUTOFF_50K]
..
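The error output is consistent with a few problems in Attempt 1, which this snippet reproduces without needing cooler or predictSV: ${id_suffix:0:1} keeps only the first character, so 10000 becomes 1 (hence --hic-1k 10000) and 50000 becomes 5; the --hic-*k options receive only the number rather than the full FILE::/resolutions/N URI; and in -O basename "${id%%.*}" the word basename is passed literally as the prefix, while ${id%%.*} is empty for paths that start with ./, which likely explains the blank "unrecognized arguments:" lines. The file name below is taken from the cooler ls listing in the question:

```bash
#!/usr/bin/env bash
# 1) ${id_suffix:0:1} keeps only the first character of the resolution.
for id_suffix in 5000 10000 50000; do
  echo "$id_suffix -> --hic-${id_suffix:0:1}k"        # 5k, 1k, 5k
done
# Dividing by 1000 gives the option names predictSV lists in its usage.
for id_suffix in 5000 10000 50000; do
  echo "$id_suffix -> --hic-$(( id_suffix / 1000 ))k"  # 5k, 10k, 50k
done

# 2) With a leading ./, ${id%%.*} removes everything from the first '.'.
id=./input/A001C007.hg38.nodups.pairs.mcool::/resolutions/5000
echo "prefix: [${id%%.*}]"           # prefix: []

# Strip the ::/resolutions part, then the directory, then the extensions.
file=${id%%::*}                      # ./input/A001C007.hg38.nodups.pairs.mcool
name=${file##*/}                     # A001C007.hg38.nodups.pairs.mcool
echo "prefix: [${name%%.*}]"         # prefix: [A001C007]
```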

Attempt 2:

for mcool_file in ./input/*.mcool; do
  while IFS= read -r id; do
    id_suffix=${id##*/}
    case $id_suffix in 5000|10000|50000)
      num=${id_suffix:0:1}
      predictSV \
        if [ $id_suffix=5000 ];
          then --hic-5k "$id_suffix";
        elif [ $id_suffix=10000 ];
          then --hic-10k "$id_suffix";
        else:
          --hic-50k "$id_suffix";
        fi \
        -g hg38 \
        -O basename "${id%%.*}" \
        --balance-type CNV \
        --output-format full \
        --prob-cutoff-5k 0.8 \
        --prob-cutoff-10k 0.8 \
        --prob-cutoff-50k 0.99999
      ;;
    esac
  done < <(cooler ls "$mcool_file")
done
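Attempt 2 cannot work as written: because of the trailing backslash, if and [ ... ] become plain arguments to predictSV, and bash will most likely reject the following then with a syntax error. In addition, [ $id_suffix=5000 ] has no spaces around =, so it tests one non-empty word and is always true, and else: should be else. One common way to choose options conditionally is to collect them in a bash array first and expand it in the command. A sketch of that technique; show_args is a hypothetical stand-in for predictSV that only prints its arguments, and input/example.mcool is a made-up file name:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for predictSV: print each argument on its own line.
show_args() { printf '<%s>\n' "$@"; }

mcool_file=input/example.mcool      # hypothetical file name
id_suffix=10000

# Without spaces, [ $id_suffix=5000 ] tests the single word "10000=5000",
# which is non-empty and therefore always true.
[ $id_suffix=5000 ] && echo "no spaces: true even for 10000"
[ "$id_suffix" = 5000 ] || echo "with spaces: false for 10000"

# Decide the option first, collecting it in an array...
opts=()
if [ "$id_suffix" = 5000 ]; then
  opts+=(--hic-5k "$mcool_file::/resolutions/5000")
elif [ "$id_suffix" = 10000 ]; then
  opts+=(--hic-10k "$mcool_file::/resolutions/10000")
else
  opts+=(--hic-50k "$mcool_file::/resolutions/50000")
fi

# ...then expand the array inside the single command line.
show_args "${opts[@]}" -g hg38 --balance-type CNV
```

Quoting "${opts[@]}" keeps each array element as exactly one argument, so the URI is never split even if a path contains spaces.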

Output of cooler ls for one .mcool file:

cooler ls ./../input/A001C007.hg38.nodups.pairs.mcool

./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/200
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/500
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/1000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/2000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/5000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/10000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/20000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/50000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/100000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/250000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/500000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/1000000
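Each line of that listing is a complete URI, and ${id##*/} yields the text after the last /, so a case statement can keep exactly the three resolutions of interest. A minimal sketch fed with a few lines copied from the listing instead of a live cooler ls call; pick_ids is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# A few lines copied from the cooler ls listing; in a real run they
# would come from: cooler ls "$mcool_file"
listing='./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/2000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/5000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/10000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/50000
./../input/A001C007.hg38.nodups.pairs.mcool::/resolutions/500000'

# Hypothetical helper: print only the URIs whose last path component is
# exactly 5000, 10000 or 50000 (2000 and 500000 are skipped).
pick_ids() {
  local id
  while IFS= read -r id; do
    case ${id##*/} in
      5000|10000|50000) printf '%s\n' "$id" ;;
    esac
  done
}

pick_ids <<< "$listing"
```

Because case compares the whole component, 500000 is not mistaken for 50000.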

Solution

  • It would have been nice if, in the prior question, you had described what you were trying to do well enough to allow a correct answer.

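    #!/usr/bin/env bash
    shopt -s nullglob   # if no .mcool file matches, skip the loop instead of using the literal pattern

    # No leading ./ in the glob: "${mcool_file%%.*}" below strips everything
    # from the first '.', so ./input/... would leave an empty prefix.
    for mcool_file in input/*.mcool; do
    
      # Collect the full cooler URIs (FILE::/resolutions/N) emitted by cooler ls;
      # predictSV needs the URI, not just the resolution number.
      hic5k_uri=; hic10k_uri=; hic50k_uri=
      while IFS= read -r id; do
        id_suffix=${id##*/}    # text after the last '/', e.g. 5000
        case $id_suffix in
          5000)  hic5k_uri=$id;;
          10000) hic10k_uri=$id;;
          50000) hic50k_uri=$id;;
        esac
      done < <(cooler ls "$mcool_file")
    
      # Run once per file with every resolution found; each ${var:+...}
      # expands to nothing if that resolution was not listed.
      predictSV \
       ${hic5k_uri:+  --hic-5k "$hic5k_uri"} \
       ${hic10k_uri:+ --hic-10k "$hic10k_uri"} \
       ${hic50k_uri:+ --hic-50k "$hic50k_uri"} \
       -g hg38 \
       -O "${mcool_file%%.*}" \
       --balance-type CNV \
       --output-format full \
       --prob-cutoff-5k 0.8 \
       --prob-cutoff-10k 0.8 \
       --prob-cutoff-50k 0.99999
    done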
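The ${var:+...} expansions in the solution drop an option entirely when its resolution was not found, while the inner quotes keep the URI as a single argument. A small sketch of that behaviour; show_args is a hypothetical stand-in for predictSV that reports the arguments it receives:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for predictSV: report how many arguments arrive.
show_args() { printf '%d args:' "$#"; printf ' <%s>' "$@"; echo; }

hic5k_uri='input/A001C007.hg38.nodups.pairs.mcool::/resolutions/5000'
hic10k_uri=              # pretend this resolution was not listed

# Unquoted ${var:+...}: expands to nothing when var is empty, otherwise to
# the option plus its value (the inner quotes keep the value as one word).
show_args ${hic5k_uri:+--hic-5k "$hic5k_uri"} ${hic10k_uri:+--hic-10k "$hic10k_uri"} -g hg38
# 4 args: <--hic-5k> <input/A001C007.hg38.nodups.pairs.mcool::/resolutions/5000> <-g> <hg38>
```

Collecting the URIs first and calling predictSV once per file mirrors the single-file example from the EagleC repository, where all three resolutions are passed in one command.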