Snakemake: How do I get a shell command running with different arguments (integer) in a rule?


I'm trying to find the best hyperparameters for my boosted decision tree (BDT) training. Here's the code for just two values:

user = '/home/.../BDT/'

nestimators = [1, 2]

rule all:
        input: user + 'AUC_score.pdf'

rule testing:
        output: user + 'AUC_score.csv'
        shell: 'python bdt.py --nestimators {}'.format(nestimators[i] for i in range(2))

rule plotting:
        input: user + 'AUC_score.csv'
        output: user + 'AUC_score.pdf'
        shell: 'python opti.py'

The plan is as follows: I want to parallelize the training of my BDT over a range of hyperparameters (to begin with, just nestimators). So I try to use the shell command to train the BDT. bdt.py takes the hyperparameter as an argument, trains, and saves the hyperparameters plus the training score in a CSV file. From the CSV file I can see which hyperparameters give the best scores. Yay!

Sadly it doesn't work like that. I tried to use an input function, but since I want to pass an integer it does not work. I tried it the way you can see above, but now I get an 'error message': 'python bdt.py --nestimators <generator object <genexpr> at 0x7f5981a9d150>'. I understand why this doesn't work either, but I don't know where to go from here.


Solution

  • The error arises because str.format() receives the generator expression as a single argument, so {} is replaced by the generator object itself. It is not replaced first by 1 and then by 2 but, so to speak, by an iterator over nestimators.
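
The behaviour can be reproduced in plain Python, outside Snakemake. This is a minimal sketch using the names from the question; the list comprehension at the end only illustrates what "one command per value" would look like and is not meant to go into the rule:

```python
# Reproduces the question's formatting problem outside Snakemake.
nestimators = [1, 2]

# The generator expression is passed to format() as ONE argument,
# so '{}' is filled with the generator's repr, not with 1 or 2.
broken = 'python bdt.py --nestimators {}'.format(nestimators[i] for i in range(2))
print(broken)

# Building one command per value needs an explicit loop or comprehension.
commands = ['python bdt.py --nestimators {}'.format(n) for n in nestimators]
print(commands)
```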

    Even if you correct the Python expression in the rule testing, there may be a more fundamental problem, if I understand your aim correctly. Snakemake workflows are defined in terms of rules that describe how to create output files from input files. Since testing declares a single output file, the rule will run only once, but you probably want it to run separately for each hyperparameter value.

    The solution is to put the hyperparameter into the output filename as a wildcard, so that each value gets its own output file and its own run of the rule (bdt.py then has to write to the matching filename). Something like this:

    user = '/home/.../BDT/'
    
    nestimators = [1, 2]
    
    rule all:
            input: user + 'AUC_score.pdf'
    
    rule testing:
            output: user + 'AUC_score_{hyper}.csv'
            shell: 'python bdt.py --nestimators {wildcards.hyper}'
    
    rule plotting:
            input: expand(user + 'AUC_score_{hyper}.csv', hyper=nestimators)
            output: user + 'AUC_score.pdf'
            shell: 'python opti.py'
    

    Finally, instead of using shell: to call a Python script, you can use script: directly, as explained in the documentation: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#external-scripts
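
As a rough sketch of the script: variant, the rule testing would use script: 'bdt.py' in place of its shell: line, and the script would read the wildcard from the snakemake object that Snakemake injects. The real bdt.py is not shown in the question, so the helper write_score, the CSV column names, and the placeholder score below are all assumptions made for illustration:

```python
# Hypothetical bdt.py for use with `script:`; the real training code is not shown.
import csv


def write_score(n_estimators, out_path, score):
    """Write one row with the hyperparameter and its score to a CSV file."""
    with open(out_path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['nestimators', 'score'])  # assumed column names
        writer.writerow([n_estimators, score])


# Snakemake defines a global `snakemake` object when the file runs via `script:`.
if 'snakemake' in globals():
    # Wildcard values arrive as strings, so convert before using them as integers.
    n = int(snakemake.wildcards.hyper)
    score = float('nan')  # placeholder: replace with the real training score
    write_score(n, snakemake.output[0], score)
```

Because wildcard values are always strings, the int() conversion is what lets the script treat the hyperparameter as an integer, and writing to snakemake.output[0] keeps the file name in step with the rule's output pattern.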