
snakemake: how to implement log directive when using run directive?


Snakemake lets each rule declare a log file through the log directive. Redirecting shell output to that file is relatively straightforward, but I cannot figure out how to log the output of a run block (i.e. inline Python code).

One workaround is to save the Python code in a script and then run it from the shell, but I wonder if there is another way?


Solution

  • I have some rules that use both the log and run directives. In the run block, I open the log file "manually" and write to it.

    For instance:

    rule compute_RPM:
        input:
            counts_table = source_small_RNA_counts,
            summary_table = rules.gather_read_counts_summaries.output.summary_table,
            tags_table = rules.associate_small_type.output.tags_table,
        output:
            RPM_table = OPJ(
                annot_counts_dir,
                "all_{mapped_type}_on_%s" % genome, "{small_type}_RPM.txt"),
        log:
            log = OPJ(log_dir, "compute_RPM_{mapped_type}", "{small_type}.log"),
        benchmark:
            OPJ(log_dir, "compute_RPM_{mapped_type}", "{small_type}_benchmark.txt"),
        run:
            with open(log.log, "w") as logfile:
                logfile.write(f"Reading column counts from {input.counts_table}\n")
                counts_data = pd.read_table(
                    input.counts_table,
                    index_col="gene")
                logfile.write(f"Reading number of non-structural mappers from {input.summary_table}\n")
                norm = pd.read_table(input.summary_table, index_col=0).loc["non_structural"]
                # End with a newline so the next message starts on its own line
                logfile.write(f"{norm}\n")
                logfile.write("Computing counts per million non-structural mappers\n")
                RPM = 1000000 * counts_data / norm
                add_tags_column(RPM, input.tags_table, "small_type").to_csv(output.RPM_table, sep="\t")
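
If the manual write calls become repetitive, the standard logging module can be pointed at the same file instead. This is a different technique from the one in the rule above, shown as a minimal plain-Python sketch: make_rule_logger, close_rule_logger and the temporary path are hypothetical names for illustration, and inside a run block the path would presumably be log.log or log[0].

```python
import logging
import os
import tempfile


def make_rule_logger(log_path, name="rule_log"):
    """Return a logger that writes to log_path (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger
    # Drop handlers left over from an earlier call with the same name.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_path, mode="w")
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def close_rule_logger(logger):
    """Close and detach all handlers so the log file is complete on disk."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)


# Stand-in for the rule's log path; Snakemake would supply the real one.
demo_log_path = os.path.join(tempfile.mkdtemp(), "demo_rule.log")
demo_logger = make_rule_logger(demo_log_path)
demo_logger.info("Reading counts table")
demo_logger.info("Computing counts per million")
close_rule_logger(demo_logger)

with open(demo_log_path) as fh:
    demo_log_text = fh.read()
```

Closing the handler at the end matters here: it plays the same role as leaving the with block in the manual version, making sure everything is flushed before the rule finishes.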
    

    For third-party code that writes to stdout, the redirect_stdout context manager from contextlib may be helpful (found via https://stackoverflow.com/a/40417352/1878788, documented at https://docs.python.org/3/library/contextlib.html#contextlib.redirect_stdout).

    Test snakefile, test_run_log.snakefile:

    from contextlib import redirect_stdout
    
    rule all:
        input:
            "test_run_log.txt"
    
    rule test_run_log:
        output:
            "test_run_log.txt"
        log:
            "test_run_log.log"
        run:
            with open(log[0], "w") as log_file:
                with redirect_stdout(log_file):
                    print(f"Writing result to {output[0]}")
                    with open(output[0], "w") as out_file:
                        out_file.write("result\n")
    

    Running it:

    $ snakemake -s test_run_log.snakefile
    

    Results:

    $ cat test_run_log.log 
    Writing result to test_run_log.txt
    $ cat test_run_log.txt 
    result
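
Third-party code often writes to stderr as well as stdout. The sketch below extends the same idea with contextlib.redirect_stderr so that both streams end up in one log file. It is plain Python rather than a Snakefile: capture_to_log and noisy_third_party_call are hypothetical names, and the temporary path stands in for log[0].

```python
import os
import sys
import tempfile
from contextlib import contextmanager, redirect_stderr, redirect_stdout


@contextmanager
def capture_to_log(log_path):
    """Send Python-level stdout and stderr to log_path (hypothetical helper)."""
    with open(log_path, "w") as log_file:
        with redirect_stdout(log_file), redirect_stderr(log_file):
            yield log_file


def noisy_third_party_call():
    """Stand-in for library code that prints instead of returning messages."""
    print("progress: step 1 done")
    print("warning: something looks odd", file=sys.stderr)
    return 42


# Stand-in for log[0]; inside a run block the path would come from Snakemake.
demo_path = os.path.join(tempfile.mkdtemp(), "capture_demo.log")
with capture_to_log(demo_path):
    demo_result = noisy_third_party_call()

with open(demo_path) as fh:
    demo_lines = fh.read().splitlines()
```

Note that these context managers only swap sys.stdout and sys.stderr inside the Python process. Output from subprocesses, or from compiled extensions that write directly to the underlying file descriptors, is not captured this way.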