python-3.x wildcard snakemake fastq

Snakemake: using different wildcards in a rule


I am trying to create a Snakemake rule that takes my fastq files as input and returns a .sam file as output for each fastq file.

I have a file like this:

FILE              TYPE    SM    LB    ID    PU        PL
xfgh.fastq.gz     Single  IND1  IND1  IND1  Platform  Illumina
IND2.fastq.gz     Single  IND2  IND2  IND2  Platform  Illumina
zfgv.fastq.gz     Single  IND3  IND3  IND3  Platform  Illumina
IND4_P1.fastq.gz  Single  IND4  IND4  IND4  Platform  Illumina

So I did something like this.
I load my file into a dataframe with pandas:

info_df = pd.read_csv("info_file.txt"), and I store the FILE, SM and ID columns in lists.

Then I created my rules:

rule all:
    input:
        sam_file = expand("ALIGNEMENT/{sm}/{id}.sam", sm = info_df["SM"], id = info_df["ID"])

rule alignement:
    input:
          fastq_files = "PATH/TO/{fastq}"
    output:
          sam_file = "ALIGNEMENT/{sm}/{id}.sam"

I know the input and output need to have the same wildcards, but is there a way to take my input from the "FILE" column of my file.txt and produce an output path like "ALIGNEMENT/{sm}/{id}.sam", where {sm} and {id} come from the SM and ID columns of my file.txt?

I also want to run the rule once per file.

If anyone can help me, thank you.


Solution

  • I am trying to create a Snakemake rule that takes my fastq files as input and returns a .sam file as output for each fastq file.

    From the above, it seems you want to add zip to the expand function in rule all. With zip, the wildcard values are paired in the order they appear in your input lists; without it, you get all combinations of {id} and {sm}.
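To make the difference concrete, here is a minimal plain-Python sketch of the two behaviours. It does not call Snakemake's expand; it only imitates the pairing logic with itertools.product and the built-in zip, using the SM and ID values from the question's table. The variable names are illustrative assumptions.

```python
from itertools import product

# Values copied from the SM and ID columns of the question's table.
sm_values = ["IND1", "IND2", "IND3", "IND4"]
id_values = ["IND1", "IND2", "IND3", "IND4"]
pattern = "ALIGNEMENT/{sm}/{id}.sam"

# Without zip: every SM is combined with every ID (4 x 4 paths).
all_combinations = [
    pattern.format(sm=s, id=i) for s, i in product(sm_values, id_values)
]

# With zip: values are paired row by row (one path per row of the table).
paired = [pattern.format(sm=s, id=i) for s, i in zip(sm_values, id_values)]

print(len(all_combinations), "paths without zip")
print(len(paired), "paths with zip")
```

Without the pairing, rule all would request files such as ALIGNEMENT/IND1/IND2.sam, which do not correspond to any row of the table.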

    Then, to get the input fastq file in rule alignement, you need to query the info dataframe for the FILE corresponding to a given id. You can do this with a lambda function, or write a dedicated function to use as input.
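As a rough sketch of the dedicated-function option, the lookup could be written as a named function that receives the wildcards object and returns the matching file name. The names get_fastq and fastq_by_sample are hypothetical. To keep the example self-contained, the table is embedded as a string and parsed with the standard csv module rather than pandas, and SimpleNamespace stands in for the wildcards object Snakemake would pass.

```python
import csv
import io
from types import SimpleNamespace

# Inline copy of the question's table so the sketch runs on its own;
# in a Snakefile this would be read from info_file.txt instead.
INFO_TSV = (
    "FILE\tTYPE\tSM\tLB\tID\tPU\tPL\n"
    "xfgh.fastq.gz\tSingle\tIND1\tIND1\tIND1\tPlatform\tIllumina\n"
    "IND2.fastq.gz\tSingle\tIND2\tIND2\tIND2\tPlatform\tIllumina\n"
    "zfgv.fastq.gz\tSingle\tIND3\tIND3\tIND3\tPlatform\tIllumina\n"
    "IND4_P1.fastq.gz\tSingle\tIND4\tIND4\tIND4\tPlatform\tIllumina\n"
)

# Map each (SM, ID) pair to its FILE entry (hypothetical helper structure).
fastq_by_sample = {
    (row["SM"], row["ID"]): row["FILE"]
    for row in csv.DictReader(io.StringIO(INFO_TSV), delimiter="\t")
}


def get_fastq(wildcards):
    """Return the fastq file name for the sample named by the wildcards."""
    key = (wildcards.sm, wildcards.id)
    if key not in fastq_by_sample:
        raise ValueError(f"No FILE entry for SM={key[0]}, ID={key[1]}")
    return fastq_by_sample[key]


# Stand-in for the wildcards of ALIGNEMENT/IND4/IND4.sam.
print(get_fastq(SimpleNamespace(sm="IND4", id="IND4")))
```

In the Snakefile, the function would be passed by name, for example fastq_files = get_fastq in the input section of rule alignement. The sketch returns the bare file name from the FILE column; any directory prefix, such as the PATH/TO/ in the question, would still need to be prepended.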

    Here's my take on it:

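    import pandas as pd

    # Sample sheet: tab-separated, one row per fastq file
    info_df = pd.read_csv("info_file.txt", sep='\t')

    rule all:
        input:
            # zip pairs the i-th SM with the i-th ID instead of building all combinations
            expand("ALIGNEMENT/{sm}/{id}.sam", zip, sm = info_df["SM"], id = info_df["ID"])

    rule alignement:
        input:
            # Look up the FILE entries whose ID matches the {id} wildcard
            fastq_files=lambda wc: info_df[info_df['ID'] == wc.id]['FILE'].tolist(),
        output:
            sam_file = "ALIGNEMENT/{sm}/{id}.sam"
        # Placeholder command; replace it with the actual aligner call
        shell:
            r"""
            echo {input.fastq_files} > {output.sam_file}
            """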