I have a snakemake pipeline where I need to do a small step of processing the data (applying a rolling average to a dataframe).
I would like to write something like this:
rule average_df:
    input:
        # script = ,
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    shell:
        """
        python
        import pandas as pd
        df=pd.read_csv("{input.df_raw}")
        df=df.rolling(window={params.window}, center=True, min_periods=1).mean()
        df.to_csv("{output.df_avg}")
        """
However, it does not work.
Do I have to create a Python file with those 4 lines of code? The alternative that occurs to me is a bit cumbersome. It would be a file average_df.py:
import pandas as pd

def average_df(i_path, o_path, window):
    # Read the raw CSV, smooth it with a centered rolling mean, write the result
    df = pd.read_csv(i_path)
    df = df.rolling(window=window, center=True, min_periods=1).mean()
    df.to_csv(o_path)

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description='Description of your program')
    parser.add_argument('-i_path', '--input_path', help='csv file', required=True)
    parser.add_argument('-o_path', '--output_path', help='csv file', required=True)
    # type=int so that rolling() receives a number rather than a string
    parser.add_argument('-w', '--window', help='window for averaging', type=int, required=True)
    args = vars(parser.parse_args())
    i_path = args['input_path']
    o_path = args['output_path']
    window = args['window']
    average_df(i_path, o_path, window)
And then have the snakemake rule like this:
rule average_df:
    input:
        script = "average_df.py",
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    shell:
        """
        python {input.script} --input_path {input.df_raw} --output_path {output.df_avg} --window {params.window}
        """
Is there a smarter or more efficient way to do this? That would be great! Looking forward to your input!
This can be achieved via the run directive:
rule average_df:
    input:
        # script = ,
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    run:
        import pandas as pd
        df = pd.read_csv(input.df_raw)
        df = df.rolling(window=params.window, center=True, min_periods=1).mean()
        df.to_csv(output.df_avg)
Note that all Snakemake objects are available directly via input, output, params, etc.
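To check what the window, center=True and min_periods=1 settings do to the first and last rows before running the rule on real data, the following pure-Python sketch may help. It is only an illustration, not pandas' implementation: the function name centered_rolling_mean is hypothetical, and it assumes an odd window (such as 83) and a column with no missing values.

```python
def centered_rolling_mean(values, window):
    """Centered rolling mean for an odd window; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("this sketch only handles odd, positive window sizes")
    half = window // 2
    result = []
    for i in range(len(values)):
        # Clip the window to the data, which mirrors min_periods=1:
        # edge rows are averaged over however many neighbours exist.
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        result.append(sum(chunk) / len(chunk))
    return result


smoothed = centered_rolling_mean([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(smoothed)
```

With a window of 3, the interior rows are averaged over three values, while the first and last rows are averaged over only two, so no row is dropped or left empty at the edges.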