Snakemake: define output from list of filenames in text file

I'm completely new to Snakemake, so bear with me. I've spent a considerable amount of time searching for a similar question, without any luck.

I want to create a rule to copy certain files to a new directory from a folder containing all the files.

The filenames of the files I want to copy are listed in a text file (one filename per line). I've written a small bash script using cat and xargs to copy the files listed in the text file to the target directory. This script works fine!

How do I tell snakemake that my output should be the target directory + filenames listed in the text file?

Ok, so my initial thought was to create a list inside the snakefile containing all target paths for the files that should be copied.

I made this hot garbage of a mess (also completely new to python):

import glob
all_files = glob.glob("path/to/all/files", recursive = False)

file = open("path/to/file/list_of_files.txt", "r")

file_lines = file.readlines()

path_to_target_dir = "some/path/"

path_lines = [path_to_target_dir + str(x) for x in file_lines]
# for some reason path_lines end with line break after each filename. not good.

# remove line break to yield correct paths + filenames
list_of_correct_paths = []

for element in path_lines:
    list_of_correct_paths.append(element.rstrip("\n"))

This yields a list with all paths to where I want to copy the files.

rule cp_files_to_target_dir:
    input:
        cp_from = expand("path/to/all/files/{id}", id = all_files),
        list = "list_of_files.txt",
        script = ""
    output:
        cp_to = expand("{path}", path = list_of_correct_paths)

However, snakemake states that I'm missing input files for the rule.

I hope my question makes sense. I appreciate any help I can get.

EDIT: this works now

import os
# filenames for all files 
files = os.listdir("/path/to/all/files")

# create paths to files of interest from text file
with open("path/to/text_file.txt", "r") as text_file:
    # assumed: one filename per line, newline characters dropped
    list_files = text_file.read().splitlines()

target_path = "path/to/target/dir"

# os.path.join inserts the separator between directory and filename
target_file_paths = [os.path.join(target_path, x) for x in list_files]

This yields a list with all paths to where I want to copy the files.

rule cp_files_to_target_dir:
    input:
        cp_from = expand("path/to/all/files/{id}", id = list_files),
        list = "text_file.txt",
        script = ""
    output:
        cp_to = expand("{path}", path = target_file_paths)


  • However, snakemake states that I'm missing input files for the rule.

    The variable cp_from in rule cp_files_to_target_dir probably does not contain the correct paths. To debug, I would suggest moving it outside the rule and printing it to see what it contains. E.g.

    # No trailing comma here: outside a rule it would wrap the list in a tuple
    cp_from = expand("path/to/all/files/{id}", id = all_files)
    cp_to = expand("{path}", path = list_of_correct_paths)
    # To debug:
    print(cp_from) # Check these are what you expect
    rule cp_files_to_target_dir:
        input:
            cp_from = cp_from,
            list = "list_of_files.txt",
            script = ""
        output:
            cp_to = cp_to,
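
Besides printing the list, it may help to check which of the expanded paths actually exist on disk, since a "missing input files" error usually means at least one of them does not. A minimal sketch in plain Python; `missing_paths` is a hypothetical helper name, not something Snakemake provides:

```python
import os


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [path for path in paths if not os.path.exists(path)]
```

Calling `print(missing_paths(cp_from))` near the top of the Snakefile should then list only the offending entries. Relative paths are resolved against the current working directory, so the result depends on where Snakemake is run from.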

    In general, I think your script could be tidied up a bit but I cannot be more specific without more context.
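
As one possible tidy-up, assuming the text file holds one bare filename per line, the source and target lists could both be derived from the same list of names. The helper names `read_filenames` and `build_paths` below are made up for illustration:

```python
import os


def read_filenames(list_path):
    """Read one filename per line, dropping newlines and blank lines."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def build_paths(directory, filenames):
    """Join each filename onto `directory`."""
    return [os.path.join(directory, name) for name in filenames]
```

In the Snakefile, `names = read_filenames("path/to/file/list_of_files.txt")` followed by `build_paths("path/to/all/files", names)` and `build_paths("some/path", names)` would give the input and output lists. Building both from the same names keeps them in step, and avoids joining a directory prefix onto paths that already contain it.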