I'm completely new to Snakemake, so bear with me. I've spent a considerable amount of time searching for a similar question without any luck.
I want to create a rule to copy certain files to a new directory from a folder containing all the files.
The filenames for the files I want to copy are listed in a text file (one filename per line).
I've written a small bash script using cat and xargs to copy the files listed in the text file to the target directory. This script works fine!
How do I tell snakemake that my output should be the target directory + filenames listed in the text file?
Ok, so my initial thought was to create a list inside the snakefile containing all target paths for the files that should be copied.
I made this hot garbage of a mess (also completely new to python):
import glob
all_files = glob.glob("path/to/all/files", recursive = False)
file = open("path/to/file/list_of_files.txt", "r")
file_lines = file.readlines()
path_to_target_dir = "some/path/"
path_lines = [path_to_target_dir + str(x) for x in file_lines]
# for some reason path_lines end with line break after each filename. not good.
# remove line break to yield correct paths + filenames
list_of_correct_paths = []
for element in path_lines:
    list_of_correct_paths.append(element.strip())
This yields a list with all paths to where I want to copy the files.
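The trailing line breaks come from `readlines()`, which keeps the `\n` at the end of each line. A minimal sketch of the difference, using a throwaway temporary file (the filenames `a.txt` and `b.txt` are made up for illustration):

```python
import os
import tempfile

# Write a throwaway list file; the filenames are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "list_of_files.txt")
    with open(list_path, "w") as handle:
        handle.write("a.txt\nb.txt\n")

    # readlines() keeps the newline at the end of every line.
    with open(list_path) as handle:
        raw_lines = handle.readlines()

    # read().splitlines() drops the line breaks in one step.
    with open(list_path) as handle:
        clean_lines = handle.read().splitlines()

print(raw_lines)    # ['a.txt\n', 'b.txt\n']
print(clean_lines)  # ['a.txt', 'b.txt']
```

With `splitlines()`, the separate loop that calls `strip()` on every element is probably no longer needed.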
rule cp_files_to_target_dir:
    input:
        cp_from = expand("path/to/all/files/{id}", id = all_files),
        list = "list_of_files.txt",
        script = "bash_script.sh"
    output:
        cp_to = expand("{path}", path = list_of_correct_paths)
    shell:
        "{input.script}"
However, snakemake states that I'm missing input files for the rule.
I hope my question makes sense. I appreciate any help I can get.
EDIT: this works now
import os
# filenames for all files
files = os.listdir("/path/to/all/files")
# create paths to files of interest from text file
with open("path/to/text_file.txt", "r") as text_file:
    list_files = text_file.read().splitlines()
target_path = "path/to/target/dir"
# os.path.join inserts the separator that plain string concatenation would miss
target_file_paths = [os.path.join(target_path, x) for x in list_files]
This yields a list with all paths to where I want to copy the files.
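The path-building step above could also be wrapped in a small helper that reports which source files are not on disk. This is only a sketch, assuming the list file holds bare filenames; the function name `build_copy_plan` and its arguments are hypothetical and not part of Snakemake.

```python
from pathlib import Path


def build_copy_plan(list_file, source_dir, target_dir):
    """Return (sources, targets, missing) for the filenames in list_file.

    Hypothetical helper: the name and arguments are assumptions for illustration.
    """
    # One filename per line; skip blank lines and surrounding whitespace.
    names = [line.strip()
             for line in Path(list_file).read_text().splitlines()
             if line.strip()]
    sources = [str(Path(source_dir) / name) for name in names]
    targets = [str(Path(target_dir) / name) for name in names]
    # Sources that are not on disk are candidates for a missing-input error.
    missing = [src for src in sources if not Path(src).exists()]
    return sources, targets, missing
```

In a Snakefile, `sources` and `targets` could then be passed to the rule's input and output, and `missing` printed to see which paths are the problem.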
rule cp_files_to_target_dir:
    input:
        cp_from = expand("path/to/all/files/{id}", id = list_files),
        list = "text_file.txt",
        script = "bash_script.sh"
    output:
        cp_to = expand("{path}", path = target_file_paths)
    shell:
        "{input.script}"
However, snakemake states that I'm missing input files for the rule.
The variable cp_from in rule cp_files_to_target_dir probably does not contain the correct paths. To debug, I would suggest moving it outside the rule and printing it to see what it contains, e.g.:
cp_from = expand("path/to/all/files/{id}", id = all_files)
cp_to = expand("{path}", path = list_of_correct_paths)
# To debug:
print(cp_from) # Check these are what you expect
print(cp_to)
rule cp_files_to_target_dir:
    input:
        cp_from = cp_from,
        list = "list_of_files.txt",
        script = "bash_script.sh"
    output:
        cp_to = cp_to,
    shell:
        "{input.script}"
In general, I think your script could be tidied up a bit, but I cannot be more specific without more context.
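If the printed cp_from looks wrong, one likely cause is the `glob.glob` call in the question: without a wildcard, it matches the directory itself rather than the files inside it, and `expand` then prepends the directory a second time. The sketch below reproduces this in plain Python on a temporary directory. The list comprehensions only stand in for Snakemake's `expand`, and the file names are invented for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical source folder with two files.
    src_dir = os.path.join(tmp, "files")
    os.mkdir(src_dir)
    for name in ("a.txt", "b.txt"):
        open(os.path.join(src_dir, name), "w").close()

    # No wildcard: the pattern matches only the directory itself.
    without_wildcard = glob.glob(src_dir)

    # Bare filenames, which is what the {id} placeholder needs.
    names = sorted(os.listdir(src_dir))

    # Stand-in for expand(src_dir + "/{id}", id=...): plain string formatting.
    doubled = [f"{src_dir}/{item}" for item in without_wildcard]
    correct = [f"{src_dir}/{item}" for item in names]

    # Only the paths built from bare filenames exist on disk.
    doubled_exist = [os.path.exists(p) for p in doubled]
    correct_exist = [os.path.exists(p) for p in correct]

print(without_wildcard)  # a single entry: the directory path
print(doubled_exist, correct_exist)
```

This would explain why the version that passes the bare filenames from the text file to `expand` no longer triggers the missing-input error.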