I would like to concatenate .bed files that belong to the same Family (e.g., cram-2), using a separate lookup table linking id values to a Family value (e.g., the lookup table below). Example:
The HOX.bed file rows:
ma reg out fim id=HOX;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
se reg out fim id=HOX;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
to reg out fim id=HOX;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
pa reg out fim id=HOX;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
The zinc.bed file rows:
ma reg out fim id=zinc;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
se reg out fim id=zinc;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
to reg out fim id=zinc;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
pa reg out fim id=zinc;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
The lookup table:
Name Family
HOX cram-2
zinc cram-2
fire sf.xr
fire ra.XS-2
...continues...
The output I am trying to obtain:
File name = cram-2.bed
Concatenate HOX.bed and zinc.bed because both are from Family cram-2!
ma reg out fim id=HOX;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
se reg out fim id=HOX;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
to reg out fim id=HOX;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
pa reg out fim id=HOX;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
ma reg out fim id=zinc;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
se reg out fim id=zinc;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
to reg out fim id=zinc;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
pa reg out fim id=zinc;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05
I started to prepare a script structure, but I am struggling with how to make all the files with the same Family end up in the same output file (a .bed file, if possible):
myFiles <- list.files(pattern = "\\.bed$")
for(i in myFiles){
  name <- read.table((i), header = FALSE, sep="\t", stringsAsFactors=FALSE, quote="")
  name <- name %>% top_n(1, "id")
  Family_filtering <-
    table %>% filter(
      Family %in% name)
  save(...????????...)
}
Thank you very much for the help!
Convert each step into its own function and then combine them all. Simple, isn't it?
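library(fs)
library(tidyverse)

# Lookup table linking each Name (id) to its Family
dfNameFamily = tibble(
  Name = c("HOX", "zinc", "fire", "fire2"),
  Family = c("cram-2", "cram-2", "sf.xr", "ra.XS-2"))

dir = "bedfile"  # Folder containing the .bed files

# List all .bed files in a folder
BedFile = function(dir) dir_ls(dir, regexp = "\\.bed$")

# Read all lines of a file (empty vector if the file does not exist)
readTxt = function(FileName){
  lines = character()
  if(file_exists(FileName)){
    con = file(FileName, open = "r")
    lines = readLines(con)
    close(con)
  }
  lines
}

# Extract the id from the first line of a file
GetName = function(l) str_match(l, "id=(.+);seq")[1,2]

# Write all lines of one Family to a file named after that Family.
# The output is named after the Family only (e.g. bedfile/cram-2), with no .bed extension.
SaveFile = function(l, name, dir){
  con = file(paste0(dir, "/", name$Family))
  writeLines(unlist(l$lines), con)
  close(con)
}

tibble(FileName = BedFile(dir)) %>%         # Read all bed file names
  mutate(
    lines = map(FileName, readTxt),         # Read all lines from every bed file
    Name = map_chr(lines, GetName)) %>%     # Get the Name (id) of every bed file
  left_join(dfNameFamily, by = "Name") %>%  # Join the Family
  group_by(Family) %>%
  group_walk(SaveFile, dir)                 # Save one file per Family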
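If you would rather avoid the tidyverse dependency, the same idea can be sketched in base R. This is only a minimal sketch under a few assumptions: every .bed file is non-empty and has an `id=...;` field on its first line, each id appears once in the lookup table, and the results go to a separate output folder so that a rerun does not read them back in as input. The names `concat_by_family`, `in_dir`, `out_dir` and `lookup` are made up for illustration, and the toy rows are shortened versions of the ones in the question.

```r
# Hypothetical helper (not from the original post): concatenate .bed files by Family.
concat_by_family <- function(in_dir, out_dir, lookup) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(in_dir, pattern = "\\.bed$", full.names = TRUE)

  # Assumption: the id can be taken from the first line of each (non-empty) file
  ids <- vapply(files, function(f) {
    sub(".*id=([^;]+);.*", "\\1", readLines(f, n = 1))
  }, character(1), USE.NAMES = FALSE)

  # Look up the Family of each id; match() uses the first hit if a Name is repeated
  families <- lookup$Family[match(ids, lookup$Name)]
  known <- !is.na(families)                 # skip files whose id is not in the lookup

  # One group of input files per Family, then one output file per group
  groups <- split(files[known], families[known])
  for (fam in names(groups)) {
    lines <- unlist(lapply(groups[[fam]], readLines), use.names = FALSE)
    writeLines(lines, file.path(out_dir, paste0(fam, ".bed")))
  }
  invisible(names(groups))
}

# Toy data in a temporary folder, so the sketch can be run as-is
lookup <- data.frame(
  Name = c("HOX", "zinc", "fire", "fire2"),
  Family = c("cram-2", "cram-2", "sf.xr", "ra.XS-2"),
  stringsAsFactors = FALSE)

in_dir <- file.path(tempdir(), "bed_in")
out_dir <- file.path(tempdir(), "bed_out")
dir.create(in_dir, showWarnings = FALSE)
writeLines("ma\treg\tout\tfim\tid=HOX;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05",
           file.path(in_dir, "HOX.bed"))
writeLines("ma\treg\tout\tfim\tid=zinc;seq=AGCAGGAAATA;score=12.1915;pval=4.97e-05",
           file.path(in_dir, "zinc.bed"))

written <- concat_by_family(in_dir, out_dir, lookup)
result <- readLines(file.path(out_dir, "cram-2.bed"))
```

Writing to a different folder is a design choice rather than a requirement. It also lets the output keep the .bed extension that the question asks for (cram-2.bed).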