I want to add double quotes around every second word in this single string.
From this
gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8;
gene_type protein_coding; gene_name CD45A;
to this
gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8";
gene_type "protein_coding"; gene_name "CD45A";
I have been looking through tidyverse and stringr but have not yet found a good way to do this.
Thanks!
Here is an approach that is mostly base R, with stringr used for the first split.
First remove the ";" at the end of the string, then split the vector of gene information by "; ", then split each piece again by a space " " and save the result to a new vector vec_apply.
After that, paste back the unmodified split strings together with the modified strings (the strings that have new double quotes).
Note that in the console, each double quote will be preceded by a backslash \ to "escape" it. The backslash is only part of how the string is printed; after you have saved the vector to a text file, the backslash will be gone.
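library(stringr)  # provides str_split_fixed(), str_count() and the %>% pipe

vec <- c("gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8; gene_type protein_coding; gene_name CD45A;")
# Drop the trailing semicolon so the split does not leave an empty last piece
vec <- gsub(";$", "", vec)
# Split into "key value" pieces, then split each piece on the space
vec_apply <- str_split_fixed(vec, "; ", n = str_count(string = vec, pattern = ";") + 1) %>%
  strsplit(split = " ")
# Paste each key back together with its quoted value
paste(sapply(vec_apply, `[[`, 1),
      sapply(vec_apply, function(x) paste0(shQuote(x[[2]], type = "cmd"), ";")), collapse = " ")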
"gene_id \"ENSG00000081237\"; gene_version \"20\"; transcript_id \"ENST00000442510\"; transcript_version \"8\"; gene_type \"protein_coding\"; gene_name \"CD45A\";"
gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8"; gene_type "protein_coding"; gene_name "CD45A";
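The point about the backslashes can be checked directly. Below is a minimal sketch, assuming a temporary file is acceptable for the test; the names quoted, tmp and from_file are only illustrative and are not part of the code above. It writes a quoted string with writeLines() and reads it back to confirm that no backslash is stored.

```r
# Illustrative string containing literal double quotes (hypothetical name)
quoted <- "gene_id \"ENSG00000081237\"; gene_version \"20\";"

# print() shows the escaped form; the backslashes are display only
print(quoted)

# Write the string to a temporary file and read the line back
tmp <- tempfile(fileext = ".txt")
writeLines(quoted, tmp)
from_file <- readLines(tmp)
unlink(tmp)

# The line read from the file matches the string and contains no backslash
identical(from_file, quoted)
grepl("\\", from_file, fixed = TRUE)
```

If identical() is TRUE and grepl() is FALSE, the file holds plain double quotes only.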
Or, as suggested by @GregorThomas in another answer, use cat() to view the output and check that the double quotes were added successfully.
cat(paste(sapply(vec_apply, `[[`, 1),
          sapply(vec_apply, function(x) paste0(shQuote(x[[2]], type = "cmd"), ";")), collapse = " "))
gene_id "ENSG00000081237"; gene_version "20"; transcript_id "ENST00000442510"; transcript_version "8"; gene_type "protein_coding"; gene_name "CD45A";
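If the string always follows the "key value;" pattern, a single regular expression may be enough. This is an alternative sketch, not the split-and-paste method shown above; it uses only base R's gsub() and assumes that neither keys nor values contain spaces or semicolons. The name quoted is illustrative.

```r
vec <- "gene_id ENSG00000081237; gene_version 20; transcript_id ENST00000442510; transcript_version 8; gene_type protein_coding; gene_name CD45A;"

# Capture each "key value;" pair: group 1 is the key, group 2 is the value.
# [^ ;]+ matches a run of characters that are neither a space nor a semicolon.
quoted <- gsub("([^ ;]+) ([^ ;]+);", "\\1 \"\\2\";", vec)

# cat() prints the string without escaping the double quotes
cat(quoted, "\n")
```

A value that contains a space would not match this pattern and would need a different one.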