I need to extract the journal titles from a bibliography list. The titles are all within quotation marks. So is there a way to ask R to extract all text that is within parenthesis?
I have read the list into R as a text file:
"data <- readLines("Publications _ CCDM.txt")"
here are a few lines from the list:
Andronis, C.E., Hane, J., Bringans, S., Hardy, G., Jacques, S., Lipscombe, R., Tan, K-C. (2020). “Gene validation and remodelling using proteogenomics of Phytophthora cinnamomi, the causal agent of Dieback.” bioRxiv. DOI: https://doi.org/10.1101/2020.10.25.354530 Beccari, G., Prodi, A., Senatore, M.T., Balmas, V,. Tini, F., Onofri, A., Pedini, L., Sulyok, M,. Brocca, L., Covarelli, L. (2020). “Cultivation Area Affects the Presence of Fungal Communities and Secondary Metabolites in Italian Durum Wheat Grains.” Toxins https://www.mdpi.com/2072-6651/12/2/97 Corsi, B., Percvial-Alwyn, L., Downie, R.C., Venturini, L., Iagallo, E.M., Campos Mantello, C., McCormick-Barnes, C., See, P.T., Oliver, R.P., Moffat, C.S., Cockram, J. “Genetic analysis of wheat sensitivity to the ToxB fungal effector from Pyrenophora tritici-repentis, the causal agent of tan spot” Theoretical and Applied Genetics. https://doi.org/10.1007/s00122-019-03517-8 Derbyshire, M.C., (2020) Bioinformatic Detection of Positive Selection Pressure in Plant Pathogens: The Neutral Theory of Molecular Sequence Evolution in Action. (2020) Frontiers in Microbiology. https://doi.org/10.3389/fmicb.2020.00644 Dodhia, K.N., Cox, B.A., Oliver, R.P., Lopez-Ruiz, F.J. (2020). “When time really is money: in situ quantification of the strobilurin resistance mutation G143A in the wheat pathogen Blumeria graminis f. sp. tritici.” bioRxiv, doi: https://doi.org/10.1101/2020.08.20.258921 Graham-Taylor, C., Kamphuis, L.G., Derbyshire, M.C. (2020). “A detailed in silico analysis of secondary metabolite biosynthesis clusters in the genome of the broad host range plant pathogenic fungus Sclerotinia sclerotiorum.” BMC Genomics https://doi.org/10.1186/s12864-019-6424-4
try something like this:
library(stringr)
str_extract_all(x, "“.*?”") %>% .[[1]]
if you want to remove quotation from result add this at the end of pipeline:
str_remove_all("[“”]")
Output:
[1] "Gene validation and remodelling using proteogenomics of Phytophthora cinnamomi, the causal agent of Dieback."
[2] "Cultivation Area Affects the Presence of Fungal Communities and Secondary Metabolites in Italian Durum Wheat Grains."
[3] "Genetic analysis of wheat sensitivity to the ToxB fungal effector from Pyrenophora tritici-repentis, the causal agent of tan spot"
[4] "When time really is money: in situ quantification of the strobilurin resistance mutation G143A in the wheat pathogen Blumeria graminis f. sp. tritici."
[5] "A detailed in silico analysis of secondary metabolite biosynthesis clusters in the genome of the broad host range plant pathogenic fungus Sclerotinia sclerotiorum."