Retrieving title and ID data from a webpage and saving it in an Excel file

I am looking to retrieve the titles and PMIDs (PubMed IDs) of the records returned by a PubMed search and save them in an MS Excel file. I tried using the easyPubMed package in R, but I am not able to extract them. Is there a library or package that can do this? Please help.

An example of the input and the expected output is provided below:

Code:

    library(easyPubMed)
    my_query <- '"ACSL1" [ti] OR "Acyl-CoA Synthetase Long Chain Family Member 1" [ti] OR "Acyl-CoA Synthetase Long Chain Family Member 1" [ti] OR "Fatty-Acid-Coenzyme A Ligase, Long-Chain 2" [ti] OR "Long-Chain Fatty-Acid-Coenzyme A Ligase 1" [ti] OR "Long-Chain-Fatty-Acid–CoA Ligase 1" [ti] OR "Long-Chain Fatty Acid-CoA Ligase 2" [ti] OR "Long-Chain Acyl-CoA Synthetase" [ti] OR "Long-Chain Acyl-CoA Synthetase 1" [ti] OR "Long-Chain Acyl-CoA Synthetase 2" [ti] OR "Lignoceroyl-CoA Synthase" [ti] OR "Palmitoyl-CoA Ligase 1" [ti] OR "Palmitoyl-CoA Ligase 2" [ti] OR "Acyl-CoA Synthetase 1" [ti] OR "LACS 1" [ti] OR "LACS 2" [ti] OR "LACS-1" [ti] OR "LACS-2" [ti] OR "FACL2" [ti] OR "FACL1" [ti] OR "LACS1" [ti] OR "LACS2" [ti] OR "ACS1" [ti] OR "LACS" [ti] OR "Fatty-Acid-Coenzyme A Ligase, Long-Chain 1" [ti] OR "Palmitoyl-CoA Ligase 1" [ti] AND (acyl [ti] OR CoA [ti] OR fatty [ti] OR synthetase [ti] OR Palmitoyl [ti] OR ligase [ti] OR ACSL1 [ti]'
    # Count the number of PubMed IDs matching the query
    my_entrez_id <- get_pubmed_ids(my_query)
    my_entrez_id$Count
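Long Boolean strings like this are easy to break by hand: a repeated synonym or an unclosed parenthesis is hard to spot. A minimal sketch that assembles the query from character vectors instead; the helper name `or_titles` is hypothetical, and the synonym list is shortened for illustration (the full list from the question can be dropped in unchanged).

```r
# Hypothetical helper: optionally quote each term, tag it with the [ti]
# (title) field and join the terms with OR. unique() drops repeated synonyms.
or_titles <- function(terms, quote = TRUE) {
  terms <- unique(terms)
  if (quote) terms <- paste0('"', terms, '"')
  paste0(terms, " [ti]", collapse = " OR ")
}

# Gene name and synonyms (shortened; extend with the full list from the question)
synonyms <- c("ACSL1",
              "Acyl-CoA Synthetase Long Chain Family Member 1",
              "Acyl-CoA Synthetase Long Chain Family Member 1",  # repeated, dropped by unique()
              "FACL1", "LACS1")
# Single-word title keywords, left unquoted as in the question
keywords <- c("acyl", "CoA", "fatty", "synthetase", "Palmitoyl", "ligase", "ACSL1")

# Wrap each group in its own parentheses so both are always closed and the
# grouping of the AND is explicit
my_query <- paste0("(", or_titles(synonyms), ") AND (",
                   or_titles(keywords, quote = FALSE), ")")
cat(my_query, "\n")
```

The resulting string can be passed to `get_pubmed_ids(my_query)` exactly as before.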

Input

Webpage: https://pubmed.ncbi.nlm.nih.gov/
Search String: "ACSL1" [ti] OR "Acyl-CoA Synthetase Long Chain Family Member 1" [ti] OR "Acyl-CoA Synthetase Long Chain Family Member 1" [ti] OR "Fatty-Acid-Coenzyme A Ligase, Long-Chain 2" [ti] OR "Long-Chain Fatty-Acid-Coenzyme A Ligase 1" [ti] OR "Long-Chain-Fatty-Acid–CoA Ligase 1" [ti] OR "Long-Chain Fatty Acid-CoA Ligase 2" [ti] OR "Long-Chain Acyl-CoA Synthetase" [ti] OR "Long-Chain Acyl-CoA Synthetase 1" [ti] OR "Long-Chain Acyl-CoA Synthetase 2" [ti] OR "Lignoceroyl-CoA Synthase" [ti] OR "Palmitoyl-CoA Ligase 1" [ti] OR "Palmitoyl-CoA Ligase 2" [ti] OR "Acyl-CoA Synthetase 1" [ti] OR "LACS 1" [ti] OR "LACS 2" [ti] OR "LACS-1" [ti] OR "LACS-2" [ti] OR "FACL2" [ti] OR "FACL1" [ti] OR "LACS1" [ti] OR "LACS2" [ti] OR "ACS1" [ti] OR "LACS" [ti] OR "Fatty-Acid-Coenzyme A Ligase, Long-Chain 1" [ti] OR "Palmitoyl-CoA Ligase 1" [ti] AND (acyl [ti] OR CoA [ti] OR fatty [ti] OR synthetase [ti] OR Palmitoyl [ti] OR ligase [ti] OR ACSL1 [ti]

Expected Output:

    dput(Output)
    structure(list(Title = c("The 3-ketoacyl-CoA thiolase: an engineered enzyme for carbon chain elongation of chemical compounds.", 
    "Potential influence of miR-192 on the efficacy of saxagliptin treatment in T2DM complicated with non-alcoholic fatty liver disease.", 
    "Myosteatosis in nonalcoholic fatty liver disease: An exploratory study."
    ), PMID = c(32830293L, 32829627L, 32828745L)), class = "data.frame", row.names = c(NA, 
    -3L))

Solution

Counting the IDs is not enough: you need to fetch the full records for your query and parse them. Something like this should work:

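    library(easyPubMed)
    library(magrittr)  # provides the %>% pipe

    my_entrez_id <- get_pubmed_ids(my_query)
    # Download the full records as XML (only the first `retmax` records are
    # fetched; raise it if my_entrez_id$Count is larger)
    my_entrez_data <- fetch_pubmed_data(my_entrez_id)
    my_entrez_list <- my_entrez_data %>%
      XML::xmlParse(asText = TRUE) %>%  # the XML arrives as a character string
      XML::xmlToList()                  # turn the XML into an R list, which is easier to handle
    my_entrez_df <- my_entrez_list %>%
      # use purrr::map_df to pick the fields we need from each article
      purrr::map_df(function(x) {
        tibble::tibble(
          Title = x$MedlineCitation$Article$ArticleTitle[[1]],
          PMID  = as.integer(x$MedlineCitation$PMID[[1]]))
      })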
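The code above stops at a data frame, while the question also asks for an Excel file. A minimal sketch, assuming the writexl package (one of several packages that write `.xlsx`; openxlsx is another), with a base-R CSV fallback that Excel can also open if writexl is not installed. The helper `save_for_excel` and the file name are hypothetical; the stand-in data are the three rows from the expected output in the question.

```r
# Hypothetical helper: write a data frame to .xlsx via writexl if available,
# otherwise to a .csv with base R. Returns the path that was written.
save_for_excel <- function(df, stem = file.path(tempdir(), "pubmed_titles")) {
  if (requireNamespace("writexl", quietly = TRUE)) {
    path <- paste0(stem, ".xlsx")
    writexl::write_xlsx(df, path)
  } else {
    path <- paste0(stem, ".csv")
    utils::write.csv(df, path, row.names = FALSE)
  }
  path
}

# Stand-in data: the expected output from the question
example_df <- data.frame(
  Title = c("The 3-ketoacyl-CoA thiolase: an engineered enzyme for carbon chain elongation of chemical compounds.",
            "Potential influence of miR-192 on the efficacy of saxagliptin treatment in T2DM complicated with non-alcoholic fatty liver disease.",
            "Myosteatosis in nonalcoholic fatty liver disease: An exploratory study."),
  PMID = c(32830293L, 32829627L, 32828745L)
)

out_path <- save_for_excel(example_df)
print(out_path)
```

With real results, `save_for_excel(my_entrez_df, "pubmed_titles")` writes `pubmed_titles.xlsx` (or `.csv` without writexl) to the working directory.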