
Getting the full names of educational institutions from a list of abbreviated names


I have abbreviated names of educational institutions. Here is a reproducible sample:

data <- structure(list(Affiliations = c("UNIV MELBOURNE", "UNIV NEWCASTLE", 
                                        "FORDHAM UNIV", "PRINCETON UNIV", 
                                        "CITY UNIV LONDON", "UNIV CONNECTICUT", 
                                        "EMORY UNIV", "NATL BUR ECON RES", 
                                        "NATL CHENGCHI UNIV", "OHIO STATE UNIV")), 
                  row.names = c(NA, -10L), 
                  class = c("tbl_df", "tbl", "data.frame"))

I am trying to get the full names of the institutions in this list.

For example, "University of Melbourne" for "UNIV MELBOURNE", "City, University of London" for "CITY UNIV LONDON", and "National Chengchi University" for "NATL CHENGCHI UNIV".
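Before searching at all, a rule-based first pass could expand the most common abbreviations offline. This is a minimal sketch, not part of any package: the `abbrev` table and `expand_affiliation()` are hypothetical, cover only the sample data, and assume that a leading "UNIV" means "University of <place>". The rules cannot add connecting words such as "of" or commas, so the output still needs checking.

```r
# Hypothetical abbreviation table (an assumption covering only the sample
# data; a real list of affiliations will need many more entries).
abbrev <- c(UNIV = "University", NATL = "National", BUR = "Bureau",
            ECON = "Economic", RES = "Research")

# Title-case one upper-case word, e.g. "MELBOURNE" -> "Melbourne"
to_title <- function(w) paste0(substr(w, 1, 1), tolower(substring(w, 2)))

expand_affiliation <- function(x) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  # Replace listed abbreviations; title-case every other word
  out <- ifelse(words %in% names(abbrev), abbrev[words], to_title(words))
  # Heuristic: a leading "UNIV" usually stands for "University of <place>"
  if (length(words) > 1 && words[1] == "UNIV") {
    out <- c("University of", out[-1])
  }
  paste(out, collapse = " ")
}

affs <- c("UNIV MELBOURNE", "FORDHAM UNIV", "NATL CHENGCHI UNIV",
          "NATL BUR ECON RES", "CITY UNIV LONDON")
expanded <- vapply(affs, expand_affiliation, character(1), USE.NAMES = FALSE)
print(expanded)
```

The first three names come out right, but "NATL BUR ECON RES" becomes "National Bureau Economic Research" and "CITY UNIV LONDON" becomes "City University London", which is why a search step is still useful for the remaining names.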

Currently, I use the "searcher" package to search for each string manually in the browser, and the readline function to enter each full name:

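library(searcher) # for search_startpage()

data$new <- NA

for (i in seq_along(data$Affiliations)) {
  # Open a browser search for the abbreviated name
  # (rlang = FALSE: don't add R-specific terms to the query)
  search_startpage(data$Affiliations[i], rlang = FALSE)
  # Type the full name at the console prompt
  data$new[i] <- readline()
}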

This is time-consuming because I have more than 1,000 affiliations. Is there a more efficient way to do this with rvest or another package?
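Whatever lookup method is used, the number of slow lookups can shrink if the 1,000+ affiliations contain repeats (an assumption about the data). A minimal base-R sketch: `unique()` gives the distinct names to look up, and `match()` maps the answers back onto every row. `lookup_full_name()` is a hypothetical placeholder for the slow step (browser search plus readline, or scraping).

```r
# Hypothetical sample: in a long list the same affiliation often repeats.
affs <- c("UNIV MELBOURNE", "EMORY UNIV", "UNIV MELBOURNE",
          "FORDHAM UNIV", "EMORY UNIV")

# Look up each distinct name only once...
unique_affs <- unique(affs)

# Placeholder for the slow step; this hypothetical function just tags the
# input so the example runs offline.
lookup_full_name <- function(x) paste("full name of", x)

full_names <- vapply(unique_affs, lookup_full_name, character(1),
                     USE.NAMES = FALSE)

# ...then map the answers back onto every original row.
result <- full_names[match(affs, unique_affs)]
print(result)
```

Here five rows need only three lookups; the saving on the real data depends on how many duplicates it has.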


Solution

  • @ronak-shah

    I have managed to get what I wanted.

    Here is the code:

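    library(rvest) # read_html(), html_nodes(), html_text(); also re-exports %>%
    
    # Replace spaces with "+" so each name can go straight into the query string
    data$Affiliations <- gsub(" ", "+", data$Affiliations)
    
    data$New <- NA
    
    for (i in seq_len(nrow(data))) {
      url <- paste0("https://www.google.com/search?q=", data$Affiliations[i])
      # The result titles were in <h3> elements of the results page
      x <- read_html(url) %>% html_nodes("h3") %>% html_text()
      print(x)
      # Type the index of the correct title at the prompt (interactive sessions only)
      data$New[i] <- x[as.numeric(readline())]
    }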
    

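    For each affiliation, the code prints the titles of the search results, and I type the index of the appropriate one at the prompt (the number after each list below):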

     [1] "Melbourne (City in Australia)"                                 
     [2] "Melbourne City"                                                
     [3] "University of Melbourne"                                       
     [4] "Edwise International - Study Abroad Consultants - Chennai"     
     [5] "University of Melbourne"                                       
     [6] "The University of Melbourne"                                   
     [7] "The University of Melbourne (Unimelb) - Ranking, Fees"         
     [8] "The University of Melbourne : Rankings, Fees & Courses Details"
     [9] "University of Melbourne - Wikipedia"                           
    [10] "Monash University - one of the top universities in Australia"  
    [11] "The University of Melbourne | Study Options"                   
    3
     [1] "Newcastle University: The things we do here make a difference out ..."   
     [2] "Newcastle University"                                                    
     [3] "University of Newcastle (Public university in Callaghan, Australia)"     
     [4] "Postgraduate - Newcastle University"                                     
     [5] "The University of Newcastle, Australia"                                  
     [6] "International - The University of Newcastle, Australia"                  
     [7] "Newcastle University, Exams: Rankings, Fees, Courses"                    
     [8] "The University of Newcastle - Ranking, Courses, Fees, Entry criteria ..."
     [9] "Newcastle University - Wikipedia"                                        
    [10] "Newcastle University : Rankings, Fees & Courses Details"                 
    [11] "Newcastle University courses and application information - SI-UK"        
    [12] "Newcastle University | Apply Now for 2021 | INTO"                        
    5
     [1] "Fordham University"                                             
     [2] "COVID-19 Guidelines - Fordham University"                       
     [3] "Fordham University School of Law"                               
     [4] "Academics | Fordham"                                            
     [5] "Fordham University - Wikipedia"                                 
     [6] "Fordham University - Profile, Rankings and Data | US News Best" 
     [7] "Fordham University (Gabelli) - Best Business Schools - US News" 
     [8] "Fordham University: Rankings, Fees, Courses, Admission 2021 ..."
     [9] "Fordham University - Niche"                                     
    [10] "Fordham University Athletics - Official Athletics Website"      
    1
    [1] "Princeton University"                                             
    [2] "Princeton University"                                             
    [3] "Princeton"                                                        
    [4] "Princeton University Graduate School"                             
    [5] "Princeton University - Wikipedia"                                 
    [6] "Princeton University : Rankings, Fees & Courses Details | Top"    
    [7] "Princeton University - Profile, Rankings and Data | US News Best" 
    [8] "Princeton University: Rankings, Fees, Courses, Admission 2021 ..."
    1
    [1] "City, University of London"                            
    [2] "City, University of London"                            
    [3] "CITY, University of London: Rankings, Fees, Courses"   
    [4] "City, University of London - Wikipedia"                
    [5] "City, University of London | Apply Now for 2021 | INTO"
    [6] "City, University of London"                            
    1
     [1] "University of Connecticut"                                        
     [2] "University of Connecticut"                                        
     [3] "University of Connecticut - Wikipedia"                            
     [4] "University of Connecticut (UCONN) - Shiksha Study Abroad"         
     [5] "University of Connecticut (Uconn) - Profile, Rankings - U.S. News"
     [6] "University of Connecticut : Rankings, Fees & Courses Details"     
     [7] "University of Connecticut - Niche"                                
     [8] "University of Bridgeport: A Leading University in Connecticut"    
     [9] "University of Connecticut | LinkedIn"                             
    [10] "Southern Connecticut State University"                            
    [11] "Eastern Connecticut State University"                             
    1
     [1] "Home | Emory University | Atlanta GA"                                   
     [2] "Emory University"                                                       
     [3] "Emory University (Medical school in Atlanta, Georgia)"                  
     [4] "Emory University School of Law (Independent school in Atlanta, Georgia)"
     [5] "Home | Emory University | Atlanta GA"                                   
     [6] "Emory School of Medicine - Emory University"                            
     [7] "Degrees and Programs - Academics - Emory University"                    
     [8] "Explore Emory | Emory University | Atlanta GA"                          
     [9] "Emory University - Wikipedia"                                           
    [10] "Emory University - Profile, Rankings and Data | US News Best"           
    [11] "Emory Healthcare: Atlanta Hospitals, Clinics and Healthcare ..."        
    [12] "Emory University (EU) - Shiksha Study Abroad"                           
    2
    [1] "National Bureau of Economic Research | NBER"                        
    [2] "National Bureau of Economic Research bulletin on aging and health"  
    [3] "PubMed Central, Figure 1: Natl Bur Econ Res Bull Aging Health. 2011"
    [4] "PubMed Central, Figure II - NCBI"                                   
    [5] "Education and Health: Evaluating Theories and Evidence"             
    [6] "Home - The National Bureau of Asian Research (NBR)"                 
    [7] "[XLS] Economics & Business"                                         
    [8] "PRIME PubMed | Natl Bur Econ Res Bull Aging Health journal ..."     
    [9] "Friedman and the Quantity Theory - Michael J. Gootzeit, 1980"       
    1
     [1] "National Chengchi University"                                   
     [2] "Admission - National Chengchi University"                       
     [3] "國立政治大學: NCCU"                                             
     [4] "National Chengchi University - Wikipedia"                       
     [5] "National Chengchi University | World University Rankings | THE" 
     [6] "National Chengchi University : Rankings, Fees & Courses Details"
     [7] "National Chengchi University - MastersPortal.com"               
     [8] "National Chengchi University in Taiwan - Masterstudies"         
     [9] "National Chengchi University in Taiwan - US News Best"          
    [10] "National Chengchi University | Ranking & Review - uniRank"      
    [11] "National Chengchi University | LinkedIn"                        
    1
    [1] "The Ohio State University"                                      
    [2] "The Ohio State University"                                      
    [3] "Ohio State Buckeyes football (Football team)"                   
    [4] "Ohio State University - Wikipedia"                              
    [5] "Ohio State Buckeyes | Ohio State University Athletics"          
    [6] "The Ohio State University (OSU) - Shiksha Study Abroad"         
    [7] "Ohio State University--Columbus - Profile, Rankings - U.S. News"
    [8] "Welcome to Ohio University"                                     
    1
    

    The final data frame is

    # A tibble: 10 x 2
       Affiliations       New                                        
       <chr>              <chr>                                      
     1 UNIV+MELBOURNE     University of Melbourne                    
     2 UNIV+NEWCASTLE     The University of Newcastle, Australia     
     3 FORDHAM+UNIV       Fordham University                         
     4 PRINCETON+UNIV     Princeton University                       
     5 CITY+UNIV+LONDON   City, University of London                 
     6 UNIV+CONNECTICUT   University of Connecticut                  
     7 EMORY+UNIV         Emory University                           
     8 NATL+BUR+ECON+RES  National Bureau of Economic Research | NBER
     9 NATL+CHENGCHI+UNIV National Chengchi University               
    10 OHIO+STATE+UNIV    The Ohio State University
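To drop the readline step entirely, the index typed at each prompt could be replaced by a rule. The sketch below is a hypothetical heuristic, not part of rvest or of the answer above. It keeps only the titles that contain every word of the abbreviated name (after expanding the abbreviations listed in `expansions`, an assumed table) and returns the shortest one. It also builds the URL with `utils::URLencode()` from the original, space-separated names instead of using `gsub()`. On subsets of the titles printed above, it reproduces the choices made by hand for Melbourne and Emory. For "UNIV NEWCASTLE", however, it returns the UK's Newcastle University rather than the Australian one chosen at the prompt, so ambiguous names still need a manual check.

```r
# Hypothetical expansions for abbreviations that are not a plain substring of
# the full word: "univ" occurs in "university", but "natl" does not occur in
# "national". Extend as needed.
expansions <- c(univ = "university", natl = "national")

# Build the search URL from the original, space-separated name.
# URLencode(reserved = TRUE) percent-encodes spaces and characters such as
# "&" or "," instead of relying on gsub(" ", "+", ...).
build_search_url <- function(affiliation) {
  paste0("https://www.google.com/search?q=",
         utils::URLencode(affiliation, reserved = TRUE))
}

# Return the shortest result title that contains every (expanded) word of
# the abbreviated affiliation, ignoring case; NA if no title matches.
pick_title <- function(titles, affiliation) {
  words <- tolower(strsplit(affiliation, "[ +]")[[1]])
  words <- ifelse(words %in% names(expansions), expansions[words], words)
  lower <- tolower(titles)
  keep <- rep(TRUE, length(titles))
  for (w in words) keep <- keep & grepl(w, lower, fixed = TRUE)
  if (!any(keep)) return(NA_character_)
  candidates <- titles[keep]
  candidates[which.min(nchar(candidates))]
}

# Subsets of the titles printed in the answer above
melbourne <- c("Melbourne (City in Australia)", "Melbourne City",
               "University of Melbourne", "The University of Melbourne",
               "University of Melbourne - Wikipedia")
emory <- c("Home | Emory University | Atlanta GA", "Emory University",
           "Emory School of Medicine - Emory University")
newcastle <- c("Newcastle University",
               "The University of Newcastle, Australia")

pick_title(melbourne, "UNIV MELBOURNE")  # "University of Melbourne"
pick_title(emory, "EMORY UNIV")          # "Emory University"
pick_title(newcastle, "UNIV NEWCASTLE")  # "Newcastle University" (the UK one)

# Non-interactive loop (not run here: it needs rvest and network access, the
# page structure and the "h3" selector may change, and automated requests
# may be blocked):
# library(rvest)
# data$New <- NA
# for (i in seq_len(nrow(data))) {
#   titles <- read_html(build_search_url(data$Affiliations[i])) %>%
#     html_elements("h3") %>% html_text()
#   data$New[i] <- pick_title(titles, data$Affiliations[i])
#   Sys.sleep(2)  # pause between requests
# }
```

Choosing the shortest matching title is a simple way to skip titles with extra text such as " - Wikipedia" or "Home | ...". It can still return a title with a suffix (for example "National Bureau of Economic Research | NBER"), so some clean-up may remain.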