r loops stringdist

Efficient programming in R


I have data like this:

author_id paper_id confirmed author_name1     author_affiliation1  author_name
      826    25733         1 Emanuele Buratti Genetic engineering  Emanuele Buratti
      826    25733         1 Emanuele Buratti International center Emanuele Buratti
      826    47276         1 Emanuele Buratti                      Emanuele Buratti
      826    77012         1 Emanuele Buratti                      Emanuele Buratti
      826    77012         1 Emanuele Buratti                      Emanuele Buratti
      826    79468         1 Emanuele Buratti                      Emanuele Buratti

author_affiliation
Genetic enginereing
The International Centre for Genetic Engineering and Biotechnology, Padriciano 66, Trieste, Italy


International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34149 Trieste, Italy

Now, for each row, I have to compute the string distance between author_name and author_name1 (name_dist) and the string distance between author_affiliation and author_affiliation1 (aff_dist).

I am using

library(stringdist)

name_dist <- vector()
aff_dist <- vector()
# Compute the Levenshtein distance one row at a time
for (i in 1:nrow(mer1)) {
  name_dist[i] <- stringdist(mer1$author_name1[i], mer1$author_name[i], method = "lv")
  aff_dist[i] <- stringdist(mer1$author_affiliation1[i], mer1$author_affiliation[i], method = "lv")
}

But this takes a lot of time. How could this be done more efficiently?

Thanks


Solution

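  • You can vectorize it directly. stringdist() already compares its two arguments element by element, so you can pass whole columns instead of looping over rows: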

    library(stringdist)
    # No index vector is needed: element k of each result is the distance for row k
    name_dist <- stringdist(mer1$author_name1, mer1$author_name, method = "lv")
    aff_dist <- stringdist(mer1$author_affiliation1, mer1$author_affiliation, method = "lv")
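
To try the vectorized call on something small, here is a minimal sketch. The toy mer1 below is an assumption made up for illustration: it only borrows a few values from the question (the second affiliation is shortened), and it stores the distances as new columns, which the question did not ask for but is often convenient.

```r
library(stringdist)

# Toy stand-in for mer1 (hypothetical; values adapted from the question, not the real data)
mer1 <- data.frame(
  author_name1        = c("Emanuele Buratti", "Emanuele Buratti"),
  author_name         = c("Emanuele Buratti", "Emanuele Buratti"),
  author_affiliation1 = c("Genetic engineering", "International center"),
  author_affiliation  = c("Genetic enginereing", "International Centre"),
  stringsAsFactors    = FALSE
)

# One call per column pair; element k of each result is the distance for row k
mer1$name_dist <- stringdist(mer1$author_name1, mer1$author_name, method = "lv")
mer1$aff_dist  <- stringdist(mer1$author_affiliation1, mer1$author_affiliation, method = "lv")

print(mer1[, c("name_dist", "aff_dist")])
```

Identical names give a distance of 0, and because method = "lv" has no transposition operation, two swapped letters count as two edits.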