
R scripting Error { : missing value where TRUE/FALSE needed on Dataframe


I have a data frame that looks like this:

Name  Surname  Country  Path
John   Snow      UK     /Home/drive/John 
BOB    Anderson         /Home/drive/BOB
Tim    David     UK     /Home/drive/Tim 
Wayne  Green     UK     /Home/drive/Wayne

I have written a script that checks, row by row, whether Country == "UK" and, if it is, uses gsub in R to change the Path prefix from "/Home/drive/" to "/Server/files/".

Script

    Pattern <- "/Home/drive/"
    Replacement <- "/Server/files/"

    for (i in 1:nrow(gs_catalog_Staging_123)) {
      if (gs_catalog_Staging_123$country[i] == "UK" && !is.na(gs_catalog_Staging_123$country[i])) {
        gs_catalog_Staging_123$Path <- gsub(Pattern, Replacement, gs_catalog_Staging_123$Path, ignore.case = TRUE)
      }
    }

The output I get:

Name  Surname  Country  Path
John   Snow      UK     /Server/files/John
BOB    Anderson         /Server/files/BOB
Tim    David     UK     /Server/files/Tim
Wayne  Green     UK     /Server/files/Wayne

The output I want:

Name  Surname  Country  Path
John   Snow      UK     /Server/files/John
BOB    Anderson         /Home/drive/BOB
Tim    David     UK     /Server/files/Tim
Wayne  Green     UK     /Server/files/Wayne

As you can see, BOB's row, which has no country, has its path changed as well. It looks as though gsub fails to recognize the missing value.


Solution

  • Many R functions, including gsub() and ifelse(), are vectorized: they work on a whole column at once, so no loop is needed here.

    # example data
    df <- data.frame(
        name    = c("John", "Bob", "Tim", "Wayne"),
        surname = c("Snow", "Ander", "David", "Green"),
        country = c("UK", "", "UK", "UK"),
        path    = paste0("/Home/drive/", c("John", "Bob", "Tim", "Wayne")),
        stringsAsFactors = FALSE
    )
    
    # fix the path
    df$newpath <- ifelse(df$country=="UK" & !is.na(df$country), 
                         gsub("/Home/drive/", "/Server/files/", df$path),
                         df$path)
    

    # view result
    df
       name surname country              path             newpath
    1  John    Snow      UK  /Home/drive/John  /Server/files/John
    2   Bob   Ander           /Home/drive/Bob     /Home/drive/Bob
    3   Tim   David      UK   /Home/drive/Tim   /Server/files/Tim
    4 Wayne   Green      UK /Home/drive/Wayne /Server/files/Wayne
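    A variant of the same vectorized idea, sketched here with data rebuilt so the block runs on its own: instead of ifelse(), build a logical index with %in%, which returns FALSE (not NA) for a missing country, and rewrite only the selected rows in place. Using %in% and sub(..., fixed = TRUE) is a swapped-in technique rather than part of the answer above, and storing Bob's country as NA (instead of "") is an assumption made to show the NA handling.

    ```r
    # Example data; assumption: Bob's missing country is stored as NA
    df <- data.frame(
      name    = c("John", "Bob", "Tim", "Wayne"),
      country = c("UK", NA, "UK", "UK"),
      path    = paste0("/Home/drive/", c("John", "Bob", "Tim", "Wayne")),
      stringsAsFactors = FALSE
    )

    # %in% treats NA as "no match", so no separate is.na() check is needed
    uk <- df$country %in% "UK"

    # Rewrite only the UK rows; sub() suffices because the prefix occurs once,
    # and fixed = TRUE matches the pattern literally rather than as a regex
    df$path[uk] <- sub("/Home/drive/", "/Server/files/", df$path[uk], fixed = TRUE)

    df$path
    ```

    This overwrites path in place rather than adding a newpath column; keep the ifelse() version if you want to retain the original paths alongside the new ones.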
    

    The fact that gsub works on a whole vector at once is also what goes wrong in your code. Each time through the loop you check row i, but then run gsub on the entire Path column, so as soon as one row matches "UK", every path (including BOB's) is rewritten. To fix it, add [i] on both sides of the assignment in the loop body:

    gs_catalog_Staging_123$Path[i] <- gsub(Pattern, Replacement, gs_catalog_Staging_123$Path[i], ignore.case = TRUE)
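    For completeness, here is a sketch of the whole loop with the [i] fix applied. It assumes hypothetical data mirroring the question, with BOB's blank Country stored as NA and the columns named country and Path exactly as the script uses them ($ is case-sensitive, so a column called Country would not be found by $country). Two small changes are not from the answer: seq_len() replaces 1:nrow() so an empty data frame runs zero iterations, and the NA test comes first so == is only evaluated on non-missing values.

    ```r
    # Hypothetical data mirroring the question; BOB's country is NA
    gs_catalog_Staging_123 <- data.frame(
      Name    = c("John", "BOB", "Tim", "Wayne"),
      Surname = c("Snow", "Anderson", "David", "Green"),
      country = c("UK", NA, "UK", "UK"),
      Path    = paste0("/Home/drive/", c("John", "BOB", "Tim", "Wayne")),
      stringsAsFactors = FALSE
    )

    Pattern <- "/Home/drive/"
    Replacement <- "/Server/files/"

    # seq_len() yields no iterations when the data frame has zero rows
    for (i in seq_len(nrow(gs_catalog_Staging_123))) {
      # Check for NA first; && stops before comparing a missing value
      if (!is.na(gs_catalog_Staging_123$country[i]) &&
          gs_catalog_Staging_123$country[i] == "UK") {
        # [i] on both sides: only row i is read and rewritten
        gs_catalog_Staging_123$Path[i] <- gsub(Pattern, Replacement,
                                               gs_catalog_Staging_123$Path[i],
                                               ignore.case = TRUE)
      }
    }

    gs_catalog_Staging_123$Path
    ```

    The loop works, but the vectorized ifelse() version above does the same job in one call and is the more idiomatic choice in R.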