Search code examples
rlistsplitlogical-operators

Function to abbreviate scientific names


Could you please help me?

I'm trying to modify an R function written by a colleague. This function receives a character vector with scientific names (Latin binomes), just like this one:

Name
Cerradomys scotti
Oligoryzomys sp
Philander frenatus
Byrsonima sp
Campomanesia adamantium
Cecropia pachystachya
Cecropia sp
Erythroxylum sp
Ficus sp
Leandra aurea

Then, it should abbreviate the scientific names, using only the first three letters of the genus (first term) and the epithet (second term) to make a short code. For instance, Cerradomys scotti should become Cersco.

This is the original function:

AbbreviatedNames <- function(vector) {

    abbreviations <- character(length = length(vector))
    
    splitnames <- strsplit(vector, " ")
    
    for (i in 1:length(vector)) {
        vector[i] <- if(splitnames[[i]][2] == "^sp") {
            paste(substr(splitnames[[i]][1],1,3),
                  splitnames[[i]][2], sep = "")
        }
        
        else {
            paste(substr(splitnames[[i]][1],1,3),
                  substr(splitnames[[i]][2],1,3), sep = "")
        }
        
    }
    
    vector
    
    }

With a simple list like that one, the function works perfectly. However, when the list has some missing or extra elements, it does not work. The loop stops when it meets the first row that does not match the pattern. Let's take this more complex list as an example:

Name
Cerradomys scotti
Oligoryzomys sp
Philander frenatus
Byrsonima sp
Campomanesia adamantium
Cecropia pachystachya
Cecropia sp
Erythroxylum sp
Ficus sp
Leandra aurea
Morfosp1
Vismia cf brasiliensis

See that Morfosp1 has only 1 term. And Vismia cf brasiliensis has an additional term (cf) in the middle.

I've tried adapting the function, for instance, this way:

AbbreviatedNames <- function(vector) {

    abbreviations <- character(length = length(vector))
    
    splitnames <- strsplit(vector, " ")
    
    for (i in 1:length(vector)) {
        vector[i] <- if(splitnames[[i]][2] == "^sp" & is.na(splitnames[[i]][2]))) {
            paste(substr(splitnames[[i]][1],1,3),
                  splitnames[[i]][2], sep = "")
        }
        
        else {
            paste(substr(splitnames[[i]][1],1,3),
                  substr(splitnames[[i]][2],1,3), sep = "")
        }
        
    }
    
    vector
    
    }

Nevertheless, it does not work. I get this error message:

Error in if (splitnames[[i]][2] == "^sp" & is.na(splitnames[[i]][2])) { : 
  valor ausente onde TRUE/FALSE necessário

How could I make the function:

  1. Deal also with names that have only 1 term?

Expected outcome: Morfosp1 -> Morfosp1 (stays the same)

  1. Deal also with names that have an additional term in the middle?

Expected outcome: Vismia cf brasiliensis -> Visbra (term in the middle is ignored)

Thank you very much!


Solution

  • AbbreviatedNames <- function(vector) {
      
      abbreviations <- character(length = length(vector))
      
      splitnames <- strsplit(vector, " ")
      
      for (i in 1:length(vector)){
        
        # One name
        if(length(splitnames[[i]])==1){
          vector[i] <- paste(substr(splitnames[[i]][1],1,3),
                substr(splitnames[[i]][2],1,3), sep = "")
        }
        
        # Two names
        else if(length(splitnames[[i]])==2){
          vector[i] <- if(splitnames[[i]][2] == "^sp") {
            paste(substr(splitnames[[i]][1],1,3),
                  splitnames[[i]][2], sep = "")
          }
          else {
            paste(substr(splitnames[[i]][1],1,3),
                  substr(splitnames[[i]][2],1,3), sep = "")
          }
        }
        
        # Three names
        else if(length(splitnames[[i]])==3){
          vector[i] <- paste(substr(splitnames[[i]][1],1,3),
                  substr(splitnames[[i]][3],1,3), sep = "")
          # Assuming that the unwanted word is always in the middle 
        }
        
      }
      
      return(vector)
    }
    

    I tested on the list you gave and it seems to work, tell me if you need a more general code