Tags: r, sapply, strsplit

"subscript out of bounds" on character vector


I have a character vector nameAlpha such as c("Mark Twain", "Phil Hall", "Michael P. O'Connor", " ", ...). I want to extract each first name into another vector, nameAlpha_first. I run this:

nameAlpha_first <- sapply(strsplit(nameAlpha, "\\s+"), "[[", 1)

But I get:

Error in FUN(X[[12L]], ...) : subscript out of bounds

Could it be because a few elements of the vector are empty? How can I fix it?


Solution

  • Assume we define and preprocess a character vector as follows:

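    # Example data, including blank ("", " ") and missing (NA) entries
    nameAlpha <- c("Mark Twain", NA, "Phil Hall",
                   "Michael P. O'Connor", " ", "", NA, "John")
    # Set entries shorter than two characters to NA (this also blanks any
    # one-character name); which() skips elements that are already NA
    nameAlpha[which(nchar(nameAlpha) < 2)] <- NA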
    

    Then take the first piece of each split with head:

    sapply(strsplit(nameAlpha, "\\s+"), head, n=1)
    

    This returns the first names, with NA for the blank and missing entries. If you want the last names, use tail instead:

    sapply(strsplit(nameAlpha, "\\s+"), tail, n=1)
    

    This gives the vector of last names. For a one-word entry such as "John", the last name is the same as the first.
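
As for why the original call fails: strsplit() returns a zero-length character vector for an empty string "", and "[[" with index 1 on a zero-length vector raises "subscript out of bounds". The sketch below is one possible way to locate such elements and to extract first names without overwriting the input. The helper name first_or_na is hypothetical, introduced only for illustration, and treating blank pieces as NA is an assumption about the desired result.

```r
# Same example data as above, but without the NA preprocessing step.
nameAlpha <- c("Mark Twain", NA, "Phil Hall",
               "Michael P. O'Connor", " ", "", NA, "John")

parts <- strsplit(nameAlpha, "\\s+")

# Positions whose split has no pieces at all; these are the ones that
# make "[[" fail with "subscript out of bounds".
empty_idx <- which(lengths(parts) == 0)

# first_or_na is a hypothetical helper: it returns the first piece, or NA
# when there is no piece, the piece is NA, or the piece is an empty string.
first_or_na <- function(x) {
  if (length(x) == 0 || is.na(x[1]) || !nzchar(x[1])) NA_character_ else x[1]
}

# vapply() guarantees a character vector with one value per input element.
nameAlpha_first <- vapply(parts, first_or_na, character(1))
```

Using vapply() with character(1) rather than sapply() means the result is always a character vector of the same length as the input, whereas sapply() can fall back to a list when some elements have length zero.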