Tags: r, substr, string-length

Getting last character/number of data frame column


I'm trying to get the last character (a letter or a digit) of each string in a data frame column so I can filter on some categories afterwards. But I'm not getting the expected result.

names = as.character(c("ABC Co","DEF Co","XYZ Co")) 
code = as.character(c("ABCN1","DEFMO2","XYZIOIP4")) #variable length
my_df = as.data.frame(cbind(names,code))

First Approach:

my_df[,3] = substr(my_df[,2],length(my_df[,2]),length(my_df[,2]))

What I expected to receive was: c("1","2","4")

What I actually get is: c("C","F","Z")

Then I realized that length(my_df[,2]) is the number of rows in my data frame, not the number of characters in each cell. So I decided to write this loop:

for (i in length(nrow(my_df))){
  my_df[i,3] = substr(my_df[i,2],length(my_df[i,2]),length(my_df[i,2]))
}

What I expected to receive was: c("1","2","4")

What I actually get is: c("A","F","Z")

So then I tried:

for (i in length(nrow(my_df))){
  my_df[i,3] = substr(my_df[i,2],-1,-1)
}

What I expected to receive was: c("1","2","4")

What I actually get is: c("","F","Z")

I'm not having any luck. Any thoughts on what I am missing? Thank you very much!


Solution

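  • length() returns the number of elements in a vector (or list), whereas substr() needs character positions within each string. Base R's nchar() returns the number of characters per element, and it is vectorized, so no loop is needed.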

    my_df = as.data.frame(cbind(names, code), stringsAsFactors = FALSE)
    substr(my_df[,2], nchar(my_df[,2]), nchar(my_df[,2]))
    # [1] "1" "2" "4"
    

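    (I added stringsAsFactors = FALSE because in R versions before 4.0.0 the default is TRUE, which turns the columns into factors; nchar() rejects factors, so without it you would need to wrap the column in as.character(). From R 4.0.0 onward the default is already FALSE.)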
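As for why the loops in the question did not work: length(nrow(my_df)) is 1, because nrow() returns a single number, so the loop body ran only for i = 1 and rows 2 and 3 kept the values left over from the first attempt. Also, substr() does not count negative positions from the end of the string, so substr(x, -1, -1) returns "". Below is a minimal sketch of the vectorized approach next to a corrected loop. The helper name last_char and the column name last are my own choices for illustration, not anything built into R.

```r
# Hypothetical helper (name is illustrative): last character of each string.
last_char <- function(x) {
  x <- as.character(x)  # also copes with factor columns
  n <- nchar(x)         # number of characters in each element
  substr(x, n, n)       # start and stop are both the last position
}

# Build the data frame directly; cbind() is not needed here.
my_df <- data.frame(
  names = c("ABC Co", "DEF Co", "XYZ Co"),
  code  = c("ABCN1", "DEFMO2", "XYZIOIP4"),
  stringsAsFactors = FALSE
)

# Vectorized version: one call handles the whole column.
my_df$last <- last_char(my_df$code)
print(my_df$last)
# [1] "1" "2" "4"

# Corrected loop, for comparison: iterate over seq_len(nrow(my_df)),
# not length(nrow(my_df)), and use nchar() for the position.
last_loop <- character(nrow(my_df))
for (i in seq_len(nrow(my_df))) {
  n <- nchar(my_df$code[i])
  last_loop[i] <- substr(my_df$code[i], n, n)
}
print(last_loop)
# [1] "1" "2" "4"
```

The vectorized call is usually the better choice, since it is shorter and avoids the off-by-range mistakes that are easy to make in a loop. You can then filter on the new column, for example my_df[my_df$last == "2", ].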