I'm trying to get the last character (letter or digit) of each string in a data frame column so I can filter some categories afterwards, but I'm not getting the expected result.
names = as.character(c("ABC Co","DEF Co","XYZ Co"))
code = as.character(c("ABCN1","DEFMO2","XYZIOIP4")) #variable length
my_df = as.data.frame(cbind(names,code))
First Approach:
my_df[,3] = substr(my_df[,2],length(my_df[,2]),length(my_df[,2]))
What I expected to receive was: c("1","2","4")
What I am really receiving is : c("C","F","Z")
Then I realized that length(my_df[,2]) is the number of rows of my data frame, not the length of each cell. So I decided to create this loop:
for (i in length(nrow(my_df))){
my_df[i,3] = substr(my_df[i,2],length(my_df[i,2]),length(my_df[i,2]))
}
What I expected to receive was: c("1","2","4")
What I am really receiving is : c("A","F","Z")
So then I tried:
for (i in length(nrow(my_df))){
my_df[i,3] = substr(my_df[i,2],-1,-1)
}
What I expected to receive was: c("1","2","4")
What I am really receiving is : c("","F","Z")
No luck so far. Any thoughts on what I am missing? Thank you very much!
length is a property of a vector (or list): it counts elements, not characters. substr needs the number of characters in each string, and base R's nchar returns exactly that, one value per element.
my_df = as.data.frame(cbind(names, code), stringsAsFactors = FALSE)
substr(my_df[,2], nchar(my_df[,2]), nchar(my_df[,2]))
# [1] "1" "2" "4"
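(I added stringsAsFactors = FALSE because in R versions before 4.0.0 the strings are converted to factors by default, and you would otherwise need to wrap the column in as.character.)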
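As for the loops in the question: length(nrow(my_df)) is always 1, so each loop only ever touched the first row, which is why only the first element changed between attempts. Below is a minimal sketch of a vectorized version and a corrected loop, assuming the same data as in the question. The column name last_char and the helper vector last_loop are illustrative choices, not something from the original code.

```r
names <- c("ABC Co", "DEF Co", "XYZ Co")
code  <- c("ABCN1", "DEFMO2", "XYZIOIP4")  # variable length
my_df <- data.frame(names, code, stringsAsFactors = FALSE)

# Vectorized: nchar() returns the character count of each element,
# so substr() picks the last character of every string at once.
n <- nchar(my_df$code)
my_df$last_char <- substr(my_df$code, n, n)

# Equivalent loop: seq_len(nrow(my_df)) visits every row,
# whereas length(nrow(my_df)) is always 1.
last_loop <- character(nrow(my_df))
for (i in seq_len(nrow(my_df))) {
  k <- nchar(my_df$code[i])
  last_loop[i] <- substr(my_df$code[i], k, k)
}

my_df$last_char
# [1] "1" "2" "4"
identical(last_loop, my_df$last_char)
# [1] TRUE
```

The vectorized form is usually preferable in R; the loop is shown only to make the fix to the original attempts explicit.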