I have a CSV file from which I need to get the rows that have two or more characters in a specific column. My CSV file looks like this:
"Name" "Age" "ID" "RefID"
"ABC" "12" "Abccc" "xyzw"
"AAA" "14" "A" "X"
"BBB" "18" "DEfff" "dfg"
"CCY" "10" "F" "XY"
"CCZ" "20" "R" "XYC"
So from columns 3 and 4 I have to take the rows which have >= two characters.
I tried the following:
data = read.table(file ='res.csv', header = T)
dat2 = as.character(data[,3])
ind = dat2[(which(nchar(dat2) >=2)),]
But it gives me an error, and I can't figure out how to handle both columns at once. My result should look like this:
"Name" "Age" "ID" "RefID"
"ABC" "12" "Abccc" "xyzw"
"BBB" "18" "DEfff" "dfg"
"CCY" "10" "F" "XY"
"CCZ" "20" "R" "XYC"
Any help would be appreciated.
We can avoid extra steps, such as the conversion to character class, by specifying stringsAsFactors = FALSE in read.table so that the character columns are not converted to factor class. Then, get the number of characters in the third column with nchar and create the logical condition by checking whether it is greater than or equal to 2:
data[nchar(data[,3])>=2,]
# Name Age ID RefID
#1 ABC 12 Abccc xyzw
#3 BBB 18 DEfff dfg
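For anyone who wants to try this without a res.csv on disk, here is a minimal sketch that reads the sample rows from the question inline. The names txt, keep and res are only illustrative, and the whitespace-separated layout is assumed to match the real file.

```r
# Inline copy of the sample data from the question (assumed to match res.csv)
txt <- '"Name" "Age" "ID" "RefID"
"ABC" "12" "Abccc" "xyzw"
"AAA" "14" "A" "X"
"BBB" "18" "DEfff" "dfg"
"CCY" "10" "F" "XY"
"CCZ" "20" "R" "XYC"'
data <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Logical vector: TRUE where the third column (ID) has two or more characters
keep <- nchar(data[, 3]) >= 2

# Use the logical vector as the row index; the empty column index keeps all columns
res <- data[keep, ]
print(res)
```

Note the trailing comma in data[keep, ]: the row index goes before it, and leaving the column index empty keeps every column of the data frame.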
For multiple columns, use &
data[nchar(data[,3])>=2 & nchar(data[,4]) >=2,]
But this becomes unwieldy when there are hundreds of columns. In that case, we loop through the columns of interest, do the comparison on each, and Reduce the resulting list to a single logical vector:
data[Reduce(`&`, lapply(data[3:4], function(x) nchar(x) >=2)),]
# Name Age ID RefID
#1 ABC 12 Abccc xyzw
#3 BBB 18 DEfff dfg
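To make the lapply/Reduce step easier to follow, here is a rough sketch that splits it into its intermediate pieces. The data frame is rebuilt by hand from the sample in the question, and the names checks, keep_all and res_all are only illustrative.

```r
# Sample data from the question, built by hand so the snippet is self-contained
data <- data.frame(
  Name  = c("ABC", "AAA", "BBB", "CCY", "CCZ"),
  Age   = c(12L, 14L, 18L, 10L, 20L),
  ID    = c("Abccc", "A", "DEfff", "F", "R"),
  RefID = c("xyzw", "X", "dfg", "XY", "XYC"),
  stringsAsFactors = FALSE
)

# Step 1: one logical vector per column of interest (a list of length 2)
checks <- lapply(data[3:4], function(x) nchar(x) >= 2)
# checks$ID:    TRUE FALSE TRUE FALSE FALSE
# checks$RefID: TRUE FALSE TRUE  TRUE  TRUE

# Step 2: Reduce folds the list with `&`, leaving a single logical vector
keep_all <- Reduce(`&`, checks)

# Step 3: subset the rows where every checked column passed
res_all <- data[keep_all, ]
print(res_all)
```

Because the columns are selected with data[3:4], extending the check to more columns only means changing that index, for example to a vector of column names.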
If the condition needs to be TRUE for any of the columns, then change the & to | in the Reduce:
data[Reduce(`|`, lapply(data[3:4], function(x) nchar(x) >=2)),]
# Name Age ID RefID
#1 ABC 12 Abccc xyzw
#3 BBB 18 DEfff dfg
#4 CCY 10 F XY
#5 CCZ 20 R XYC
Here, data was read with:
data <- read.table(file ='res.csv', header = TRUE, stringsAsFactors = FALSE)
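As an alternative to Reduce (a different technique from the one used above, shown only as a sketch), the per-column checks can be collected into a logical matrix and counted per row with rowSums. The names cols, passed, keep_any and keep_every are hypothetical, and the data frame is rebuilt by hand from the question's sample.

```r
# Sample data from the question, built by hand so the snippet is self-contained
data <- data.frame(
  Name  = c("ABC", "AAA", "BBB", "CCY", "CCZ"),
  Age   = c(12L, 14L, 18L, 10L, 20L),
  ID    = c("Abccc", "A", "DEfff", "F", "R"),
  RefID = c("xyzw", "X", "dfg", "XY", "XYC"),
  stringsAsFactors = FALSE
)

cols <- c("ID", "RefID")  # columns to check

# Logical matrix: one row per record, one column per checked field
passed <- sapply(data[cols], nchar) >= 2

# rowSums counts the TRUE values in each row
keep_any   <- rowSums(passed) > 0              # at least one column passes
keep_every <- rowSums(passed) == length(cols)  # all columns pass

print(data[keep_any, ])
print(data[keep_every, ])
```

On the sample data, keep_any selects the four rows shown in the expected result of the question, while keep_every selects only the ABC and BBB rows.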