
In R, from a csv file get rows from a column which contain more than 2 characters


I have a csv file from which I have to get the rows that have 2 or more characters in specific columns. My csv file looks like this:

"Name" "Age" "ID"     "RefID"
"ABC"  "12"  "Abccc"  "xyzw"
"AAA"  "14"  "A"      "X"
"BBB"  "18"  "DEfff"  "dfg"
"CCY"  "10"  "F"      "XY"
"CCZ"  "20"  "R"      "XYC"

So from columns 3 and 4 I have to take the rows which have >= 2 characters.

I tried the following:

data = read.table(file ='res.csv', header = T)
dat2 = as.character(data[,3])
ind = dat2[(which(nchar(dat2) >=2)),]

But it gives me an error, and I can't figure out how to handle both columns at once. My result should look like this:

"Name" "Age" "ID"     "RefID"
"ABC"  "12"  "Abccc"  "xyzw"
"BBB"  "18"  "DEfff"  "dfg"
"CCY"  "10"  "F"      "XY"
"CCZ"  "20"  "R"      "XYC"

Any help would be appreciated.


Solution

  • We can skip the separate conversion to character by passing stringsAsFactors = FALSE to read.table, which keeps the character columns from being converted to factors. Then get the number of characters in the third column with nchar and build a logical condition that tests whether it is greater than or equal to 2

    data[nchar(data[,3])>=2,]
    #   Name Age    ID RefID
    #1  ABC  12 Abccc  xyzw
    #3  BBB  18 DEfff   dfg
    

    For multiple columns, use &

    data[nchar(data[,3])>=2 & nchar(data[,4]) >=2,]
    

    But that becomes tedious when there are hundreds of columns. In that case, loop through the columns of interest with lapply, apply the comparison to each, and Reduce the results to a single logical vector

    data[Reduce(`&`, lapply(data[3:4], function(x) nchar(x) >=2)),]
    #  Name Age    ID RefID
    #1  ABC  12 Abccc  xyzw
    #3  BBB  18 DEfff   dfg
    

    If the condition only needs to be TRUE for any one of the columns, change the & to | in the Reduce. This gives the result requested in the question

    data[Reduce(`|`, lapply(data[3:4], function(x) nchar(x) >=2)),]
    #   Name Age    ID RefID
    #1  ABC  12 Abccc  xyzw
    #3  BBB  18 DEfff   dfg
    #4  CCY  10     F    XY
    #5  CCZ  20     R   XYC
    

    Data

    data <- read.table(file ='res.csv', header = TRUE, stringsAsFactors = FALSE)
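
The steps above can be bundled into a small reusable function. This is only a sketch: the helper name filter_min_chars and its arguments cols, min_chars and how are made up for illustration and are not part of base R. It also reads the sample data from an inline string instead of 'res.csv', so it runs on its own.

```r
# Inline copy of the sample data from the question (stands in for 'res.csv').
txt <- '"Name" "Age" "ID"     "RefID"
"ABC"  "12"  "Abccc"  "xyzw"
"AAA"  "14"  "A"      "X"
"BBB"  "18"  "DEfff"  "dfg"
"CCY"  "10"  "F"      "XY"
"CCZ"  "20"  "R"      "XYC"'

data <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: keep the rows of `df` in which the columns `cols`
# hold at least `min_chars` characters, either in at least one of the
# columns (how = "any") or in every one of them (how = "all").
filter_min_chars <- function(df, cols, min_chars = 2, how = c("any", "all")) {
  how <- match.arg(how)
  # One logical vector per column; as.character() guards against factor columns.
  checks <- lapply(df[cols], function(x) nchar(as.character(x)) >= min_chars)
  # Combine the per-column vectors element-wise with | or &.
  op <- if (how == "all") `&` else `|`
  df[Reduce(op, checks), , drop = FALSE]
}

any_rows <- filter_min_chars(data, c("ID", "RefID"), how = "any")
all_rows <- filter_min_chars(data, c("ID", "RefID"), how = "all")

print(any_rows)  # rows ABC, BBB, CCY, CCZ
print(all_rows)  # rows ABC, BBB
```

As for the error in the original attempt: dat2 is a plain character vector, so indexing it with a trailing comma, as in dat2[..., ], fails with "incorrect number of dimensions". The row index has to be applied to the data frame itself, as in the examples above.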