
R - delete rows with more than x chars using which


I have a data frame with 7 variables. Some variables have more characters than they are supposed to have. To remove the lines containing too many characters for one value, I use this:

df <- df[-c(which(nchar(as.character(df$u)) > 5)), ]

So every line with more than 5 characters in df$u should be deleted. The problem is that this approach deletes everything. In this case there are no values in df$u that have more than 5 chars, so nothing should be deleted. If I change the line above to

df <- df[-c(which(nchar(as.character(df$u)) > 4)), ]

Two lines are deleted, which is correct, as there are two rows where df$u has more than 4 chars.

The problem is that I can't figure out where the issue is. It worked just fine with hundreds of files and suddenly stopped working.

A small sample of df:

station date                    u         v    w    temp dir
Balc    2017.12.25_0:0:0.005940 0.66      0.81 0.65 2.22 320.8
Balc    2017.12.25_0:0:0.106316 0.34      0.53 0.36 2.22 327.5
Balc    2017.12.25_0:0:0.205374 0.4456786 0.60 0.49 2.20 323.9
Balc    2017.12.25_0:0:0.306819 0.43      0.35 0.82 2.22 309.5

Solution

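  • The problem is in the indexing. If no value has more than 5 characters, the condition inside which() is FALSE for every row, so which() returns integer(0). Negating that still gives integer(0), and indexing the rows with an empty integer vector selects no rows at all instead of dropping none. Use a logical index instead: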

    df[!(nchar(as.character(df$u)) > 5),]
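
To see the difference side by side, here is a minimal sketch. The toy data frame and the names broken, kept_all and kept_some are made up for illustration and are not from the question; the u column is assumed to be stored as character.

```r
# Hypothetical toy data: no value in u is longer than 5 characters
df <- data.frame(
  station = rep("Balc", 4),
  u = c("0.66", "0.34", "0.445", "0.43"),
  stringsAsFactors = FALSE
)

# which() finds no match, so the index is integer(0)
idx <- which(nchar(as.character(df$u)) > 5)

# -integer(0) is still integer(0): an empty index selects zero rows
broken <- df[-idx, ]
nrow(broken)     # 0

# A logical index has one TRUE/FALSE per row, so "no match" keeps every row
kept_all <- df[!(nchar(as.character(df$u)) > 5), ]
nrow(kept_all)   # 4

# With a threshold of 4, only the "0.445" row is dropped
kept_some <- df[!(nchar(as.character(df$u)) > 4), ]
nrow(kept_some)  # 3
```

The negative-index version only works when at least one row matches, which would explain why it ran fine on other files and failed on one where nothing exceeded the limit.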