R - Count occurrences of forward slashes in each row


I have a dataframe that looks like this:

12/04/2017 00:00:02.30,-2.31,-2.97,-0.3,-1.4
12/04/2017 00:00:02.40,-1.89,-2.94,-1.15,-1.4
12/04/2017 00:00:02.50,-1.66,-3.14,-0.06,-1.39
12/04/2017 00:00:02.60,-1.84,-3.16,0.18,-1.37
12/04/2017 00:00:02.70,-2.12/04/2017 00:00:02.80,-2,-2.56,0.17,-1.41
12/04/2017 00:00:02.90,-2.18,-2.31,0.11,-1.45
12/04/2017 00:00:03,-2.14,-2.21,-0.05,-1.45

The logger that produces the data sometimes writes one of the dates into the middle of another line (the 5th row in the example). I need to delete these lines in R, but I have no idea how to find and remove them from the dataframe.

My first idea was to count the number of forward slashes in each row, but I could not find a way to do that.
Another option might be to compute the mean length of all rows and delete the rows that are longer than the mean. Same problem here: I can't find a way to compute a mean over all the characters in a row (strings and numbers).

Edit: the output from str(df):

'data.frame':   856645 obs. of  6 variables:
 $ station: chr  "Arof" "Arof" "Arof" "Arof" ...
 $ date   : Factor w/ 863989 levels "12/04/2017 00:00:01.10",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ u      : Factor w/ 1327 levels "","0","-0.01",..: 132 84 146 136 112 120 126 33 281 240 ...
 $ v      : num  -0.62 -0.41 -1.58 -1.65 -1.25 -1.8 -1.86 -2.46 -2.59 -2.87 ...
 $ w      : num  0.89 1.09 0.63 0.53 0.84 0.58 0.46 0.48 -0.16 -0.01 ...
 $ temp   : num  -1.36 -1.41 -1.41 -1.41 -1.41 -1.41 -1.5 -1.48 -1.51 -1.46 ...
 - attr(*, "na.action")=Class 'omit'  Named int [1:7344] 18 113 246 378 513 643 646 778 909 1042 ...
  .. ..- attr(*, "names")= chr [1:7344] "18" "113" "246" "378" ...

Solution

  • Using grepl, we can search for a literal . followed by two digits and then a /:

    grepl("\\.\\d{2}\\/",data$date)
    [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
    
    # For each row, count how many fields match the pattern (0 means the row is clean)
    apply(data, 1, function(x) sum(grepl("\\.\\d{2}\\/", x)))
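
The question also asks how to count the forward slashes per row and then delete the bad rows. A minimal sketch of that idea follows, assuming a clean row contains exactly two slashes (both in the date). The small `df` below is made-up sample data modelled on the question, and the names `row_text`, `n_slash` and `clean` are only illustrative.

```r
# Hypothetical sample data: in the second row a stray date has been
# written into the `u` field, as in the question.
df <- data.frame(
  date = c("12/04/2017 00:00:02.60",
           "12/04/2017 00:00:02.70",
           "12/04/2017 00:00:02.90"),
  u    = c("-1.84", "-2.12/04/2017 00:00:02.80", "-2.18"),
  v    = c(-3.16, -2, -2.31),
  stringsAsFactors = FALSE
)

# Collapse every row into one string, so it does not matter
# which column the stray date ended up in.
row_text <- do.call(paste, c(df, sep = ","))

# Count the forward slashes in each row string.
n_slash <- lengths(regmatches(row_text, gregexpr("/", row_text, fixed = TRUE)))
n_slash
# [1] 2 4 2

# Keep only the rows with the expected two slashes.
clean <- df[n_slash == 2, , drop = FALSE]
```

Counting slashes is probably a safer test than comparing row lengths against the mean, because valid rows already differ in length (for example, a timestamp of 00:00:03 has no fractional part).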