r, dataframe, string-length

Is there any command to count the length of rows?


Having a dataframe like this:

df <- data.frame(id = c(1,2,3), date1 = c("2014-Dec 2018","2009-2010","Jan 2009-Aug 2010"), date2 = c("Feb 2016-Dec 2018","2014-Dec 2018","Oct 2013-Dec 2018"))

  id             date1             date2
1  1     2014-Dec 2018 Feb 2016-Dec 2018
2  2         2009-2010     2014-Dec 2018
3  3 Jan 2009-Aug 2010 Oct 2013-Dec 2018

Is there any command that could check every row for values that differ from the format "Jan 2009-Aug 2010" and keep those rows in a new dataframe? That is, check whether each value has 17 characters, including the spaces between month and year.

Example of expected output

data.frame(id = c(1,2), date1 = c("2014-Dec 2018","2009-2010"), date2 = c("Feb 2016-Dec 2018","2014-Dec 2018"))
  id         date1             date2
1  1 2014-Dec 2018 Feb 2016-Dec 2018
2  2     2009-2010     2014-Dec 2018

Solution

  • A safer option is to split your data and use grepl to check whether each date matches the format:

    pattern = "[A-Za-z]{3} \\d{4}-[A-Za-z]{3} \\d{4}"
    split(df, rowSums(sapply(df[-1], grepl, pattern = pattern)) == 2)
    

    output

    $`FALSE`
      id         date1             date2
    1  1 2014-Dec 2018 Feb 2016-Dec 2018
    2  2     2009-2010     2014-Dec 2018
    
    $`TRUE`
      id             date1             date2
    3  3 Jan 2009-Aug 2010 Oct 2013-Dec 2018
    

    Explanation

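    The pattern is not that complicated: three letters ([A-Za-z]{3}), a space, and four digits (\\d{4}), twice, separated by -. sapply applies grepl to each date column, rowSums counts how many columns match in each row, and == 2 flags the rows where both dates follow the format; split then separates the rows by that flag.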
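
To get only the non-matching rows as a single data frame, as in the expected output, the same grepl idea can be used for plain subsetting instead of split. The sketch below is one possible way to do it, not part of the original answer. It assumes the whole string must match, so it adds the anchors ^ and $ to the pattern; the names anchored, matches, bad_rows and lens are illustrative. It also shows why checking for 17 characters alone is weaker: nchar only tests the length, not the layout.

```r
# Sample data from the question
df <- data.frame(
  id    = c(1, 2, 3),
  date1 = c("2014-Dec 2018", "2009-2010", "Jan 2009-Aug 2010"),
  date2 = c("Feb 2016-Dec 2018", "2014-Dec 2018", "Oct 2013-Dec 2018")
)

# Assumption: the entire value must follow the format, hence ^ and $
anchored <- "^[A-Za-z]{3} \\d{4}-[A-Za-z]{3} \\d{4}$"

# Logical matrix: one row per data frame row, one column per date column
matches <- sapply(df[-1], grepl, pattern = anchored)

# Keep the rows where at least one date column deviates from the format
bad_rows <- df[rowSums(matches) < ncol(matches), ]
bad_rows

# Length-only check for comparison; as.character guards against factor columns
lens <- sapply(df[-1], function(x) nchar(as.character(x)))
lens == 17

# A 17-character string with the wrong layout passes the length check
# but fails the pattern
nchar("12345678901234567") == 17
grepl(anchored, "12345678901234567")
```

Here bad_rows holds the rows with id 1 and 2. Note that sapply returns a matrix only when the data frame has more than one row, so a single-row input would need separate handling.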