regex date ordinals

Modify regex to match dates with ordinals "st", "nd", "rd", "th"


How can the regex below be modified to match dates with ordinals on the day part? It currently matches "Jan 1, 2003 | February 29, 2004 | November 02, 3202", but I need it to also match "Jan 1st, 2003 | February 29th, 2004 | November 02nd, 3202 | March 3rd, 2010".

^(?:(((Jan(uary)?|Ma(r(ch)?|y)|Jul(y)?|Aug(ust)?|Oct(ober)?|Dec(ember)?)\ 31)|((Jan(uary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\ (0?[1-9]|([12]\d)|30))|(Feb(ruary)?\ (0?[1-9]|1\d|2[0-8]|(29(?=,\ ((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))))\,\ ((1[6-9]|[2-9]\d)\d{2}))

Thank you.


Solution

  • This will depend on your use case, but in the interest of pragmatism, you might do well to just match anything of the form:
    (1) any month name or abbreviation;
    (2) whitespace;
    (3) any one or two digits;
    (4) optionally, any of st, nd, rd, th (directly after the digits);
    (5) whitespace OR comma + optional whitespace;
    (6) any four digits.

    I'm not sure what text you're matching against, but if I had Jan 35nd,3001, I think I'd rather capture it now and invalidate it later than skip over it right at the get-go.
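
To illustrate the "capture now, invalidate later" idea, here is a minimal Python sketch. The names `LOOSE_DATE`, `MONTHS`, `ordinal_suffix` and `parse_date` are hypothetical, and rejecting a mismatched suffix (such as "35nd") is an assumption about what "invalid" should mean for your data. The regex only captures candidates; `datetime.date` does the calendar checking, including leap years.

```python
import re
from datetime import date

# Loose capture: month prefix, day with optional ordinal, four-digit year.
LOOSE_DATE = re.compile(
    r"\b(?P<month>jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"
    r"\s+(?P<day>\d{1,2})(?P<suffix>st|nd|rd|th)?"
    r"(?:\s+|,\s*)(?P<year>\d{4})\b",
    re.IGNORECASE,
)

# Map three-letter month prefixes to month numbers 1..12.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def ordinal_suffix(day):
    """Return the English ordinal suffix for a day number."""
    if 11 <= day % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")


def parse_date(text):
    """Return a datetime.date for the first candidate in text, or None."""
    match = LOOSE_DATE.search(text)
    if match is None:
        return None
    day = int(match["day"])
    suffix = match["suffix"]
    # Assumption: a suffix that does not fit the day number is an error.
    if suffix and suffix.lower() != ordinal_suffix(day):
        return None
    try:
        # date() raises ValueError for impossible dates such as Feb 29, 2003.
        return date(int(match["year"]), MONTHS[match["month"].lower()], day)
    except ValueError:
        return None
```

With this split, the regex stays short, and strings like "Jan 35nd,3001" are still found but then rejected by `parse_date` instead of being silently skipped.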

    Also, depending on your data set, consider case sensitivity issues and common international English variants, like 1 Jan 2004 or 1st Jan, 2004 or January, 2004 etc.

    (Line breaks added for readability. The pattern is written in lowercase, so it assumes case-insensitive matching.)

    ^(?:j(?:an(?:uary)?|un(?:e)?|ul(?:y)?)|feb(?:ruary)?|ma(?:r(?:ch)?|y)
    |a(?:pr(?:il)?|ug(?:ust)?)|sep(?:t|tember)?|oct(?:ober)?|(?:nov|dec)(?:ember)?)  
    \s+\d{1,2}(?:st|nd|rd|th)?(?:\s+|,\s*)\d{4}\b
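
As a rough sketch of how the multi-line pattern could be used as-is, Python's `re.VERBOSE` flag ignores the line breaks and allows comments, and `re.IGNORECASE` handles capitalised month names. The names `DATE_WITH_ORDINAL` and `looks_like_date` are hypothetical, and Python is only one possible host language for this regex.

```python
import re

DATE_WITH_ORDINAL = re.compile(
    r"""
    ^(?:j(?:an(?:uary)?|un(?:e)?|ul(?:y)?)  # the group after "j" is required,
                                            # so a bare "j" is not a month
      |feb(?:ruary)?
      |ma(?:r(?:ch)?|y)
      |a(?:pr(?:il)?|ug(?:ust)?)
      |sep(?:t|tember)?
      |oct(?:ober)?
      |(?:nov|dec)(?:ember)?)
    \s+\d{1,2}(?:st|nd|rd|th)?              # day with optional ordinal
    (?:\s+|,\s*)                            # whitespace, or comma + optional whitespace
    \d{4}\b                                 # four-digit year
    """,
    re.IGNORECASE | re.VERBOSE,
)


def looks_like_date(text):
    """True if text starts with something shaped like 'Month DDth, YYYY'."""
    return DATE_WITH_ORDINAL.match(text) is not None


if __name__ == "__main__":
    for sample in ["Jan 1st, 2003", "February 29th, 2004",
                   "November 02nd, 3202", "March 3rd, 2010", "Jan 1, 2003"]:
        print(sample, looks_like_date(sample))
```

Note that this is deliberately loose: it accepts shapes like "Jan 35nd,3001" and leaves validation to a later step.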
    

    Even more pragmatic (and readable), unless you have a very bizarre dataset, is to allow anything after the common prefixes:

    (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*?\s+\d{1,2}(?:[a-z]{2})?(?:\s+|,\s*)\d{4}\b
    

    Would this match "octagenarianism 99xx, 0000"? Yes. Is that likely to be an issue? I doubt it.
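
A quick way to see the trade-off is to run the shorter pattern over some text. This is only a sketch in Python with case-insensitive matching assumed; `PRAGMATIC` and `find_dates` are hypothetical names.

```python
import re

# The shorter pattern from above, split across two string literals.
PRAGMATIC = re.compile(
    r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*?"
    r"\s+\d{1,2}(?:[a-z]{2})?(?:\s+|,\s*)\d{4}\b",
    re.IGNORECASE,
)


def find_dates(text):
    """Return every substring of text that the loose pattern accepts."""
    return [match.group(0) for match in PRAGMATIC.finditer(text)]


if __name__ == "__main__":
    print(find_dates("Due Jan 1st, 2003 or March 3rd 2010."))
    # The nonsense string is accepted too, as the answer points out.
    print(find_dates("octagenarianism 99xx, 0000"))
```

Because the pattern is unanchored and has no leading word boundary, it can also start inside a longer word, so it suits data where false positives are cheap to discard afterwards.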