regex python-re

Regex pattern for embedded date


I'm looking for a regex pattern that will help me detect the date an app was last updated. However, the text from web scraping looks like this:

A lot of text here etc etc.Updated onMar 8, 2023#6and more text here with no spaces

I would like to get "Mar 8, 2023", but I'm finding it hard to extract from the surrounding text since there are no spaces.

I'm trying with:

pattern = r"Updated on\.* \d+\.* 2023"

But it hasn't worked so far.

Thanks.


Solution

  • This worked:

    import re
    pattern = r"[a-zA-Z]{3} \d+, 2023"
    # beautiful_soup is assumed to hold the scraped page text as a str
    re.findall(pattern, beautiful_soup)
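
For what it's worth, the pattern in the question probably fails for two reasons: `\.*` matches zero or more literal dots rather than "any characters", and the pattern expects a space right after "Updated on", which the scraped text doesn't have. Below is a rough sketch of a variant that anchors on "Updated on" and isn't tied to 2023. The variable names (`text`, `updated`, `parsed`) are just for illustration, and the `strptime` step assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime

# Sample string copied from the question.
text = "A lot of text here etc etc.Updated onMar 8, 2023#6and more text here with no spaces"

# Anchor on the literal "Updated on", allow optional whitespace after it,
# then capture: three-letter month, 1-2 digit day, comma, four-digit year.
pattern = re.compile(r"Updated on\s*([A-Za-z]{3} \d{1,2}, \d{4})")

match = pattern.search(text)
updated = match.group(1) if match else None
print(updated)

# Optional: turn the captured string into a real date object.
# Assumes an English locale so that "%b" recognises "Mar".
parsed = datetime.strptime(updated, "%b %d, %Y") if updated else None
print(parsed.date() if parsed else None)
```

Anchoring on "Updated on" avoids picking up other date-like strings elsewhere on the page, and `\d{4}` stops the year after four digits, so the trailing "#6" is left out.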