Separating and extracting part of URL strings using regex?

I have a df with a column named url. Each URL string in url contains a unique six-character alphanumeric ID. I've been trying to extract that specific part of each string, the article_id, from all URLs and then add it to the df as a new column.

For example, xwpd7w is the article_id for https://www.vice.com/en_us/article/xwpd7w/how-a-brooklyn-gang-may-have-gotten-crazy-rich-dealing-for-el-chapo

How do I extract article_ids from all urls in the df based on their position next to /article/? Using any method, regex or not?

I have so far done the following:

df.url.str.split()

ex output: [https://www.vice.com/en_au/article/j539yy/smo...

df['cutcurls'] = df.url.str.join(sep=' ')
ex output: h t t p s : / / w w w . v i c e . c o m / e n

Any ideas?

Solution

Use the pandas "str.extract" method, which returns the capture group from the first regex match in each row.

import pandas as pd

df = pd.DataFrame({"url": ["https://www.vice.com/en_us/article/xwpd7w/how-a-brooklyn-gang-may-have-gotten-crazy-rich-dealing-for-el-chapo", "https://www.www.www//en_us/article/idId2019/buzzwords"]})

# Capture the path segment that follows /article/; expand=False returns a Series
df["article_id"] = df.url.str.extract(r"/article/([^/]+)", expand=False)

    Out:
                                                         url article_id
        0  https://www.vice.com/en_us/article/xwpd7w/how-...     xwpd7w
        1  https://www.www.www//en_us/article/idId2019/bu...   idId2019

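([^/]+): captures one or more consecutive characters that are not '/', i.e. everything between /article/ and the next slash. Rows where the pattern does not match get NaN.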
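Since the question also asks for a method that is "regex or not", here is a minimal sketch of two variations. The first uses only str.split. The second tightens the regex to exactly six alphanumeric characters, as the question describes the IDs. The column names article_id_split and article_id_strict and the second URL are made up for illustration. Note that the strict pattern would not match a longer ID such as idId2019 from the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "url": [
        "https://www.vice.com/en_us/article/xwpd7w/how-a-brooklyn-gang-may-have-gotten-crazy-rich-dealing-for-el-chapo",
        "https://www.vice.com/en_us/about",  # hypothetical URL without an /article/ segment
    ]
})

# Regex-free variant: keep the text after "/article/", then cut at the next "/".
# URLs without "/article/" have no second piece, so .str[1] yields NaN for them.
after_article = df["url"].str.split("/article/", n=1).str[1]
df["article_id_split"] = after_article.str.split("/", n=1).str[0]

# Stricter regex variant (assumes every ID is exactly six alphanumeric characters):
# the ID must be followed by "/" or the end of the string.
df["article_id_strict"] = df["url"].str.extract(
    r"/article/([A-Za-z0-9]{6})(?:/|$)", expand=False
)

print(df[["article_id_split", "article_id_strict"]])
```

Both new columns hold xwpd7w for the first row and NaN for the second, so rows that did not match can be found afterwards with df["article_id_strict"].isna().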