python regex regex-group

How to use a regex in Python 3.7 to match 2 or 3 groups?


I have to extract the brand name, model, and sometimes the trim level of cars listed on a website. The problem is that when I put two groups in my regex, I cannot access the third element (the car's trim level), and when I put three groups in my regex, I get nothing for cars without a trim level.

<a href="https://XXX.ir/car/bmw/x4">بی‌ام‌و ایکس ۴ </a>
<a href="https://XXX.ir/car/peugeot/405/glx">پژو ۴۰۵ جی‌ال‌ایکس</a>

my_regex_1 = r'https:\/\/XXX\.ir\/car\/(.+)\/(.+)\/(.+)'
my_regex_2 = r'https:\/\/XXX\.ir\/car\/(.+)\/(.+)\/'

My code:

import requests
from bs4 import BeautifulSoup
import re

mainpage = requests.get('https://bama.ir/')
soup = BeautifulSoup(mainpage.text, 'html.parser')
brands = soup.find_all('a')
infos = []
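for item in brands:
    # Some <a> tags have no href attribute; fall back to an empty string
    link = item.get('href', '')
    # The trailing $ forces the lazy model group to run to the end of the
    # segment; without an anchor it would stop after a single character
    info = re.findall(r'https:\/\/bama\.ir\/car\/([^\/]+?)\/([^\/]+?)(?:\/([^"]+))?$', link)
    infos.append(info)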
print(infos)

Solution

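  • Try this regex: https:\/\/XXX\.ir\/car\/([^\/]+?)\/([^\/]+?)(?:\/([^\"]+))?\"

    The first two groups (brand and model) are required. The third group (trim level) sits inside an optional non-capturing group, (?:...)?, so the pattern matches links with or without it. The trailing \" anchors the match at the closing quote of the href attribute, which stops the lazy model group from ending after a single character.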

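The optional-group idea can be tried on the two sample links from the question. This is a minimal sketch, not the answer's exact pattern: it assumes the regex is applied to the bare href value (as BeautifulSoup returns it, without the surrounding quotes), so it uses re.fullmatch instead of a trailing quote as the anchor. The names CAR_URL and parse_car_url are hypothetical, and XXX.ir is the placeholder domain from the question.

```python
import re

# Brand and model are required; the trim level sits in an optional
# non-capturing group. XXX.ir is the placeholder domain from the question.
CAR_URL = re.compile(r'https://XXX\.ir/car/([^/]+)/([^/]+)(?:/([^/"]+))?')

def parse_car_url(href):
    """Return (brand, model, trim), or None if href is not a car link.

    trim is None when the URL has no third path segment.
    """
    # fullmatch anchors the pattern at both ends of the string
    match = CAR_URL.fullmatch(href)
    return match.groups() if match else None

print(parse_car_url('https://XXX.ir/car/bmw/x4'))           # ('bmw', 'x4', None)
print(parse_car_url('https://XXX.ir/car/peugeot/405/glx'))  # ('peugeot', '405', 'glx')
print(parse_car_url('https://XXX.ir/about'))                # None
```

Because the missing trim level comes back as None rather than causing the match to fail, one pattern covers both kinds of link.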