
Extraction of ids and options from select using BeautifulSoup and arranging them in Pandas dataframe


I have extracted the following HTML:

<select class="class1", id="id1">
    <option value="0">A1</option>
    <option value="1">A2</option>
    <option value="2">A3</option>
    <option value="3">A4</option>
    <option value="4">A5</option>
    <option value="5">A6</option>
</select>
.
.
.
<select class="class2", id="id2">
    <option value="0">B1</option>
    <option value="1">B2</option>
    <option value="2">B3</option>
</select>
.
.
<select class="class3", id="id3">
    <option value="0">C1</option>
    <option value="1">C2</option>
    <option value="2">C3</option>
    <option value="2">C4</option>
</select>

I need to extract each option together with the id of its enclosing select and arrange them in a pandas DataFrame, similar to this:

id option
id1 A1
id1 A2
id1 A3
id2 B1
id2 B2
id2 B3
id3 C1
id3 C2
id3 C3
id3 C4

Solution

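  • I recommend using BeautifulSoup for this: loop over every select, then over its option children, and record the select's id alongside each option's text.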

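    import pandas as pd
    from bs4 import BeautifulSoup

    # s is the HTML string from the question
    soup = BeautifulSoup(s, 'html.parser')

    d = {'id': [], 'option': []}
    for select in soup.find_all('select'):
        for option in select.find_all('option'):
            d['id'].append(select['id'])      # id of the enclosing <select>
            d['option'].append(option.text)   # visible label of the <option>
    df = pd.DataFrame(d)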
    

    Output:

    >>> df
         id option
    0   id1     A1
    1   id1     A2
    2   id1     A3
    3   id1     A4
    4   id1     A5
    5   id1     A6
    6   id2     B1
    7   id2     B2
    8   id2     B3
    9   id3     C1
    10  id3     C2
    11  id3     C3
    12  id3     C4
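
For a version that can be run on its own, here is a minimal sketch of the same loop written as a list comprehension. It assumes the markup is held in a string called html (copied from the question with the stray commas between attributes dropped and the elided parts left out). The helper name selects_to_frame is made up for illustration; it is not part of BeautifulSoup or pandas.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Sample markup based on the question (assumption: attribute commas removed,
# elided sections between the selects omitted).
html = """
<select class="class1" id="id1">
    <option value="0">A1</option>
    <option value="1">A2</option>
    <option value="2">A3</option>
    <option value="3">A4</option>
    <option value="4">A5</option>
    <option value="5">A6</option>
</select>
<select class="class2" id="id2">
    <option value="0">B1</option>
    <option value="1">B2</option>
    <option value="2">B3</option>
</select>
<select class="class3" id="id3">
    <option value="0">C1</option>
    <option value="1">C2</option>
    <option value="2">C3</option>
    <option value="2">C4</option>
</select>
"""


def selects_to_frame(markup: str) -> pd.DataFrame:
    """Hypothetical helper: one row per <option>, tagged with its <select> id."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = [
        {"id": select.get("id"), "option": option.get_text(strip=True)}
        for select in soup.find_all("select")
        for option in select.find_all("option")
    ]
    # Passing columns explicitly keeps the layout stable even if no rows are found.
    return pd.DataFrame(rows, columns=["id", "option"])


df = selects_to_frame(html)
print(df)
```

Building a list of row dicts keeps each id and option paired in one place, and select.get("id") yields None rather than raising KeyError if a select has no id attribute.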