I have extracted the following HTML:
<select class="class1", id="id1">
<option value="0">A1</option>
<option value="1">A2</option>
<option value="2">A3</option>
<option value="3">A4</option>
<option value="4">A5</option>
<option value="5">A6</option>
</select>
.
.
.
<select class="class2", id="id2">
<option value="0">B1</option>
<option value="1">B2</option>
<option value="2">B3</option>
</select>
.
.
<select class="class3", id="id3">
<option value="0">C1</option>
<option value="1">C2</option>
<option value="2">C3</option>
<option value="2">C4</option>
</select>
I need to extract the options and the corresponding id of each select, and arrange them into a pandas DataFrame similar to this:
id | option |
---|---|
id1 | A1 |
id1 | A2 |
id1 | A3 |
id2 | B1 |
id2 | B2 |
id2 | B3 |
id3 | C1 |
id3 | C2 |
id3 | C3 |
id3 | C4 |
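I recommend using BeautifulSoup for this: loop over every select element, then over its option children, and record the select's id next to each option's text.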
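from bs4 import BeautifulSoup
import pandas as pd

# s is the string holding the extracted HTML
soup = BeautifulSoup(s, 'html.parser')
d = {'id': [], 'option': []}
for select in soup.find_all('select'):
    for option in select.find_all('option'):
        # one row per option, tagged with its parent select's id
        d['id'].append(select['id'])
        d['option'].append(option.text)
df = pd.DataFrame(d)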
Output:
>>> df
     id option
0   id1     A1
1   id1     A2
2   id1     A3
3   id1     A4
4   id1     A5
5   id1     A6
6   id2     B1
7   id2     B2
8   id2     B3
9   id3     C1
10  id3     C2
11  id3     C3
12  id3     C4
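If you want something you can paste and run as-is, here is a minimal self-contained sketch of the same idea. It builds the rows with a list comprehension instead of appending to two lists. The variable name html_doc and the shortened sample markup are my own placeholders, not part of the question; substitute your real extracted HTML.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Shortened stand-in for the extracted markup (hypothetical sample data).
html_doc = """
<select class="class2" id="id2">
<option value="0">B1</option>
<option value="1">B2</option>
</select>
<select class="class3" id="id3">
<option value="0">C1</option>
</select>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# One {"id", "option"} record per <option>, in document order.
# select.get("id") yields None instead of raising if a select has no id.
rows = [
    {"id": select.get("id"), "option": option.get_text(strip=True)}
    for select in soup.find_all("select")
    for option in select.find_all("option")
]

df = pd.DataFrame(rows, columns=["id", "option"])
print(df)
```

Passing columns explicitly keeps the DataFrame's shape stable even when no select elements are found, in which case you get an empty frame with the two expected columns.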