I am trying to read some data from our internal web-page using the following code:
import requests
from requests_toolbelt.utils import dump
resp = requests.get('XXXXXXXXXXXXXXXX')
data = dump.dump_all(resp)
print(data.decode('utf-8'))
The output I get is in the following format:
<tr>
<td bgcolor="#FFFFFF"><font size=2><a
href=javascript:openwin(179)>Kevin</a></font></td>
<td bgcolor="#FFFFFF"><font size=2>45.50/week</font></td>
</tr>
<tr>
<td bgcolor="#FFFFFF"><font size=2><a
href=javascript:openwin(33)>Eliza</a></font></td>
<td bgcolor="#FFFFFF"><font size=2>220=00/week</font></td>
</tr>
<tr>
<td bgcolor="#FFFFFF"><font size=2><a href=javascript:openwin(97)>sam</a></font></td>
<td bgcolor="#FFFFFF"><font size=2>181=00</font></td>
</tr>
However, the only data I am interested in from the above output is the names and their values, e.g.:
Kevin 45.50/week
Eliza 220=00/week
Sam 181=00
Is there a module or some other way to convert this output into the required format and write it to a file (preferably Excel)?
Try BeautifulSoup:
from bs4 import BeautifulSoup as soup
content = """<tr>
<td bgcolor="#FFFFFF"><font size=2><a
href=javascript:openwin(179)>Kevin</a></font></td>
<td bgcolor="#FFFFFF"><font size=2>45.50/week</font></td>
</tr>
<tr>
<td bgcolor="#FFFFFF"><font size=2><a
href=javascript:openwin(33)>Eliza</a></font></td>
<td bgcolor="#FFFFFF"><font size=2>220=00/week</font></td>
</tr>
<tr>
<td bgcolor="#FFFFFF"><font size=2><a href=javascript:openwin(97)>sam</a></font></td>
<td bgcolor="#FFFFFF"><font size=2>181=00</font></td>
</tr>"""
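# 'lxml' needs the lxml package installed; the built-in 'html.parser' also works here
html = soup(content, 'lxml')
trs = html.find_all('tr')
for row in trs:
    tds = row.find_all('td')
    # Join the stripped text of every cell so each table row prints on one line
    print(' '.join(data.text.strip() for data in tds))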
The output:
Kevin 45.50/week
Eliza 220=00/week
sam 181=00
First find all <tr> tags with find_all('tr'), then all <td> tags inside each row with find_all('td'), and finally output the text content of each td with data.text.
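The question also asks about getting the result into a file, preferably Excel. One possible approach, sketched here under a few assumptions, is to collect each row's cell texts into a list and write them out with the standard csv module, since Excel can open CSV files. The function name write_rows_to_csv, the file name output.csv and the header names Name and Value are made up for illustration, and the rows are hard-coded to mirror what the loop above prints rather than parsed from the page.

```python
import csv


def write_rows_to_csv(rows, path):
    """Write a list of [name, value] rows to a CSV file (hypothetical helper)."""
    # newline='' lets the csv module handle line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Name', 'Value'])  # header names are an assumption
        writer.writerows(rows)


# Rows as the BeautifulSoup loop would produce them, e.g. collected with
# rows.append([td.text.strip() for td in row.find_all('td')])
rows = [
    ['Kevin', '45.50/week'],
    ['Eliza', '220=00/week'],
    ['sam', '181=00'],
]

write_rows_to_csv(rows, 'output.csv')
```

If a real .xlsx file is needed rather than CSV, a third-party library such as openpyxl would be the usual route, but CSV keeps the sketch dependency-free.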