Tags: python-2.7, http, httpwebrequest

Formatting HTTP GET request output in Python


I am trying to read some data from our internal web page using the following code:

import requests
from requests_toolbelt.utils import dump

resp = requests.get('XXXXXXXXXXXXXXXX')
data = dump.dump_all(resp)
print(data.decode('utf-8'))

The output I am getting is in the following format:

<tr> 
    <td bgcolor="#FFFFFF"><font size=2><a     
href=javascript:openwin(179)>Kevin</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>45.50/week</font></td>
  </tr>

  <tr> 
    <td bgcolor="#FFFFFF"><font size=2><a  
href=javascript:openwin(33)>Eliza</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>220=00/week</font></td>
  </tr>

  <tr> 
    <td bgcolor="#FFFFFF"><font size=2><a href=javascript:openwin(97)>sam</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>181=00</font></td>
  </tr>

However, the only data I am interested in from the above output are the names and the values, e.g.:

Kevin 45.50/week
Eliza 220=00/week
Sam 181=00

Is there a module or some other way to extract this output in the required format and write it to a file (preferably Excel)?


Solution

  • Try BeautifulSoup:

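    # print() as a function works on both Python 2.7 and Python 3
    from __future__ import print_function
    from bs4 import BeautifulSoup as soup
    
    # Sample rows from the question; with a live page, pass resp.text
    # (the decoded response body) to soup() instead of this string
    content = """<tr> 
        <td bgcolor="#FFFFFF"><font size=2><a     
    href=javascript:openwin(179)>Kevin</a></font></td>
        <td bgcolor="#FFFFFF"><font size=2>45.50/week</font></td>
      </tr>
    
      <tr> 
        <td bgcolor="#FFFFFF"><font size=2><a  
    href=javascript:openwin(33)>Eliza</a></font></td>
        <td bgcolor="#FFFFFF"><font size=2>220=00/week</font></td>
      </tr>
    
      <tr> 
        <td bgcolor="#FFFFFF"><font size=2><a href=javascript:openwin(97)>sam</a></font></td>
        <td bgcolor="#FFFFFF"><font size=2>181=00</font></td>
      </tr>"""
    
    html = soup(content, 'lxml')
    trs = html.find_all('tr')  # every table row
    
    for row in trs:
        tds = row.find_all('td')  # the cells of this row
    
        for data in tds:
            # print each cell's text on the same line, separated by spaces
            print(data.text.strip(), end=' ')
    
        print('\n')  # end the row and leave a blank line after it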
    

    The output:

    Kevin 45.50/week 
    
    Eliza 220=00/week 
    
    sam 181=00 
    

    First find all <tr> tags with find_all('tr'), then all <td> tags inside each row with find_all('td'), and finally print the text content of each <td> with data.text.
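The answer above prints the rows but does not cover the second half of the question (saving them to a file). The sketch below shows one way to do both using only the standard library: it swaps in `html.parser.HTMLParser` (a different technique from the answer's BeautifulSoup, useful if bs4/lxml cannot be installed) for the same <tr> then <td> then text walk, and writes the rows with the `csv` module, since Excel opens .csv files directly. It is written for Python 3; on Python 2.7 the parser module is named `HTMLParser` and the file should be opened with `'wb'` instead of `newline=''`. The class `RowCollector`, the function `html_rows_to_csv` and the filename `output.csv` are hypothetical names chosen for illustration.

```python
import csv
from html.parser import HTMLParser  # Python 3 module name


class RowCollector(HTMLParser):
    """Hypothetical helper: collect stripped <td> text, grouped by <tr> row."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            # join fragments (e.g. the <a> link text) and trim whitespace
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # only keep text that appears inside a <td>
        if self._cell is not None:
            self._cell.append(data)


def html_rows_to_csv(html_text, path):
    """Hypothetical helper: parse <tr>/<td> rows and write them to a CSV file."""
    parser = RowCollector()
    parser.feed(html_text)
    parser.close()
    # newline='' stops csv from writing blank lines between rows on Windows
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(parser.rows)
    return parser.rows


# Sample rows copied from the question's output
sample = """<tr>
    <td bgcolor="#FFFFFF"><font size=2><a
href=javascript:openwin(179)>Kevin</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>45.50/week</font></td>
  </tr>
  <tr>
    <td bgcolor="#FFFFFF"><font size=2><a
href=javascript:openwin(33)>Eliza</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>220=00/week</font></td>
  </tr>
  <tr>
    <td bgcolor="#FFFFFF"><font size=2><a href=javascript:openwin(97)>sam</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>181=00</font></td>
  </tr>"""

rows = html_rows_to_csv(sample, 'output.csv')
print(rows)  # [['Kevin', '45.50/week'], ['Eliza', '220=00/week'], ['sam', '181=00']]
```

With a live page you would pass `resp.text` from `requests.get(...)` as `html_text`. If you stay with BeautifulSoup, the same `csv.writer(...).writerows(...)` call works on a list built as `[[td.text.strip() for td in row.find_all('td')] for row in trs]`.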