Tags: python, xpath, lxml, urllib2, xml.etree

Web elements extraction from websites using Python


I want to extract various elements from tables and paragraph texts from this website.

https://www.instituteforsupplymanagement.org/about/MediaRoom/newsreleasedetail.cfm?ItemNumber=30655

This is the code I am using:

import lxml
from lxml import html
from lxml import etree
import urllib2
source = urllib2.urlopen('https://www.instituteforsupplymanagement.org/about/MediaRoom/newsreleasedetail.cfm?ItemNumber=30656&SSO=1').read()
x = etree.HTML(source)
growth = x.xpath('//*[@id="home_feature_container"]/div/div[2]/div/table[2]/tbody/tr[3]/td[2]/p')
growth

What is the best way to extract these elements without having to update the XPath in the code every time? They publish new data on the same page every month, but the XPath sometimes changes slightly.


Solution

  • If the position of the items you want changes regularly, try retrieving them by name instead. For example, here is how to extract the elements from the "New Orders" row of the table.

    import requests  # generally more convenient than urllib2
    from lxml import html, etree
    
    url = 'https://www.instituteforsupplymanagement.org/about/MediaRoom/newsreleasedetail.cfm?ItemNumber=30655&SSO=1'
    page = requests.get(url)
    tree = html.fromstring(page.content)
    
    neworders = tree.xpath('//strong[text()="New Orders"]/../../following-sibling::td/p/text()')
    
    print(neworders)
    
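    If the page's whitespace or nesting shifts slightly from month to month, `normalize-space()` and an `ancestor::` step can make the match more tolerant. Here is a sketch of that idea against a small stand-in snippet (the snippet itself is hypothetical; on the live page you would parse `page.content` as above):

```python
from lxml import html

# Hypothetical stand-in for the fetched page
snippet = """
<table><tbody>
  <tr><td><p><strong> New Orders </strong></p></td><td><p>Growing</p></td></tr>
</tbody></table>
"""
tree = html.fromstring(snippet)

# normalize-space() ignores stray whitespace around the label, and
# ancestor::td climbs out of whatever wrappers sit around the <strong>
values = tree.xpath(
    '//strong[normalize-space()="New Orders"]/ancestor::td/following-sibling::td//text()'
)
print([v.strip() for v in values if v.strip()])
```

    This way the query is keyed on the row label rather than on positional indices like `tr[3]/td[2]`.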

    Or if you want the whole HTML table:

    data = tree.xpath('//th[text()="MANUFACTURING AT A GLANCE"]/../..')
    
    for elements in data:
        print(etree.tostring(elements, pretty_print=True).decode())  # tostring returns bytes in Python 3
    
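    If you would rather have the table as Python lists than as raw HTML, you can walk the matched table's rows with `text_content()`. A sketch, again using a hypothetical stand-in snippet in place of the live page:

```python
from lxml import html

# Hypothetical stand-in for the page; the real table is matched by its header text
snippet = """
<table>
  <tr><th colspan="2">MANUFACTURING AT A GLANCE</th></tr>
  <tr><td>Index</td><td>PMI</td></tr>
  <tr><td>Value</td><td>54.8</td></tr>
</table>
"""
tree = html.fromstring(snippet)
table = tree.xpath('//th[text()="MANUFACTURING AT A GLANCE"]/ancestor::table')[0]

# one list per row, taking the text of every cell (th or td)
rows = [
    [cell.text_content().strip() for cell in row.xpath('./td | ./th')]
    for row in table.xpath('.//tr')
]
print(rows)
```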

    Another example, using BeautifulSoup:

    from bs4 import BeautifulSoup
    import requests
    
    url = "https://www.instituteforsupplymanagement.org/about/MediaRoom/newsreleasedetail.cfm?ItemNumber=30655&SSO=1"
    
    content = requests.get(url).content
    
    soup = BeautifulSoup(content, "lxml")
    
    table = soup.find_all('table')[1]
    
    table_body = table.find('tbody')
    
    data = []
    rows = table_body.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])
    
    print(data)
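    The hard-coded `find_all('table')[1]` index is the same kind of positional assumption that broke the original XPath. With BeautifulSoup you can instead locate the table by its header text and climb to the enclosing `<table>`. A sketch with a hypothetical stand-in snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page content
content = """
<table><tr><th>Other</th></tr></table>
<table>
  <tr><th>MANUFACTURING AT A GLANCE</th></tr>
  <tr><td>PMI</td><td>54.8</td></tr>
</table>
"""
soup = BeautifulSoup(content, "html.parser")

# find the <th> by its exact text, then climb to the table that contains it
header = soup.find("th", string="MANUFACTURING AT A GLANCE")
table = header.find_parent("table")

data = [[td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")]
data = [row for row in data if row]  # drop header-only rows with no <td>
print(data)
```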