Pull CME price data into Python 3.6.8


I am relatively new to Python so I apologize if this is a 'bush league' question.

I am trying to retrieve the WTI futures prices from this website: https://www.cmegroup.com/trading/energy/crude-oil/west-texas-intermediate-wti-crude-oil-calendar-swap-futures_quotes_globex.html

Which libraries should I be using? How will I need to adjust the output when it is pulled from the website?

I am currently using Python 3.6.8 with the pandas, numpy, requests, urllib3, BeautifulSoup, and json libraries. I am not sure whether these are the right libraries and, if they are, which functions I should be using.

Here is a basic version of the code:

import urllib3
from bs4 import BeautifulSoup

wtiFutC = 'https://www.cmegroup.com/trading/energy/crude-oil/west-texas-intermediate-wti-crude-oil-calendar-swap-futures_quotes_globex.html'
http = urllib3.PoolManager()
response2 = http.request('GET', wtiFutC)
print(type(response2.data))  # check the type of the data produced: bytes
print(response2.data)  # prints out the raw data

soup2 = BeautifulSoup(response2.data.decode('utf-8'), features='html.parser')
print(type(soup2))  # check the type of the data produced: 'bs4.BeautifulSoup'
print(soup2)  # prints out the BeautifulSoup version of the data

I want a way to see the 'Last' price for the WTI future for the whole curve. Instead I am seeing something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!--[if (gt IE 9) |!(IE)]><!-->
<html class="cmePineapple no-js" lang="en" xml:lang="en" 
xmlns="http://www.w3.org/1999/xhtml">
<!--<![endif]-->

Any help or direction would be greatly appreciated. Thank you so much! :)


Solution

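  • Request the same endpoint the page itself uses, then parse the column of interest (and the expiration date) out of the JSON it returns. The static HTML you are downloading most likely does not contain the quotes at all, which would explain why you only see page markup.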

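    import requests

    # JSON endpoint the quotes page calls; the '_' parameter appears to be a cache-busting timestamp
    url = 'https://www.cmegroup.com/CmeWS/mvc/Quotes/Future/4707/G?quoteCodes=null&_=1560171518204'
    r = requests.get(url).json()
    # (expiration date, last price) for each contract on the curve
    last_quotes = [(item['expirationDate'], item['last']) for item in r['quotes']]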
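On the question of how to adjust the output: a minimal sketch of one way to clean the result, assuming the payload has the shape used above (a 'quotes' list whose items carry 'expirationDate' and 'last'). It also assumes the prices arrive as strings and that contracts with no trade carry a placeholder such as '-'; neither is confirmed here. The function name `parse_last_prices` and the sample payload are hypothetical and only there so the snippet runs without hitting the site.

```python
def parse_last_prices(payload):
    """Return (expiration_date, last_price) pairs from a quotes payload.

    Assumes a 'quotes' list whose items have 'expirationDate' and 'last' keys,
    as in the answer above.
    """
    curve = []
    for item in payload.get('quotes', []):
        # Normalise the price text: drop thousands separators and whitespace
        raw = str(item.get('last', '')).replace(',', '').strip()
        try:
            last = float(raw)
        except ValueError:
            # Assumed placeholder (e.g. '-') for a contract with no last trade
            last = None
        curve.append((item.get('expirationDate'), last))
    return curve


# Hypothetical sample payload for illustration only; not real CME data.
sample = {
    'quotes': [
        {'expirationDate': '20190731', 'last': '53.26'},
        {'expirationDate': '20190830', 'last': '-'},
    ]
}

curve = parse_last_prices(sample)
for expiration, last in curve:
    print(expiration, last)
```

With the live response you would call `parse_last_prices(r)` instead of using the sample. Since you already have pandas installed, the resulting list of tuples can be passed to `pandas.DataFrame(curve, columns=['expiration', 'last'])` if you prefer a table.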