Scrape a table from a website and store as pandas

In Python, I want to scrape the table in a website(it's a Japanese option trading information), and store it as a pandas dataframe.

The website is here, and you need to click "Options Quotes" in order to access the page where I want to scrape the table. The final URL is https://svc.qri.jp/jpx/english/nkopm/ but you cannot directly access this page.

Here is my attempt:

pd.read_html("https://svc.qri.jp/jpx/english/nkopm/")
...HTTPError: HTTP Error 400: Bad Request

So I thought I need to add a user agent. Here is my another attempt:

url = "https://svc.qri.jp/jpx/english/nkopm/"
pd.read_html(requests.get(url, headers={'User-agent': 'Mozilla/5.0'}).text)
...ValueError: No tables found

Another attempt

import urllib
url = 'https://svc.qri.jp/jpx/english/nkopm/'
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open(url)
tables = pd.read_html(response.read(), attrs={"class":"price-table"})[0]
...HTTPError: HTTP Error 400: Bad Request

I know how to play with pandas, so it doesn't have to be imported in a neat dataframe at first place. I just need to first import the table in pandas, but I'm not sure why I cannot even read the page. Any help would be appreciated!

By the way, if you click gray arrows in the middle column, ,

it will add another row like this.

And it can be all opened and closed by clicking these buttons.

It would be nice if I can import these rows as well, but it not really a must.

Solution

Reading the documentation of the pandas function read_html it says

Read HTML tables into a list of DataFrame objects.

So the function expects structured input in form of an html table. I actually can't access the website you're linking to but I'm guessing it will give you back an entire website.

You need to extract the data in a structured format in order for pandas to make sense of it. You need to scrape it. There's a bunch of tools for that, and one popular one is BeautifulSoup.

Tl;dr: So what you need to do is download the website with requests, pass it into BeautifulSoup and then use BeautifulSoup to extract the data in a structured format.

Updated answer:

Seems like the reason why the requests is returning a 400 is because the website is expecting some additional headers - I just dumped the request my browser does into requests and it works!

import requests

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'cross-site',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Sec-Fetch-Dest': 'document',
    'Referer': 'https://www.jpx.co.jp/english/markets/index.html',
    'Accept-Language': 'de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7,it;q=0.6,la;q=0.5',
}

response = requests.get('https://svc.qri.jp/jpx/english/nkopm/', headers=headers, cookies=cookies)