Tags: url, web-scraping, seo

URL decoding and understanding


Recently I started learning web scraping. For this I need to focus on URLs and their basic structure. I picked two URLs, one from Amazon and one from Priceline, for homework purposes.

Some basic concepts of URLs:

  • A query string comes at the end of a URL, starting with a single question mark, “?”.
  • Parameters are provided as key-value pairs separated by an ampersand, “&”.
  • The key and value are separated by an equals sign, “=”.
  • Most web frameworks also allow “nice-looking” URLs that put the parameters in the path of the URL instead of in a query string.
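To make the three query-string rules concrete, here is a minimal hand-written parser that applies them literally: take everything after the first "?", split it on "&", then split each pair on the first "=". It is only a sketch: the function name `parse_query_string` and the example.com URL are hypothetical, and it does not handle percent-encoding (such as `%20`) or a trailing `#fragment`. The standard library's `urllib.parse` handles those cases properly.

```python
def parse_query_string(url):
    """Split a URL's query string into (key, value) pairs by hand.

    Simplified sketch: no percent-decoding, and a '#fragment' at the
    end of the URL would be left attached to the last value.
    """
    # Rule 1: the query string is everything after the first "?".
    _, sep, query = url.partition("?")
    if not sep:
        return []  # no "?" means no query string
    pairs = []
    # Rule 2: key-value pairs are separated by "&".
    for item in query.split("&"):
        if not item:
            continue  # skip empty pieces such as "a=1&&b=2"
        # Rule 3: key and value are separated by the first "=".
        key, _, value = item.partition("=")
        pairs.append((key, value))
    return pairs


# Hypothetical URL, used only for illustration.
print(parse_query_string("https://example.com/search?q=books&page=2"))
# [('q', 'books'), ('page', '2')]
```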

Amazon URL

https://www.amazon.com/books-used-books-textbooks/b/?ie=UTF8&node=283155&ref_=nav_cs_books_788dc1d04dfe44a2b3249e7a7c245230

As per my understanding:

Parameters:
ie=UTF8
node=283155
ref_=nav_cs_books_788dc1d04dfe44a2b3249e7a7c245230

Key       Values
ie        UTF8
node      283155
ref_      nav_cs_books_788dc1d04dfe44a2b3249e7a7c245230
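Python's standard library can do this split for you. A minimal sketch using `urllib.parse.urlsplit` and `urllib.parse.parse_qs` on the Amazon URL above reproduces the table. Note that it only recovers the names and values; what Amazon actually does with each one is not visible in the URL and still has to be inferred by experimenting.

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://www.amazon.com/books-used-books-textbooks/b/"
       "?ie=UTF8&node=283155&ref_=nav_cs_books_788dc1d04dfe44a2b3249e7a7c245230")

# Break the URL into its components.
parts = urlsplit(url)
print(parts.scheme)  # https
print(parts.netloc)  # www.amazon.com
print(parts.path)    # /books-used-books-textbooks/b/

# parse_qs maps each key to a *list* of values, because a key
# may legally appear more than once in a query string.
params = parse_qs(parts.query)
for key, values in params.items():
    print(key, values[0])
```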

Priceline URL

https://www.priceline.com/relax/in/3000005381/from/20210310/to/20210317/rooms/1?vrid=16e829a6d7ee5b5538fe51bb7e6925e8

This URL is for a hotel booking in Chicago from 03/10/2021 to 03/17/2021.

As per my understanding:

Key    Value      Date
from   20210310   2021-03-10
to     20210317   2021-03-17
rooms  1
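Unlike the Amazon URL, most of the Priceline data sits in the path rather than in a query string (one of the "nice-looking" URLs mentioned earlier). The URL also carries an `in/3000005381` segment pair and a real query parameter, `vrid`. The sketch below assumes, based only on the pattern in this one URL, that the path after `relax` alternates key and value segments. That `in` holds a location ID (Chicago here) is an assumption too, and what `vrid` means cannot be read from the URL.

```python
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

url = ("https://www.priceline.com/relax/in/3000005381/from/20210310/"
       "to/20210317/rooms/1?vrid=16e829a6d7ee5b5538fe51bb7e6925e8")

parts = urlsplit(url)
# ['relax', 'in', '3000005381', 'from', '20210310', 'to', '20210317', 'rooms', '1']
segments = parts.path.strip("/").split("/")

# Assumption: after the first segment, the path alternates key/value.
path_params = dict(zip(segments[1::2], segments[2::2]))

# The dates use the compact YYYYMMDD format.
check_in = datetime.strptime(path_params["from"], "%Y%m%d").date()
check_out = datetime.strptime(path_params["to"], "%Y%m%d").date()

# vrid is an ordinary query-string parameter (its meaning is unknown).
query_params = parse_qs(parts.query)

print(path_params)
# {'in': '3000005381', 'from': '20210310', 'to': '20210317', 'rooms': '1'}
print(check_in, check_out, (check_out - check_in).days, "nights")
# 2021-03-10 2021-03-17 7 nights
print(query_params["vrid"][0])
```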

I could not find out anything more than that. I just want to make sure: am I missing something? Can these URLs be analyzed more precisely?


Solution

  • Some tips that may help:

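    Data can be sent to a server via GET or POST. What you are describing with URLs is GET: the parameters are visible in the URL's query string. With POST, the data is usually sent in the request body, so it doesn't show up in the URL.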

    In both cases, getting familiar with your browser's developer tools will help you explore how websites work. In Chrome, you can press F12, or right-click any element and select "Inspect". This is especially helpful for data sent via POST, since you can't see it in the URL. Keep the "Network" tab open while clicking around to see the requests the website makes in the background.

    Lastly, just play around with websites. For example, when you browse Amazon you might notice that product URLs look like https://www.amazon.com/Avalon-Organics-Creme-Radiant-Renewal/dp/B082G172GL/?_encoding=UTF8, but if you experiment you'll find you can delete the title and the URL still works: https://www.amazon.com/dp/B082G172GL
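The observation about shortening Amazon product URLs can be turned into a small helper. This is a sketch under an assumption drawn only from the example URLs: the product ID (Amazon calls it an ASIN) is the 10-character uppercase alphanumeric segment right after `/dp/`. The function name `short_amazon_url` is hypothetical, and Amazon may change its URL scheme at any time.

```python
import re
from typing import Optional

# Assumption: the product ID is the 10-character alphanumeric segment
# right after "/dp/", followed by "/", "?", or the end of the URL.
ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def short_amazon_url(url: str) -> Optional[str]:
    """Return the short https://www.amazon.com/dp/<ID> form, or None."""
    match = ASIN_RE.search(url)
    if match is None:
        return None
    return "https://www.amazon.com/dp/" + match.group(1)


long_url = ("https://www.amazon.com/Avalon-Organics-Creme-Radiant-Renewal"
            "/dp/B082G172GL/?_encoding=UTF8")
print(short_amazon_url(long_url))
# https://www.amazon.com/dp/B082G172GL
```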
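The GET versus POST difference from the first tip can be seen without sending anything over the network. The sketch below builds two `urllib.request.Request` objects for a hypothetical endpoint (`https://example.com/search`) with made-up form data. For GET, the data is encoded into the URL; for POST, it goes into the request body and the URL stays clean. No request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and form data, used only for illustration.
endpoint = "https://example.com/search"
form = {"q": "used books", "page": "2"}

# GET: the data is encoded into the URL's query string.
get_req = Request(endpoint + "?" + urlencode(form))
print(get_req.get_method(), get_req.full_url)
# GET https://example.com/search?q=used+books&page=2

# POST: the same data goes into the request body instead.
# Request switches to POST automatically when data is given.
post_req = Request(endpoint, data=urlencode(form).encode("ascii"))
print(post_req.get_method(), post_req.full_url)
# POST https://example.com/search
print(post_req.data)
# b'q=used+books&page=2'
```

This is the same split you will see in the browser's Network tab: a GET request's parameters appear in the request URL, while a POST request's appear in its payload.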