Recently I started learning web scraping. For this I need to understand URLs and their basic structure. As homework, I looked at two URLs, one from Amazon and one from Priceline.
Some basic URL concepts
Amazon URL
https://www.amazon.com/books-used-books-textbooks/b/?ie=UTF8&node=283155&ref_=nav_cs_books_788dc1d04dfe44a2b3249e7a7c245230
As per my understanding:
Parameters
ie=UTF8
node=283155
ref_=nav_cs_books_788dc1d04dfe44a2b3249e7a7c245230
Key Values
ie UTF8
node 283155
ref_ nav_cs_books_788dc1d04dfe44a2b3249e7a7c245230
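The breakdown above can be reproduced with Python's standard `urllib.parse` module. This is only a minimal sketch; the variable names are illustrative, and what each parameter means to Amazon is not something the URL itself reveals.

```python
from urllib.parse import urlsplit, parse_qsl

url = ("https://www.amazon.com/books-used-books-textbooks/b/"
       "?ie=UTF8&node=283155&ref_=nav_cs_books_788dc1d04dfe44a2b3249e7a7c245230")

# Split the URL into scheme, host, path, query string and fragment.
parts = urlsplit(url)

# Turn the query string ("ie=UTF8&node=...") into key/value pairs.
params = dict(parse_qsl(parts.query))

print("scheme:", parts.scheme)
print("host:  ", parts.netloc)
print("path:  ", parts.path)
for key, value in params.items():
    print(key, "=", value)
```

Besides the three parameters, this also shows the parts in front of the `?`: the scheme, the host, and the path.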
Priceline URL
https://www.priceline.com/relax/in/3000005381/from/20210310/to/20210317/rooms/1?vrid=16e829a6d7ee5b5538fe51bb7e6925e8
This URL is for a hotel booking in Chicago from 03/10/2021 to 03/17/2021.
As per my understanding:
Key Values
from 20210310 (2021-03-10)
to 20210317 (2021-03-17)
rooms 1
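These values sit in the path rather than after a `?`, so they have to be read from the path segments. A rough sketch, assuming each name (`from`, `to`, `rooms`) is immediately followed by its value, which is only a guess based on this one URL; `value_after` is a hypothetical helper:

```python
from datetime import datetime
from urllib.parse import urlsplit

url = ("https://www.priceline.com/relax/in/3000005381/from/20210310"
       "/to/20210317/rooms/1?vrid=16e829a6d7ee5b5538fe51bb7e6925e8")

# Break the path into its non-empty segments.
segments = [s for s in urlsplit(url).path.split("/") if s]

def value_after(name):
    """Return the path segment that follows `name` (assumed layout)."""
    return segments[segments.index(name) + 1]

# The dates look like YYYYMMDD, so parse them with that format.
check_in = datetime.strptime(value_after("from"), "%Y%m%d").date()
check_out = datetime.strptime(value_after("to"), "%Y%m%d").date()
rooms = int(value_after("rooms"))

print(check_in, check_out, rooms)
```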
I could not find anything more than that. Am I missing something? Can these URLs be analyzed more precisely?
Tips that may help are:
Data can be sent to a server via GET or POST. What you are describing, with parameters in the URL, is GET. With POST the data travels in the request body, so you don't see it in the URL.
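To make the difference concrete, here is a small sketch using Python's standard library. It only builds the two request objects and does not send anything; `https://example.com/search` is a placeholder address, not a real endpoint of either site.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"ie": "UTF8", "node": "283155"}
encoded = urlencode(params)  # key=value pairs joined with "&"

# GET: the parameters are appended to the URL after "?".
get_request = Request("https://example.com/search?" + encoded)

# POST: the same parameters go into the request body instead.
post_request = Request("https://example.com/search",
                       data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

The POST request's URL carries no parameters at all, which is why you need the developer tools to see what was sent.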
In both cases, getting familiar with your browser's developer console will help you explore how websites work. In Chrome, you can hit F12 or right-click any element and select "inspect element." This is especially helpful when trying to inspect data that is passed using POST, since you can't see it in the URL. Use the "network" tab while clicking around to see what the website is doing in the background.
Lastly, just play around with websites. For example, when you browse Amazon you might notice the URLs look like https://www.amazon.com/Avalon-Organics-Creme-Radiant-Renewal/dp/B082G172GL/?_encoding=UTF8 but if you play around with it, you notice you can delete the title and the URL still works, like this: https://www.amazon.com/dp/B082G172GL
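If you want to do that shortening in code, something like the following might work. It is a sketch that assumes the product ID is the 10-character code right after `/dp/`, which is just what the example above suggests; `short_product_url` is a made-up helper name.

```python
import re
from urllib.parse import urlsplit

def short_product_url(url):
    """Return the short /dp/<id> form of a product URL, or None."""
    parts = urlsplit(url)
    # Assumption: the ID is 10 uppercase letters/digits following "/dp/".
    match = re.search(r"/dp/([A-Z0-9]{10})(?:/|$)", parts.path)
    if match is None:
        return None
    return f"{parts.scheme}://{parts.netloc}/dp/{match.group(1)}"

long_url = ("https://www.amazon.com/Avalon-Organics-Creme-Radiant-Renewal"
            "/dp/B082G172GL/?_encoding=UTF8")
print(short_product_url(long_url))
```

For URLs that do not contain a `/dp/` segment, the function returns `None` rather than guessing.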