I'll assume you have seen the movie "The Social Network" for this question.
I'd like to know if it is possible to download images from websites like Zuckerberg does at the beginning while he's working on Facemash.com; and if it is possible, how would you go about doing such a thing?
Feel free to be technical about it if you have the knowledge; this is something I've been intrigued about for a while now and I'd love to know.
Thanks!
(In short: downloading images and files from a website's directory without knowing the exact names of those files.)
The general technique of grabbing data from the web is called "scraping". To download images, you would grab the source of the page, search through it for any <img> tags, and make an additional request for the address in each tag's src attribute. Then you would build a list of the other links in the page to follow and repeat the process.
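To make that loop concrete, here is a rough sketch using only the Python standard library. The names ImageAndLinkParser and crawl, the max_pages limit, and the idea of passing in a fetch function are all my own choices for illustration. A real crawler would also want to stay on one site, respect robots.txt, and rate-limit itself.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class ImageAndLinkParser(HTMLParser):
    """Collects <img src> and <a href> values from one page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # Resolve relative addresses against the page they came from.
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            # Drop the #fragment so the same page is not queued twice.
            url, _ = urldefrag(urljoin(self.base_url, attrs["href"]))
            self.links.append(url)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; fetch(url) must return that page's HTML as text."""
    seen = {start_url}
    queue = deque([start_url])
    images = []
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        parser = ImageAndLinkParser(url)
        parser.feed(fetch(url))
        for img in parser.images:
            if img not in images:
                images.append(img)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return images
```

Here fetch can be anything that turns a URL into HTML text, so the crawling logic can be tried against a dictionary of canned pages before pointing it at a real site.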
For instance, on this page there are only two <img> tags. One of them is your avatar, and it looks like this:
<img src="https://i.sstatic.net/mWxgi.png?s=32&g=1" alt="">
From a Linux shell I can grab the image with wget by doing:
wget "https://i.sstatic.net/mWxgi.png?s=32&g=1"
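If you would rather stay in Python than shell out to wget, a single image can be fetched with the standard library along these lines. This is only a sketch: filename_from_url and download are names I made up, and taking the file name from the last path segment (ignoring the query string) is my own assumption about what you would want on disk.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlopen


def filename_from_url(url, default="download"):
    """Use the last path segment as the file name, ignoring the query string."""
    name = os.path.basename(urlsplit(url).path)
    return name or default


def download(url, directory="."):
    """Fetch url and write the response body to a file in directory."""
    path = os.path.join(directory, filename_from_url(url))
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

Calling download("https://i.sstatic.net/mWxgi.png?s=32&g=1") would then write the image into the current directory and return the path it used.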
How you grab the page source varies. In Python I might use the requests and Beautiful Soup libraries to grab and process the page source. If the page were largely generated via JavaScript, I might have to use Selenium WebDriver to actually drive a real browser session.
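As a rough idea of what the requests plus Beautiful Soup route might look like, assuming both packages are installed (pip install requests beautifulsoup4). The function names image_urls and fetch_image_urls and the 10-second timeout are just my picks for the example.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def image_urls(html, base_url):
    """Return the absolute address of every <img> that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]


def fetch_image_urls(page_url):
    """Download the page source and list the images it references."""
    response = requests.get(page_url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return image_urls(response.text, page_url)
```

This only sees images that are present in the HTML the server sends back. Anything inserted later by JavaScript will be missing, which is the case where driving a real browser comes in.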