Url
Each book id has 4 jpg files.
The book ids run from 749 to 826.
The last urls are
I tried using two nested loops (both "for" and "while" loops) to get all the URLs, but it always fails.
# -*- coding: UTF-8 -*-
base_url = "http://url.com/"
page = "/page-"
jpg = ".jpg"
for bookid in range(749, 827):
    url = base_url + str(bookid) + page
    for n in range(1, 5):
        u = url + str(n) + jpg
        print(u)
The logic I want is: for each book id, get its 4 jpg pages (1-4), create a folder named after the book id, and move the 4 pages into that folder one by one.
import os
import urllib.request  # Python 3; on Python 2 use urllib.urlretrieve instead

book_ids = list(range(749, 827))
page_ids = ["page-1.jpg", "page-2.jpg", "page-3.jpg", "page-4.jpg"]
all_url = []
base_url = "http://url.com/"

# Build one {book_id: [page urls]} entry per book.
for book_id in book_ids:
    books = []
    for page_id in page_ids:
        books.append(base_url + str(book_id) + "/" + str(page_id))
    all_url.append({book_id: books})

# Create a folder per book id and download its pages into it.
for data in all_url:
    book_id = list(data.keys())[0]
    directory = "new/" + str(book_id)
    if not os.path.exists(directory):
        os.makedirs(directory)
    count = 0
    for url in data[book_id]:
        filename = page_ids[count]
        fullfilename = os.path.join(directory, filename)
        urllib.request.urlretrieve(url, fullfilename)
        count = count + 1
Now you have all the URLs grouped by their book id, and each page will be downloaded if there is content at its URL. Hope it helps.
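If some pages may be missing, one possible variation is to separate building the URLs from downloading them and to skip pages that fail instead of stopping. This is only a sketch for Python 3: the function names `build_page_urls` and `download_books`, the `target_root` parameter, and the choice to collect failed URLs in a list are my own assumptions, not part of the code above.

```python
import os
import urllib.error
import urllib.request


def build_page_urls(base_url, first_id, last_id, pages=4):
    """Map each book id to its list of page URLs (hypothetical helper)."""
    return {
        book_id: [
            "{}{}/page-{}.jpg".format(base_url, book_id, n)
            for n in range(1, pages + 1)
        ]
        for book_id in range(first_id, last_id + 1)
    }


def download_books(page_urls, target_root="new"):
    """Save each page under target_root/<book_id>/ and return the URLs that failed."""
    failed = []
    for book_id, urls in page_urls.items():
        directory = os.path.join(target_root, str(book_id))
        os.makedirs(directory, exist_ok=True)  # no error if the folder already exists
        for url in urls:
            # Reuse the last URL segment (e.g. "page-1.jpg") as the file name.
            filename = os.path.join(directory, url.rsplit("/", 1)[-1])
            try:
                urllib.request.urlretrieve(url, filename)
            except (urllib.error.URLError, OSError):
                # Missing page or network problem: remember it and keep going.
                failed.append(url)
    return failed
```

Calling `download_books(build_page_urls("http://url.com/", 749, 826))` would then try every page of every book and return the list of URLs it could not fetch, so you can retry or inspect them later.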