I am very new to Python [running 2.7.x] and I am trying to download content from a webpage with thousands of links. Here's my code:
import urllib2

i = 1
limit = 1441
for i in limit:
    url = 'http://pmindia.gov.in/content_print.php?nodeid='+i+'&nodetype=2'
    response = urllib2.urlopen(url)
    webContent = response.read()
    f = open('speech'+i+'.html', 'w')
    f.write(webContent)
    f.close
Fairly elementary, but I get one or both of these errors: "'int' object is not iterable" and "cannot concatenate 'str' and 'int' objects". These are the printable versions of the links on this page: http://pmindia.gov.in/all-speeches.php (1400 links). But the node IDs go from 1 to 1441, which means 41 numbers are missing (which is a separate problem). Final question: in the long run, while downloading thousands of link objects, is there a way to run them in parallel to increase processing speed?
There are a couple of mistakes in your code:

- for i in limit tries to iterate over an integer, which is what raises "'int' object is not iterable"; you want to iterate over a range of numbers instead, e.g. xrange(1, limit+1).
- 'nodeid='+i concatenates a str with an int, which raises the concatenation error; convert the number first with str(i).
- f.close never actually closes the file: without parentheses the method is looked up but not called. It should be f.close().

With those fixes, your code looks like:
import urllib2

limit = 1441
for i in xrange(1, limit + 1):  # iterate over node IDs 1..1441
    url = 'http://pmindia.gov.in/content_print.php?nodeid=' + str(i) + '&nodetype=2'
    response = urllib2.urlopen(url)
    webContent = response.read()
    f = open('speech' + str(i) + '.html', 'w')
    f.write(webContent)
    f.close()  # note the parentheses: f.close alone never calls the method
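About the 41 missing node IDs: I don't know how that server reports a nonexistent node, but assuming it responds with an HTTP error status (e.g. a 404), you can skip those IDs with a try/except around urlopen. A minimal sketch under that assumption:

import urllib2

limit = 1441
for i in xrange(1, limit + 1):
    url = 'http://pmindia.gov.in/content_print.php?nodeid=' + str(i) + '&nodetype=2'
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        # Assumption: missing node IDs come back as an HTTP error.
        print 'skipping nodeid %d: %s' % (i, e)
        continue
    with open('speech' + str(i) + '.html', 'w') as f:
        f.write(response.read())

If the server instead returns an empty or placeholder page for missing IDs, you would check the response content rather than catch an exception.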
Now, if you want to go into web scraping for real, I suggest you have a look at packages such as lxml and requests.
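As for your parallelism question: yes. These downloads are I/O-bound, so a thread pool works well even in Python 2.7. Here's a minimal sketch using multiprocessing.dummy (a thread-based Pool from the standard library); the fetch helper and the pool size of 8 are just illustrative choices, not anything required:

import urllib2
from multiprocessing.dummy import Pool  # thread pool, suited to I/O-bound work

def fetch(i):
    # Hypothetical helper: download one node and save it to disk.
    url = 'http://pmindia.gov.in/content_print.php?nodeid=' + str(i) + '&nodetype=2'
    try:
        content = urllib2.urlopen(url).read()
    except urllib2.HTTPError:
        return None  # skip missing node IDs
    with open('speech' + str(i) + '.html', 'w') as f:
        f.write(content)
    return i

pool = Pool(8)  # 8 concurrent downloads; tune this, and be polite to the server
pool.map(fetch, xrange(1, 1442))
pool.close()
pool.join()

Keep the pool small: hammering a site with hundreds of simultaneous requests is a good way to get your IP blocked.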