Tags: python, excel, python-requests, xlrd

Import an Excel column into a Python list


Hi, I have an Excel sheet with only one column, and I want to import that column into a list in Python. The column has 5 elements, each containing a URL like "http://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0".

My code

import requests
import csv
import xlrd

ls = []
ls1 = ['01.jpg','02.jpg','03.jpg','04.jpg','05.jpg','06.jpg']
wb = xlrd.open_workbook('Book1.xls')
ws = wb.sheet_by_name('Book1')
num_rows = ws.nrows - 1
curr_row = -1
while (curr_row < num_rows):
    curr_row += 1
    row = ws.row(curr_row)
    ls.append(row)

for each in ls:
    urlFetch = requests.get(each)
    img = urlFetch.content
    for x in ls1:
        file = open(x,'wb') 
        file.write(img)
        file.close()

Now it gives me this error:

Traceback (most recent call last):
  File "C:\Users\Prime\Documents\NetBeansProjects\Python_File_Retrieve\src\python_file_retrieve.py", line 18, in <module>
    urlFetch = requests.get(each)
  File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\api.py", line 65, in get
    return request('get', url, **kwargs)
  File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\api.py", line 49, in request
    response = session.request(method=method, url=url, **kwargs)
  File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 461, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 567, in send
    adapter = self.get_adapter(url=request.url)
  File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 646, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']'

Please help.


Solution

  • Your problem isn't with reading the Excel file, but with parsing the content out of it. Note that the error is raised by the Requests library, not by xlrd:

    requests.exceptions.InvalidSchema: No connection adapters were found for <url>
    

    The error message shows that the value you pass to Requests is not a bare URL. It is wrapped in [text:'...'], which is how xlrd prints a row: ws.row() returns a list of cell objects, not a string.

    '[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']'
    

    Requests cannot work with that value, because it does not start with a scheme it recognizes, such as http:// or https://. If you pass the bare URL instead:

    requests.get('https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0')
    

    you get the expected result.

    What you need to do is extract only the URL from each cell, for example through the cell's value attribute, and pass that string to Requests. If you're still having problems with that, give us examples of the URLs in the Excel file.
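
A minimal sketch of that extraction, assuming the URLs sit in the first column of the sheet named 'Book1' as in the question. The helper names cell_urls, read_url_column and download_all, and the FakeCell stand-in, are made up for illustration and are not part of xlrd or Requests. The sketch relies only on each cell having a .value attribute.

```python
from collections import namedtuple


def cell_urls(cells):
    """Return the plain string stored in each cell, skipping blank cells.

    Each item only needs a ``.value`` attribute, which xlrd cells provide.
    """
    urls = []
    for cell in cells:
        text = str(cell.value).strip()
        if text:
            urls.append(text)
    return urls


def read_url_column(path, sheet_name="Book1", col_index=0):
    """Read one column of an .xls workbook into a list of URL strings."""
    import xlrd  # third-party; imported here so cell_urls works without it

    book = xlrd.open_workbook(path)
    sheet = book.sheet_by_name(sheet_name)
    # sheet.col() returns the cell objects of one column.
    return cell_urls(sheet.col(col_index))


def download_all(urls, filenames):
    """Save each URL to its own file, pairing URLs and names one-to-one."""
    import requests  # third-party

    for url, filename in zip(urls, filenames):
        response = requests.get(url)
        response.raise_for_status()  # stop on HTTP errors instead of saving them
        with open(filename, "wb") as out:
            out.write(response.content)


# Stand-in for an xlrd cell, used only to try cell_urls without a workbook.
FakeCell = namedtuple("FakeCell", "value")
demo = cell_urls([FakeCell("https://example.com/a.jpg"), FakeCell("")])
print(demo)  # ['https://example.com/a.jpg']
```

With a real workbook, this would be used as urls = read_url_column('Book1.xls') followed by download_all(urls, ['01.jpg', '02.jpg', '03.jpg', '04.jpg', '05.jpg']). Pairing with zip is a deliberate choice: the nested loop in the question writes every downloaded image into all of the listed filenames, so each file would end up holding the last image.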