python python-3.x excel python-requests xlsx

How to download first line of xlsx file via url python


I use the requests library to load a single line from a URL:

import requests

def get_line(url):
    resp = requests.get(url, stream=True)
    for line in resp.iter_lines(decode_unicode=True):
        yield line

url = "https://example.com/data.txt"  # hypothetical URL of a plain-text file
line = get_line(url)
print(next(line))

Text files load perfectly. But when I try to load an .xlsx file, the result is unprintable symbols:

PK [symbols] [Content_Types].xml [symbols]

Is there a way to load a single row of cells?


Solution

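  • You can't just read the raw HTTP response and search it for the Excel data. An .xlsx file is a compressed ZIP archive of XML parts: that is why the response starts with "PK" (the ZIP signature) followed by "[Content_Types].xml". In practice you need to download the whole file and open it with an appropriate library to get the cell values.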
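To see why the first "line" is unusable, here is a small stdlib-only check. The function name looks_like_xlsx is hypothetical, and it is only a heuristic: it recognizes any Office Open XML file (.xlsx, .docx, ...) by its ZIP signature and manifest. It also shows that a truncated prefix of the archive is not a readable ZIP, because the ZIP central directory is stored at the end of the file.

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Heuristic: True if data is a ZIP archive containing an OOXML manifest."""
    # ZIP archives, and therefore .xlsx files, start with the bytes b"PK".
    if not data.startswith(b"PK"):
        return False
    buf = io.BytesIO(data)
    # A readable archive also needs its central directory, which sits at the
    # END of the file; a truncated download (e.g. only the first "line") fails here.
    if not zipfile.is_zipfile(buf):
        return False
    with zipfile.ZipFile(buf) as zf:
        # The same manifest name that shows up in the question's output.
        return "[Content_Types].xml" in zf.namelist()


# Offline demo: build a tiny archive in memory instead of downloading a file.
demo_buffer = io.BytesIO()
with zipfile.ZipFile(demo_buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
data = demo_buffer.getvalue()

print(data[:2])                    # b'PK'
print(looks_like_xlsx(data))       # True
print(looks_like_xlsx(data[:30]))  # False: the end of the archive is missing
```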

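    One common library is xlrd. Note that xlrd 2.0 and later only read the legacy .xls format, so for .xlsx files either pin an older release or use a library such as openpyxl. To install a version of xlrd that still opens .xlsx:

    pip3 install "xlrd<2"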
    

    Example:

    import requests
    import xlrd  # requires xlrd < 2.0; newer releases reject .xlsx files

    example_url = 'http://www.excel-easy.com/examples/excel-files/fibonacci-sequence.xlsx'
    r = requests.get(example_url)  # download the whole file
    r.raise_for_status()  # fail early on HTTP errors instead of parsing an error page

    workbook = xlrd.open_workbook(file_contents=r.content)  # open workbook from bytes
    worksheet = workbook.sheet_by_index(0)  # get first sheet
    first_row = worksheet.row(0)  # list of Cell objects; use row_values(0) for plain values

    print(first_row)  # list of cells
    

    xlrd documentation
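Because xlrd 2.0 and later refuse .xlsx files, here is a sketch of the same first-row read with openpyxl, assuming openpyxl is installed (pip3 install openpyxl). The helper name first_row_from_bytes is hypothetical; it takes the downloaded bytes, such as r.content from the request above. The demo builds a small workbook in memory so it runs without network access.

```python
import io

import openpyxl


def first_row_from_bytes(content: bytes) -> tuple:
    """Return the values of the first row of the first worksheet."""
    # read_only=True loads rows lazily, which keeps memory low for big sheets,
    # but the complete file must still be downloaded before it can be opened.
    workbook = openpyxl.load_workbook(io.BytesIO(content), read_only=True)
    try:
        worksheet = workbook.worksheets[0]
        # Exhaust the (at most one-row) iterator so the sheet stream is closed cleanly.
        rows = list(worksheet.iter_rows(min_row=1, max_row=1, values_only=True))
        return tuple(rows[0]) if rows else ()
    finally:
        workbook.close()  # read-only workbooks keep the archive open until closed


# Offline demo: create a workbook in memory instead of downloading one.
demo = openpyxl.Workbook()
demo.active.append(["n", "fibonacci"])
demo.active.append([1, 1])
buffer = io.BytesIO()
demo.save(buffer)

print(first_row_from_bytes(buffer.getvalue()))
```

With requests, call first_row_from_bytes(r.content) after r.raise_for_status(). Read-only mode only saves memory while parsing; it does not let you skip downloading the file.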


    If you need to read the data line by line, switch to a simpler format such as .csv or plain text.
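For a text format the streaming approach from the question does work. A minimal sketch, assuming the data is CSV: the hypothetical helper first_csv_row hands lines to csv.reader, which pulls lines only until the first record is complete, so quoted fields containing commas are still parsed correctly. The commented requests usage assumes a hypothetical url pointing at a CSV file.

```python
import csv
from typing import Iterable, List


def first_csv_row(lines: Iterable[str]) -> List[str]:
    """Parse only the first CSV record from an iterable of text lines."""
    reader = csv.reader(iter(lines))
    # next() consumes lines lazily; later lines stay unread in the source iterator.
    return next(reader, [])


# Usage with requests (url is a hypothetical CSV address):
# with requests.get(url, stream=True) as resp:
#     resp.raise_for_status()
#     resp.encoding = resp.encoding or "utf-8"  # iter_lines yields bytes if encoding is None
#     print(first_csv_row(resp.iter_lines(decode_unicode=True)))

sample = ['id,name', '1,"Smith, John"', '2,Jane']
print(first_csv_row(sample))  # ['id', 'name']
```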