
How to read xlsx or ods columns as numpy arrays in python


Right now I am using the code below, but it is very slow, and it returns the data row by row as a list of tuples rather than as columns. I also have to pick out each column by hand (row_val[0], row_val[1]). Is there a more efficient way, using numpy, to read each column as an array?

If not, I was thinking of converting the file to .txt or .csv, since those formats are easier to read. Which would be the most efficient option?

Also, I have the same file in .ods and .xlsx, so using either one is fine.
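If the sheet is exported to CSV (for example with "Save As" in LibreOffice or Excel), NumPy can load it directly and return one array per column. A minimal sketch, assuming a comma-separated file with a single header row and purely numeric data; the file name Folds5x2_pp.csv, the column names and the sample values are placeholders written only so the snippet runs on its own:

```python
import os
import tempfile

import numpy as np

# Placeholder data standing in for an exported sheet: one header row, numeric rows.
sample = "col_a,col_b\n14.96,41.76\n25.18,62.96\n5.11,39.4\n"

path = os.path.join(tempfile.mkdtemp(), "Folds5x2_pp.csv")  # hypothetical file name
with open(path, "w") as f:
    f.write(sample)

# skiprows=1 skips the header line; unpack=True transposes the result so that
# each target variable receives one column as a 1-D float array.
col_a, col_b = np.loadtxt(path, delimiter=",", skiprows=1, unpack=True)

print(col_a)
print(col_b)
```

If you would rather address columns by their header names, np.genfromtxt(path, delimiter=",", names=True) returns a structured array, so data["col_a"] gives the first column.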

import xlrd  # note: xlrd 2.0 and later only read .xls, so .xlsx needs xlrd < 2.0

workbook = xlrd.open_workbook("Folds5x2_pp.xlsx")  # 2nd positional arg is a log file, not a file mode
sheets = workbook.sheet_names()
print(sheets)
required_data = []
for sheet_name in sheets:
    sh = workbook.sheet_by_name(sheet_name)
    for rownum in range(sh.nrows):
        row_val = sh.row_values(rownum)
        required_data.append((row_val[0], row_val[1]))  # keep only the first two columns
print(required_data)
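If you keep the row-by-row loop, you do not need a second pass to get columns: the list of tuples converts to a 2-D NumPy array in one call, and transposing it yields one array per column. A small sketch, with placeholder tuples standing in for required_data:

```python
import numpy as np

# Placeholder rows shaped like required_data: (first column, second column).
required_data = [(14.96, 41.76), (25.18, 62.96), (5.11, 39.4)]

# Shape (n_rows, 2); dtype=float raises ValueError if a header string slipped in.
table = np.array(required_data, dtype=float)

# Transposing turns rows into columns, so each name gets a 1-D array.
col0, col1 = table.T
print(col0, col1)
```

With xlrd you can also skip the tuples entirely: sh.col_values(0, start_rowx=1) returns a whole column as a list (here skipping one header row), which np.array turns into an array.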

Solution

  • Try using openpyxl

    >>> import numpy as np
    >>> from openpyxl import load_workbook
    >>> wb = load_workbook('Folds5x2_pp.xlsx', read_only=True)
    >>> print(wb.sheetnames)
    ['Sheet1', 'Sheet2', 'Sheet3']
    >>> ws = wb['Sheet1']
    >>> col = 0  # column index
    >>> # min_row=2 skips a header row; use min_row=1 if the sheet has none
    >>> x2 = np.array([r[col] for r in ws.iter_rows(min_row=2, values_only=True)])
    

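    Alternatively, use pandas, which reads the whole sheet in one call. Pass index=False to to_records so the DataFrame index is not added as an extra field, or take a single column with to_numpy():

    import pandas as pd
    df = pd.read_excel('Folds5x2_pp.xlsx')  # first sheet by default; .ods works with engine='odf' (needs odfpy)
    x2 = df.to_records(index=False)          # structured array, one field per column
    col0 = df.iloc[:, 0].to_numpy()          # first column as a 1-D numpy array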
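Building one list comprehension per column iterates over the sheet once for every column. A sketch that reads every row once with openpyxl and splits the result into column arrays; read_columns is a hypothetical helper, and it assumes one header row and numeric cells. The workbook is generated on the fly only so the example runs standalone:

```python
import os
import tempfile

import numpy as np
from openpyxl import Workbook, load_workbook


def read_columns(path, sheet_name=None, header_rows=1):
    """Hypothetical helper: return a sheet's columns as 1-D float arrays."""
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.worksheets[0]
        # values_only=True yields plain tuples of cell values instead of Cell objects.
        rows = list(ws.iter_rows(min_row=header_rows + 1, values_only=True))
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
    table = np.array(rows, dtype=float)  # shape (n_rows, n_cols)
    return list(table.T)  # one 1-D array per column


# Build a tiny placeholder workbook so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["col_a", "col_b"])
for row in [(14.96, 41.76), (25.18, 62.96), (5.11, 39.4)]:
    ws.append(row)
wb.save(path)

col_a, col_b = read_columns(path, "Sheet1")
print(col_a, col_b)
```

Reading everything into one 2-D array and transposing keeps the per-cell Python work to a single pass; for very large sheets the pandas route is usually the simpler choice.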