Right now I am using the code below, but it is very slow and gives me the columns as lists. I am also adding the columns to my list by hand. Is there a more efficient way, using numpy, to read the columns as arrays?
If not, I was thinking of converting the file to .txt or .csv, since those are easier to read. What would be the most efficient option?
I also have the same file in .ods and .xlsx, so using either one is fine.
import xlrd
workbook = xlrd.open_workbook("Folds5x2_pp.xlsx")
sheets = workbook.sheet_names()
print(sheets)
required_data = []
for sheet_name in sheets:
    sh = workbook.sheet_by_name(sheet_name)
    for rownum in range(sh.nrows):
        row_val = sh.row_values(rownum)
        required_data.append((row_val[0], row_val[1]))
print(required_data)
Try using openpyxl:
>>> import numpy as np
>>> from openpyxl import load_workbook
>>> wb = load_workbook('Folds5x2_pp.xlsx', read_only=True)
>>> print(wb.sheetnames)
['Sheet1', 'Sheet2', 'Sheet3']
>>> ws = wb['Sheet1']
>>> cols = 0  # column index
>>> x2 = np.array([r[cols].value for r in ws.iter_rows()])
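If you need every column rather than just one, it is probably cheaper to walk the sheet once and transpose the rows than to repeat the list comprehension per column. Here is a minimal sketch of that idea. The helper name rows_to_columns and the demo values are made up for illustration, and the sketch assumes every row has the same length and that any header row has already been skipped.

```python
import numpy as np


def rows_to_columns(rows):
    """Turn an iterable of equal-length row tuples into a list of 1-D arrays, one per column."""
    # zip(*rows) transposes the data: it yields one tuple per column.
    return [np.array(col) for col in zip(*rows)]


# Stand-in rows with made-up values. With openpyxl 2.6 or later you would
# pass ws.iter_rows(values_only=True) instead, which yields tuples like these.
demo_rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
first, second = rows_to_columns(demo_rows)
print(first)
print(second)
```

With the worksheet from above, the call would look like rows_to_columns(ws.iter_rows(values_only=True)), which reads the sheet in a single pass.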
Or you can try pandas and its to_records method:
import pandas as pd
df = pd.read_excel('Folds5x2_pp.xlsx')
x2 = df.to_records()  # NumPy record array; the DataFrame index is included as the first field
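As for the .csv idea from the question: once the sheet is exported, numpy can read the columns directly. The sketch below assumes a comma-separated file with exactly one header row and only numeric cells. The helper name load_columns and the tiny demo file are hypothetical stand-ins, not the real data.

```python
import os
import tempfile

import numpy as np


def load_columns(csv_path):
    """Return a 2-D array whose i-th row is the i-th column of the CSV file."""
    # skiprows=1 drops the header line; unpack=True transposes the result
    # so that each column of the file can be indexed or unpacked directly.
    return np.loadtxt(csv_path, delimiter=",", skiprows=1, unpack=True)


# Demo on a small stand-in file (made-up values, not the real spreadsheet).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("a,b\n1.0,10.0\n2.0,20.0\n3.0,30.0\n")
    demo_path = f.name

col_a, col_b = load_columns(demo_path)
os.remove(demo_path)
print(col_a)
print(col_b)
```

Whether this beats reading the .xlsx directly depends on the file size and on how often you reload it, so it is worth timing both on your own data.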