I'm currently using pyExcelerator to read Excel files, but it is extremely slow. I regularly need to open Excel files larger than 100 MB, and loading a single file takes more than twenty minutes.
The functionality I need is:
And the code I am using now is:
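```python
from collections import defaultdict
from types import StringType, UnicodeType  # Python 2 only

import pyExcelerator

# parse_xls returns a list of (sheet_name, cells) pairs, where cells maps
# (row, col) -> value; the defaultdict makes missing cells read as ''.
book = pyExcelerator.parse_xls(filepath)
parsed_dictionary = defaultdict(lambda: '', book[0][1])
number_of_columns = 44
result_list = []
number_of_rows = 500000
for i in range(0, number_of_rows):
    ok = False
    result_list.append([])
    for h in range(0, number_of_columns):
        item = parsed_dictionary[i, h]
        # Remove tabs and surrounding whitespace from text cells.
        if type(item) is StringType or type(item) is UnicodeType:
            item = item.replace("\t", "").strip()
        result_list[i].append(item)
        if item != '':
            ok = True
    # Stop at the first row whose cells are all empty.
    if not ok:
        break
```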
Any suggestions?
pyExcelerator appears not to be maintained. To write xls files, use xlwt, which is a fork of pyExcelerator with bug fixes and many enhancements. The (very basic) xls reading capability of pyExcelerator was eradicated from xlwt. To read xls files, use xlrd.
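As a rough illustration, the question's loop could be ported to xlrd along the lines of the sketch below. It assumes Python 3 and a current xlrd release (xlrd 2.0 and later read only `.xls`, not `.xlsx`); `clean_rows` and `read_first_sheet` are hypothetical helper names, not part of xlrd. The xlrd calls used (`open_workbook`, `sheet_by_index`, `nrows`, `row_values`) are standard xlrd API. One deliberate difference from the original loop: the first all-empty row is not appended to the result.

```python
def clean_rows(rows, number_of_columns=44):
    """Hypothetical helper mirroring the question's loop.

    Each row is truncated or padded with '' to number_of_columns cells,
    text cells have tabs removed and surrounding whitespace stripped,
    and iteration stops at the first row whose cells are all ''.
    """
    result = []
    for row in rows:
        row = list(row[:number_of_columns])
        row += [''] * (number_of_columns - len(row))
        cleaned = [
            v.replace("\t", "").strip() if isinstance(v, str) else v
            for v in row
        ]
        if all(v == '' for v in cleaned):
            break  # first completely empty row ends the data
        result.append(cleaned)
    return result


def read_first_sheet(filepath, number_of_columns=44):
    """Hypothetical helper: read the first sheet of an .xls file with xlrd."""
    import xlrd  # imported here so clean_rows works without xlrd installed

    book = xlrd.open_workbook(filepath)
    sheet = book.sheet_by_index(0)
    # row_values returns a list of cell values; empty cells come back as ''
    # and numbers (including dates) come back as floats.
    rows = (sheet.row_values(i) for i in range(sheet.nrows))
    return clean_rows(rows, number_of_columns)
```

Iterating up to `sheet.nrows` replaces the hard-coded 500000-row limit. If the workbook has many sheets, passing `on_demand=True` to `xlrd.open_workbook` makes xlrd load a sheet only when it is requested.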
If it's taking 20 minutes to load a 100MB xls file, you must be using one or more of: a slow computer, a computer with very little available memory, or an older version of Python.
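To see whether the time goes into parsing the file or into your own post-processing loop, it can help to time each step separately. The sketch below is a minimal, hypothetical timing wrapper built on the standard library's `time.perf_counter`; `timed` and `profile_load` are illustrative names, and `profile_load` assumes xlrd is installed.

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func(*args, **kwargs), print the elapsed time, and return its result."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print("%s: %.2f s" % (label, elapsed))
    return result


def profile_load(filepath):
    """Hypothetical example: time xlrd's parsing separately from row extraction."""
    import xlrd  # assumption: xlrd is installed

    book = timed("open_workbook", xlrd.open_workbook, filepath)
    sheet = book.sheet_by_index(0)
    return timed(
        "row_values",
        lambda: [sheet.row_values(i) for i in range(sheet.nrows)],
    )
```

If `open_workbook` alone accounts for most of the time, the bottleneck is parsing (or memory pressure) rather than the cleaning loop.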
Neither pyExcelerator nor xlrd can read password-protected files.
Here's a link that covers xlrd and xlwt.
Disclaimer: I'm the author of xlrd and maintainer of xlwt.