When I open an Excel file in .xls format with Pandas, it loads faster than the same file in .xlsx format. I am using Pandas 1.0.1 and Python 3.7.6. The two files have identical contents; I only changed the file names and the name of the first sheet for convenience. Each file consists of 6 sheets, each with 49 columns and approximately 1700 rows of numeric data. I am reading only the first sheet, but the same result holds for any number of sheets and rows: the .xlsx file takes almost 4x longer to read.
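For anyone who wants to reproduce the measurement, here is a minimal timing sketch. The helper names `time_read` and `compare` are made up for illustration, and the paths passed to `compare` are placeholders; it assumes pandas and a suitable Excel reader engine are installed.

```python
import time


def time_read(read_fn, path, repeats=3):
    """Return the best wall-clock time in seconds of read_fn(path) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn(path)
        best = min(best, time.perf_counter() - start)
    return best


def compare(xls_path, xlsx_path, repeats=3):
    """Time pandas.read_excel on the first sheet of each file (hypothetical helper)."""
    import pandas as pd  # imported lazily so time_read works without pandas

    def read_first_sheet(path):
        # sheet_name=0 selects the first sheet only
        return pd.read_excel(path, sheet_name=0)

    return {
        path: time_read(read_first_sheet, path, repeats)
        for path in (xls_path, xlsx_path)
    }
```

Calling `compare("data.xls", "data.xlsx")` with your own file names returns a dict mapping each path to its best read time. Taking the minimum over a few runs reduces noise from disk caching.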
Is this the reason? [From https://windowsfileviewer.com]
"While XLS files use a proprietary binary format, XLSX files use a newer file format referred to as Open XML. The XLS extension is used by Microsoft Excel 2003 and earlier and the XLSX extension is used by Microsoft Excel 2007 and later"
I could not find any information about this in the official Pandas documentation. I am wondering why and how this happens.
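Maybe because XLS files use a proprietary binary format, whereas XLSX files use a newer format known as Open XML. An XLSX file is a ZIP archive of XML parts, so the reader has to decompress it and parse the XML before it can build the table, which is likely slower than reading binary records directly.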
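To check which container format a file really uses regardless of its extension, you can look at its first bytes. This is only a rough sketch: `sniff_container` is a hypothetical helper, and it relies on the well-known signatures of ZIP archives (`PK\x03\x04`) and OLE2 compound files (`D0 CF 11 E0 A1 B1 1A E1`), the latter being the container used by legacy .xls files.

```python
ZIP_MAGIC = b"PK\x03\x04"                           # start of a ZIP archive (.xlsx)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")      # OLE2 compound file (.xls)


def sniff_container(path):
    """Return 'ole2', 'zip', or 'unknown' based on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

A genuine .xlsx file should report `zip` and a genuine .xls file `ole2`. If both of your files report the same container, renaming alone did not convert the format and the timing difference has some other cause.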