amazon-web-services amazon-ec2 python-3.6 openpyxl django-2.1

openpyxl "BadZipFile" on Amazon Linux (RedHat/CentOS) EC2


I have to read an .xlsx file, fill in some cells and save the edited copy, so I did the following:

  • Downloaded the file from S3 into my Django project root folder with wget:
$ sudo wget https://s3.console.aws.amazon.com/s3/buckets/path/to/file.xlsx
  • Read, fill and save:
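import openpyxl

# Load the workbook; keep_vba=True preserves any VBA content in the file
wb = openpyxl.load_workbook('/path/to/file.xlsx', keep_vba=True)

# Write to cell A1 on the sheet named "Sheet1"
sheet = wb['Sheet1']
sheet['A1'] = "My Text"

# Save the edited copy under a new name
wb.save('/path/to/result.xlsx')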
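
The download step may be worth double-checking before blaming the platform: the address passed to wget is an S3 console URL, and as far as I know a console address serves a web page (typically a sign-in page) rather than the object's bytes. Below is a minimal sketch, assuming the object can be reached through some direct URL (public or pre-signed, neither of which appears in the question). `fetch_xlsx` is a hypothetical helper name; it downloads with the standard library and refuses anything that is not a zip container.

```python
import shutil
import urllib.request
import zipfile


def fetch_xlsx(url, dest):
    """Download `url` to `dest` and confirm the result is a zip container.

    Hypothetical helper: `url` is assumed to be a direct object URL
    (public or pre-signed), not an S3 console address.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    # .xlsx files are zip archives, so anything else cannot be a valid workbook.
    if not zipfile.is_zipfile(dest):
        raise ValueError(f"{dest} is not a zip archive; check what {url} returned")
    return dest
```

If this raises, the saved file is left in place so its contents can be inspected.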

When I run it on my local Windows 10 machine, it works fine. However, I have to deploy it to my EC2 instance, which runs Amazon Linux (RedHat/CentOS). When I run it on EC2, I get the following traceback:

File "/path/to/venv/lib64/python3.6/dist-packages/django/core/handlers/exception.py" in inner
  34.             response = get_response(request)

File "/path/to/venv/lib64/python3.6/dist-packages/django/core/handlers/base.py" in _get_response
  111.         resolver_match = resolver.resolve(request.path_info)

File "/path/to/venv/lib64/python3.6/dist-packages/django/urls/resolvers.py" in resolve
  491.             for pattern in self.url_patterns:

File "/path/to/venv/lib64/python3.6/dist-packages/django/utils/functional.py" in __get__
  37.         res = instance.__dict__[self.name] = self.func(instance)

File "/path/to/venv/lib64/python3.6/dist-packages/django/urls/resolvers.py" in url_patterns
  533.         patterns = getattr(self.urlconf_module, "urlpatterns", self.urlconf_module)

File "/path/to/venv/lib64/python3.6/dist-packages/django/utils/functional.py" in __get__
  37.         res = instance.__dict__[self.name] = self.func(instance)

File "/path/to/venv/lib64/python3.6/dist-packages/django/urls/resolvers.py" in urlconf_module
  526.             return import_module(self.urlconf_name)

File "/usr/lib64/python3.6/importlib/__init__.py" in import_module
  126.     return _bootstrap._gcd_import(name[level:], package, level)

File "<frozen importlib._bootstrap>" in _gcd_import
  994. <source code not available>

File "<frozen importlib._bootstrap>" in _find_and_load
  971. <source code not available>

File "<frozen importlib._bootstrap>" in _find_and_load_unlocked
  955. <source code not available>

File "<frozen importlib._bootstrap>" in _load_unlocked
  665. <source code not available>

File "<frozen importlib._bootstrap_external>" in exec_module
  678. <source code not available>

File "<frozen importlib._bootstrap>" in _call_with_frames_removed
  219. <source code not available>

File "/path/to/project/mysite/urls.py" in <module>
  41.                   url(r'^product/app/', include('app.urls')),

File "/path/to/venv/lib64/python3.6/dist-packages/django/urls/conf.py" in include
  34.         urlconf_module = import_module(urlconf_module)

File "/usr/lib64/python3.6/importlib/__init__.py" in import_module
  126.     return _bootstrap._gcd_import(name[level:], package, level)

File "<frozen importlib._bootstrap>" in _gcd_import
  994. <source code not available>

File "<frozen importlib._bootstrap>" in _find_and_load
  971. <source code not available>

File "<frozen importlib._bootstrap>" in _find_and_load_unlocked
  955. <source code not available>

File "<frozen importlib._bootstrap>" in _load_unlocked
  665. <source code not available>

File "<frozen importlib._bootstrap_external>" in exec_module
  678. <source code not available>

File "<frozen importlib._bootstrap>" in _call_with_frames_removed
  219. <source code not available>

File "/path/to/project/app/urls.py" in <module>
  1. from . import views

File "/path/to/project/app/views.py" in <module>
  39. from .readexcel_function import *

File "/path/to/project/app/readexcel_function .py" in <module>
  10. wb = openpyxl.load_workbook('/path/to/file.xlsx', keep_vba = True)

File "/path/to/venv/lib64/python3.6/dist-packages/openpyxl/reader/excel.py" in load_workbook
  313.                         data_only, keep_links)

File "/path/to/venv/lib64/python3.6/dist-packages/openpyxl/reader/excel.py" in __init__
  124.         self.archive = _validate_archive(fn)

File "/path/to/venv/lib64/python3.6/dist-packages/openpyxl/reader/excel.py" in _validate_archive
  96.     archive = ZipFile(filename, 'r')

File "/usr/lib64/python3.6/zipfile.py" in __init__
  1131.                 self._RealGetContents()

File "/usr/lib64/python3.6/zipfile.py" in _RealGetContents
  1198.             raise BadZipFile("File is not a zip file")

Exception Type: BadZipFile at /
Exception Value: File is not a zip file
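
The last frames show the error coming from the standard library's zipfile module, before openpyxl parses anything, so the bytes on disk are the first thing to look at. Here is a rough sketch of such a check; `sniff_file` and its labels are made up for illustration, and the markup test is only a heuristic.

```python
ZIP_MAGIC = b"PK\x03\x04"  # local file header signature at the start of a typical zip


def sniff_file(path, length=256):
    """Return a rough label for the file based on its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(length)
    if not head:
        return "empty"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Heuristic only: HTML or XML usually means a web page or an error
    # document was saved instead of the workbook.
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype", b"<html", b"<?xml")):
        return "markup"
    return "unknown"
```

Calling `sniff_file('/path/to/file.xlsx')` on the EC2 copy and on the Windows copy would show whether the two machines are actually reading the same kind of file.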

I also tried running it on Ubuntu and got the same exception.

I've also tried:

  • upgrading openpyxl;
  • reading the file as a zip with the zipfile library.

Both gave the same message.
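
Since the zipfile library fails on its own, the archive can be checked without openpyxl at all. The sketch below lists which conventionally expected parts of an .xlsx package are missing. The helper name and the `EXPECTED_PARTS` tuple are illustrative assumptions, and `xl/workbook.xml` is the usual location of the workbook part rather than a guaranteed one.

```python
import zipfile

# Parts that an .xlsx package is conventionally expected to contain.
EXPECTED_PARTS = ("[Content_Types].xml", "xl/workbook.xml")


def missing_workbook_parts(path):
    """Return the expected parts that are absent from the archive at `path`.

    Raises zipfile.BadZipFile if `path` is not a zip archive at all.
    """
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
    return [part for part in EXPECTED_PARTS if part not in names]
```

An empty list suggests the file is at least shaped like a workbook. A `BadZipFile` here reproduces the error from the traceback with no Excel-related code involved.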

Is this doable on Linux, or will I have to deploy my project on a Windows Server EC2 instance?


Solution

  • After spending hours digging, my conclusion is that if the OS doesn't support Excel, there is no way to make it work. I had to deploy on a Windows Server EC2 instance after all.

    EDIT

    I had thought it would be possible to manipulate the ".xlsx" file on its own, even without Excel being available for Ubuntu/Amazon Linux.

    END EDIT

    Maybe someday there will be a version of Excel for Linux, and openpyxl will be updated for it.