django-import-export: empty rows before the CSV header trigger an exception while importing


While importing data from a CSV file, I noticed that the following error is raised whenever the first row is not the header, for example when empty rows precede it:

list indices must be integers or slices, not str


first_name,last_name,email,password,role
Noak,Larrett,nlarrett0@ezinearticles.com,8sh15apPjI,Student
Duffie,Milesap,dmilesap1@wikipedia.org,bKNIlIWVfNw,Student

The import only works when the header is the very first row:

first_name,last_name,email,password,role
Noak,Larrett,nlarrett0@ezinearticles.com,8sh15apPjI,Student
Duffie,Milesap,dmilesap1@wikipedia.org,bKNIlIWVfNw,Student

...

I tried overriding before_import to remove any blank rows:

def before_import(self, dataset, using_transactions, dry_run, **kwargs):
    # Collect the indexes of rows whose cells are all empty or whitespace
    indexes = []
    for i in range(len(dataset)):
        row = ''.join(dataset[i])
        if row.strip() == '':
            indexes.append(i)
    # Delete from the end so the remaining indexes stay valid
    for index in sorted(indexes, reverse=True):
        del dataset[index]
    return dataset

This removes blank rows from the data, but it does not help with the first row, which must always contain the header; if it does not, the error is still thrown.


Solution

  • After hours of debugging, I found the ImportMixin class in import_export/admin.py.

    The class contains a method called import_action that looks like this:

    def import_action(self, request, *args, **kwargs):
        ...
        import_file = form.cleaned_data['import_file']
        ...
        data = tmp_storage.read(input_format.get_read_mode())
        ...
        dataset = input_format.create_dataset(data)
        ...
    

    As you can see, this is the method that reads the uploaded file into a string before passing it to input_format.create_dataset(). So all I had to do was add a custom method that removes the blank lines:

    data = self.remove_blanks(data)
    dataset = input_format.create_dataset(data)
    

    import_export/admin.py/ImportMixin

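    # Requires "import os" at the top of admin.py
    def remove_blanks(self, data):
        # Keep only the lines that contain something other than whitespace
        return os.linesep.join([s for s in data.splitlines() if s.strip()])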
    

    This way the CSV data no longer contains any blank lines, so the header becomes the first line, which solves the problem. I hope this is useful to anyone facing the same issue.

    UPDATE: There is also an easy way to do the same thing by overriding create_dataset in import_export/formats/base_formats.py

    import_export/formats/base_formats.py/TablibFormat

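    # Requires "import os" at the top of base_formats.py
    def create_dataset(self, in_stream, **kwargs):
        # Drop blank lines so the header ends up on the first line
        in_stream = os.linesep.join([s for s in in_stream.splitlines() if s.strip()])
        try:
            return tablib.import_set(in_stream, format=self.get_title())
        except Exception:
            # Fall back to an empty dataset if the data cannot be parsed
            return tablib.import_set('', format=self.get_title())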
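
The blank-line stripping used in both variants above can be tried on its own, outside Django. The following is only a minimal sketch: the function name remove_blank_lines and the sample data are made up for illustration, and the standard-library csv module stands in for tablib so that the snippet runs without any extra packages.

```python
import csv
import io


def remove_blank_lines(text):
    """Return text with empty and whitespace-only lines dropped."""
    # "\n" is used instead of os.linesep so the result is identical
    # on every platform; the csv module accepts either line ending.
    return "\n".join(line for line in text.splitlines() if line.strip())


# Hypothetical upload: two blank rows before the header, one in the middle.
raw = (
    "\n"
    "   \n"
    "first_name,last_name,email\n"
    "Noak,Larrett,nlarrett0@ezinearticles.com\n"
    "\n"
    "Duffie,Milesap,dmilesap1@wikipedia.org\n"
)

cleaned = remove_blank_lines(raw)

# After cleaning, the first parsed row is the header.
rows = list(csv.reader(io.StringIO(cleaned)))
header, records = rows[0], rows[1:]
print(header)
print(len(records))
```

Two limitations of this sketch are worth keeping in mind: it works on text only, so it would need adjusting for formats that are read as bytes, and it splits on every line break, so a quoted CSV field that contains an empty line would lose that line. A row made up only of separators (such as ",,,,") is not considered blank here either. If editing the installed package is not desirable, the same helper could presumably be called from a create_dataset override in a subclass of the format class instead.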