Search code examples
pythondjangoweb-applicationsantivirusantivirus-integration

Setting up a file upload stream scan using Clamav in a Django back-end


Working on a React/Django app. I have files being uploaded by users through the React front-end that end up in the Django/DRF back-end. We have antivirus (AV) running on the server constantly, but we want to add stream scanning before it is written to disk.

It is a bit over my head as how to set it up. Here are a few sources I am looking at.

How do you virus scan a file being uploaded to your java webapp as it streams?

Although accepted best answer describes it being "... quite easy" to setup, I'm struggling.

I apparently need to cat testfile | clamscan - per the post and the corresponding documentation:

How do you virus scan a file being uploaded to your java webapp as it streams?

So if my back-end looks like the following:

class SaveDocumentAPIView(APIView):
    permission_classes = [IsAuthenticated]

    def post(self, request, *args, **kwargs):

        # this is for handling the files we do want
        # it writes the files to disk and writes them to the database
        for f in request.FILES.getlist('file'):
            max_id = Uploads.objects.all().aggregate(Max('id'))
            if max_id['id__max'] == None:
                max_id = 1
            else:    
                max_id = max_id['id__max'] + 1
            data = {
                'user_id': request.user.id,
                'sur_id': kwargs.get('sur_id'),
                'co': User.objects.get(id=request.user.id).co,
                'date_uploaded': datetime.datetime.now(),
                'size': f.size
            }
            filename = str(data['co']) + '_' + \
                    str(data['sur_id']) + '_' + \
                    str(max_id) + '_' + \
                    f.name
            data['doc_path'] = filename
            self.save_file(f, filename)
            serializer = SaveDocumentSerializer(data=data)
            if serializer.is_valid(raise_exception=True):
                serializer.save()
        return Response(status=HTTP_200_OK)

    # Handling the document
    def save_file(self, file, filename):
        with open('fileupload/' + filename, 'wb+') as destination:
            for chunk in file.chunks():
                destination.write(chunk)

I think I need to add something to the save_file method like:

for chunk in file.chunks():
    # run bash comman from python
    cat chunk | clamscan -
    if passes_clamscan:
        destination.write(chunk)
        return HttpResponse('It passed')
    else:
        return HttpResponse('Virus detected')

So my issues are:

1) How to run the Bash from Python?

2) How to receive a result response from the scan so that it can be sent back to the user and other things can be done with the response on the back-end? (Like creating logic to send the user and the admin an email that their file had a virus).

I have been toying with this, but not much luck.

Running Bash commands in Python

Furthermore, there are Github repos out there that claim to marry Clamav with Django pretty well, but they either haven't been updated in years or the existing documentation is pretty bad. See the following:

https://github.com/vstoykov/django-clamd

https://github.com/musashiXXX/django-clamav-upload

https://github.com/QueraTeam/django-clamav


Solution

  • Ok, got this working with clamd. I modified my SaveDocumentAPIView to the following. This scans the files before they are written to disk and prevents them from being written if they infected. Still allows uninfected files through, so the user doesn't have to re-upload them.

    class SaveDocumentAPIView(APIView):
        permission_classes = [IsAuthenticated]
    
        def post(self, request, *args, **kwargs):
    
            # create array for files if infected
            infected_files = []
    
            # setup unix socket to scan stream
            cd = clamd.ClamdUnixSocket()
    
            # this is for handling the files we do want
            # it writes the files to disk and writes them to the database
            for f in request.FILES.getlist('file'):
                # scan stream
                scan_results = cd.instream(f)
    
                if (scan_results['stream'][0] == 'OK'):    
                    # start to create the file name
                    max_id = Uploads.objects.all().aggregate(Max('id'))
                    if max_id['id__max'] == None:
                        max_id = 1
                    else:    
                        max_id = max_id['id__max'] + 1
                    data = {
                        'user_id': request.user.id,
                        'sur_id': kwargs.get('sur_id'),
                        'co': User.objects.get(id=request.user.id).co,
                        'date_uploaded': datetime.datetime.now(),
                        'size': f.size
                    }
                    filename = str(data['co']) + '_' + \
                            str(data['sur_id']) + '_' + \
                            str(max_id) + '_' + \
                            f.name
                    data['doc_path'] = filename
                    self.save_file(f, filename)
                    serializer = SaveDocumentSerializer(data=data)
                    if serializer.is_valid(raise_exception=True):
                        serializer.save()
    
                elif (scan_results['stream'][0] == 'FOUND'):
                    send_mail(
                        'Virus Found in Submitted File',
                        'The user %s %s with email %s has submitted the following file ' \
                        'flagged as containing a virus: \n\n %s' % \
                        (
                            user_obj.first_name, 
                            user_obj.last_name, 
                            user_obj.email, 
                            f.name
                        ),
                        'The Company <[email protected]>',
                        ['[email protected]']
                    )
                    infected_files.append(f.name)
    
            return Response({'filename': infected_files}, status=HTTP_200_OK)
    
        # Handling the document
        def save_file(self, file, filename):
            with open('fileupload/' + filename, 'wb+') as destination:
                for chunk in file.chunks():
                    destination.write(chunk)