Tags: python, rest, fastapi, web-development-server

How to make a large file accessible to external APIs?


I'm new to webdev, and I have this use case where a user sends a large file (e.g., a video file) to the API, and then this file needs to be accessible to other APIs (which could possibly be on different servers) for further processing.

I'm using FastAPI for the backend, defining a file parameter with a type of UploadFile to receive and store the files. But what would be the best way to make this file accessible to other APIs? Is there a way I can get a publicly accessible URL out of the saved file, which other APIs can use to download the file?


Solution

  • Returning a File Response

    First, to return a file that is saved on disk from a FastAPI backend, you could use FileResponse (in case the file was already fully loaded into memory, see here). For example:

    from fastapi import FastAPI
    from fastapi.responses import FileResponse
    
    some_file_path = "large-video-file.mp4"
    app = FastAPI()
    
    @app.get("/")
    def main():
        return FileResponse(some_file_path)
    

    If the file is too large to fit into memory (e.g., you can't load a 100GB file with 16GB of RAM), you could use StreamingResponse instead. That way, you don't have to read the entire contents into memory at once; you read and send them one chunk at a time. Note, though, that FileResponse, as shown in the FileResponse class implementation, also reads the file contents in chunks; hence, there would be no memory issues with using FileResponse either. However, as can be seen in the implementation, FileResponse uses a predefined chunk size of 64KB. Thus, if you wish to choose a larger chunk size, you could instead use StreamingResponse and specify the chunk size as desired, as demonstrated in this answer, or use yield from f, as shown below (this iterates over the file line by line, i.e., it splits on newline bytes, so it might be somewhat slower, depending on your needs).

    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    
    some_file_path = "large-video-file.mp4"
    app = FastAPI()
    
    @app.get("/")
    def main():
        def iterfile():
            with open(some_file_path, mode="rb") as f:
                yield from f
    
        return StreamingResponse(iterfile(), media_type="video/mp4")
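
    As an alternative to yield from f, the file can be read in fixed-size chunks. The following is a minimal sketch using only the standard library; the function name iter_file_chunks and the 1 MiB CHUNK_SIZE are illustrative assumptions, not part of FastAPI or the original answer.

```python
from typing import Iterator

# Arbitrary illustrative value (1 MiB); tune it to your workload.
CHUNK_SIZE = 1024 * 1024


def iter_file_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the file's bytes in pieces of at most `chunk_size` bytes."""
    with open(path, mode="rb") as f:
        # f.read() returns b"" at end of file, which ends the loop.
        while chunk := f.read(chunk_size):
            yield chunk
```

    In the endpoint above, the generator could then be passed to the response, e.g., StreamingResponse(iter_file_chunks(some_file_path), media_type="video/mp4").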
    

    Exposing the API to the public

    As for exposing your API to the public—i.e., external APIs, users, developers, etc.—you can use ngrok (or expose, as suggested in this answer).

    Ngrok is a cross-platform application that enables developers to expose a local development server to the Internet with minimal effort. To embed the ngrok agent into your FastAPI application, you could use pyngrok—as suggested here (see here for a FastAPI integration example). If you would like to run and expose your FastAPI app through Google Colab (using ngrok), instead of your local machine, please have a look at this answer (plenty of tutorials/examples can also be found on the web).

    If you are looking for a more permanent solution, you may want to have a look at cloud platforms—more specifically, a Platform as a Service (PaaS). I would strongly recommend you thoroughly read FastAPI's Deployment documentation. Have a closer look at About HTTPS and Deployments Concepts.

    Important Notes

    By exposing your API to the outside world, you are also exposing it to various forms of attack. Before exposing your API to the public—even if it’s for free—you need to make sure you are offering secure access (use HTTPS), as well as authentication (verify the identity of a user) and authorization (verify their access rights; in other words, verify what specific routes, files and data a user has access to)—take a look at 1. OAuth2 and JWT tokens, 2. OAuth2 scopes, 3. Role-Based Access Control (RBAC), 4. Get Current User and How to Implement Role based Access Control With FastAPI.

    Additionally, if you are exposing your API for public use, you may want to limit its usage because of expensive computation, limited resources, DDoS attacks, brute-force attacks, web scraping, or simply the monthly cost for a fixed number of requests. You can do that at the application level using, for instance, slowapi (a related post can be found here), or at the platform level by setting the rate limit through your hosting service (if permitted). Furthermore, you would need to make sure that the files uploaded by users have a permitted file extension, e.g., .mp4, and are not files with, for instance, a .exe extension that are potentially harmful to your system. Also, when saving users' uploaded files to disk, you should assign your own filenames, for instance by generating unique random UUIDs, and keep track of them (associate them with user accounts). In this way, you avoid filename clashes and ensure that unauthorized users cannot access whatever file(name) they may request. It might also be a good idea to create a separate directory for each user, where their files are stored apart from those of other users.
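
    The extension check and UUID-based naming described above could be sketched as follows. The names ALLOWED_EXTENSIONS and make_storage_path, the "uploads" base directory, and the list of extensions are assumptions made for illustration; user_id is assumed to come from your authentication layer, not from client input.

```python
import uuid
from pathlib import Path

# Hypothetical allow-list; adjust it to the formats you actually accept.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv"}


def make_storage_path(
    original_filename: str, user_id: str, base_dir: Path = Path("uploads")
) -> Path:
    """Return a server-chosen path for an upload, ignoring the client's name."""
    ext = Path(original_filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"File extension {ext!r} is not permitted")
    # Only the validated extension is reused; the name itself is a random UUID,
    # stored in a per-user directory.
    return base_dir / user_id / f"{uuid.uuid4().hex}{ext}"
```

    Checking the extension alone does not prove what a file contains, so treat this as a first filter. The mapping between the generated name and the owning user account would still need to be recorded, e.g., in a database.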

    Finally, you would also need to ensure that uploaded files do not exceed a predefined MAX_FILE_SIZE limit (based on your needs and system's resources), so that neither authenticated users nor an attacker can upload extremely large files that exhaust server resources and crash the application. You shouldn't rely, though, on the Content-Length header being present in the request to do that, as it can easily be altered, or even removed, by the client. You should rather use an approach similar to this answer (have a look at the "Update" section) that uses request.stream() to process the incoming data in chunks as they arrive, instead of loading the entire file into memory first. By using a simple counter, e.g., total_len += len(chunk), you can check whether the file size has exceeded MAX_FILE_SIZE and, if so, raise an HTTPException with the HTTP_413_REQUEST_ENTITY_TOO_LARGE status code (see this answer as well, for more details and code examples).
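
    The counter-based size check could look roughly like the sketch below. It is written against a generic async iterator of byte chunks so that it runs with the standard library alone; save_with_limit, FileTooLargeError, and the 1 GiB limit are hypothetical names and values chosen for illustration.

```python
from typing import AsyncIterator, BinaryIO

# Illustrative limit (1 GiB); choose one that fits your resources.
MAX_FILE_SIZE = 1024 * 1024 * 1024


class FileTooLargeError(Exception):
    """Raised when the incoming data exceeds the allowed size."""


async def save_with_limit(
    chunks: AsyncIterator[bytes], out: BinaryIO, max_size: int = MAX_FILE_SIZE
) -> int:
    """Write chunks to `out` as they arrive; abort once `max_size` is exceeded."""
    total_len = 0
    async for chunk in chunks:
        total_len += len(chunk)
        if total_len > max_size:
            raise FileTooLargeError(f"Upload exceeds {max_size} bytes")
        out.write(chunk)  # blocking write; acceptable for a sketch
    return total_len
```

    In a FastAPI endpoint, request.stream() could be passed as chunks, and FileTooLargeError translated into an HTTPException with status.HTTP_413_REQUEST_ENTITY_TOO_LARGE. The partially written file should also be deleted when the limit is hit.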

    Read more on FastAPI's Security documentation and API Security on Cloudflare.