Search code examples
javaoracle-databasejsflarge-files

Retrieve large amount of data in Java Web App and download it to client


Given:

  • a web application running on websphere and a oracle database with large tables
  • a jsf website that allows the user to download parts of the data in csv files.
  • Normal way of accessing the DB is via JPA (openJPA)
  • potential file download size is up to 500MB to 1GB

Problem:

How to enable the user to download such large data exports from the DB via the web application on his client hard drive without OutOfMemory Exceptions/storing/buffering the complete data on the server


Solution

  • You're likely going to run into a number of limits trying to do this in a single HTTP request.

    Streaming and asynchronous processing are two possible solutions.

    Streaming

    It looks like JPA 2.2 has added streaming support. To prevent out of memory conditions you'll likely need to tune your JVM, you'll also likely need to adjust the JDBC fetch size to balance your DB performance and your client performance.

    You can then stream/buffer the results back to the client.

    There are problems with this approach though. What happens when there's a momentary loss in the network connection? Someone accidentally closes their browser, providing Content-Length response headers to help gauge time remaining, etc.? A better approach is:

    Asynchronous

    The steps could go something like this:

    1. Client submits a request to the server.
    2. Server kicks off an asynchronous process (web worker or JMS).
    3. The async process generates the file and stores it temporarily on the file system.
    4. When the process completes it sends an email to the client to pick up their file. This is known as a claim check. Instead of an email, it could also be a URL that's returned that the user can then refresh to see when their file is ready for pick up.
    5. After a "reasonable" amount of time the file is deleted.

    This approach can be configured to allow auto-resume capabilities and eliminates network uncertainty and browser tab closes from the equation. Overall, it also alleviate the strain of having to handle re-requests. Using JMS also, gives you the ability to scale this solution horizontally instead of vertically.