I would like to upload very large files (up to 5 or 6 GB) to a web server using the HttpClient
class (4.1.2) from Apache. Before sending these files, I break them into smaller chunks (100 MB, for example). Unfortunately, all of the examples I see for doing a multi-part POST using HttpClient
appear to buffer the file contents before sending them (typically, a small file size is assumed). Here is such an example:
HttpClient httpclient = new DefaultHttpClient();
HttpPost post = new HttpPost("http://www.example.com/upload.php");
MultipartEntity mpe = new MultipartEntity();
// Here are some plain-text fields as a part of our multi-part upload
mpe.addPart("chunkIndex", new StringBody(Integer.toString(chunkIndex)));
mpe.addPart("fileName", new StringBody(somefile.getName()));
// Now for a file to include; looks like we're including the whole thing!
FileBody bin = new FileBody(new File("/path/to/myfile.bin"));
mpe.addPart("myFile", bin);
post.setEntity(mpe);
HttpResponse response = httpclient.execute(post);
In this example, it looks like we create a new FileBody object and add it to the MultipartEntity. In my case, where the file could be 100 MB in size, I'd rather not buffer all of that data at once. I'd like to be able to write out that data in smaller chunks (4 MB at a time, for example), eventually writing all 100 MB. I'm able to do this using the HttpURLConnection class from Java (by writing directly to the output stream), but that class has its own set of problems, which is why I'm trying to use the Apache offerings.
Is it possible to write 100 MB of data to an HttpClient, but in smaller, iterative chunks? I don't want the client to have to buffer up to 100 MB of data before actually doing the POST. None of the examples I see seem to allow you to write directly to the output stream; they all appear to pre-package things before the execute()
call.
Any tips would be appreciated!
For clarification, here's what I did previously with the HttpURLConnection class. I'm trying to figure out how to do something similar in HttpClient:
// Get the connection's output stream
out = new DataOutputStream(conn.getOutputStream());
// Write some plain-text multi-part data
out.writeBytes(fieldBuffer.toString());
// Figure out how many loops we'll need to write the 100 MB chunk
int bufferLoops = (dataLength + (bufferSize - 1)) / bufferSize;
// Open the local file (~5 GB in size) to read the data chunk (100 MB)
raf = new RandomAccessFile(file, "r");
raf.seek(startingOffset); // Position the pointer to the beginning of the chunk
// Keep track of how many bytes we have left to read for this chunk
int bytesLeftToRead = dataLength;
// Write the file data block to the output stream
for (int i = 0; i < bufferLoops; i++) {
    // Create an appropriately sized mini-buffer (max 4 MB) for the pieces
    // of this chunk we have yet to read
    byte[] buffer = new byte[Math.min(bytesLeftToRead, bufferSize)];
    int bytesRead = raf.read(buffer); // Read ~4 MB from the local file
    out.write(buffer, 0, bytesRead);  // Write that bit to the stream
    bytesLeftToRead -= bytesRead;
}
// Write the final boundary
out.writeBytes(finalBoundary);
out.flush();
If I'm understanding your question correctly, your concern is loading the whole file into memory (right?). If that is the case, you should use streams (such as a FileInputStream), so that the whole file doesn't get pulled into memory at once.
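For example, HttpMime's InputStreamBody makes the streaming explicit. Here's a minimal sketch against HttpClient/HttpMime 4.1.2; the URL, path, file name, and content type are placeholders carried over from your example:

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.mime.MultipartEntity;
import org.apache.http.entity.mime.content.InputStreamBody;
import org.apache.http.impl.client.DefaultHttpClient;

HttpClient httpclient = new DefaultHttpClient();
HttpPost post = new HttpPost("http://www.example.com/upload.php");
MultipartEntity mpe = new MultipartEntity();
// The part is read from this stream only while the request body is being
// written out, so the file is never buffered in memory as a whole
InputStream in = new FileInputStream("/path/to/myfile.bin");
mpe.addPart("myFile", new InputStreamBody(in, "application/octet-stream", "myfile.bin"));
post.setEntity(mpe);
HttpResponse response = httpclient.execute(post);

One caveat: InputStreamBody reports an unknown content length, so the request should go out with chunked transfer encoding. If your server can't handle that, FileBody (which, despite appearances, also streams from disk inside its writeTo() implementation rather than buffering the whole file) with its known length may be the better choice.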
If that doesn't help, and you still want to divide the file up into chunks, you could code the server to handle multiple POSTs, concatenating the data as it receives them, and then manually split up the bytes of the file on the client; one way to do that with HttpClient is sketched below.
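If you want direct control over how the bytes are written (the closest analogue to your HttpURLConnection loop), you can give MultipartEntity a custom ContentBody: HttpClient calls its writeTo(OutputStream) while the request body is being sent, and you can stream one slice of the file from there. A sketch under that assumption follows; ChunkFileBody is a hypothetical name, not an existing HttpMime class:

import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import org.apache.http.entity.mime.MIME;
import org.apache.http.entity.mime.content.AbstractContentBody;

public class ChunkFileBody extends AbstractContentBody {

    private static final int BUFFER_SIZE = 4 * 1024 * 1024; // 4 MB mini-buffer

    private final File file;
    private final long offset; // where this chunk starts within the file
    private final long length; // size of this chunk (100 MB, for example)

    public ChunkFileBody(File file, long offset, long length) {
        super("application/octet-stream");
        this.file = file;
        this.offset = offset;
        this.length = length;
    }

    // Called by HttpClient while the POST body is written, so at most
    // BUFFER_SIZE bytes of file data are held in memory at any time
    @Override
    public void writeTo(OutputStream out) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(offset); // position at the beginning of the chunk
            byte[] buffer = new byte[BUFFER_SIZE];
            long remaining = length;
            while (remaining > 0) {
                int bytesRead = raf.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (bytesRead == -1) {
                    break; // hit end of file early
                }
                out.write(buffer, 0, bytesRead);
                remaining -= bytesRead;
            }
        } finally {
            raf.close();
        }
    }

    @Override
    public String getFilename() { return file.getName(); }

    @Override
    public String getCharset() { return null; } // binary content

    @Override
    public String getTransferEncoding() { return MIME.ENC_BINARY; }

    @Override
    public long getContentLength() { return length; } // known up front
}

Each chunk would then go out as its own POST, something like (chunkSize and thisChunkLength are placeholders):

mpe.addPart("myFile", new ChunkFileBody(new File("/path/to/myfile.bin"),
        (long) chunkIndex * chunkSize, thisChunkLength));

Because getContentLength() returns the real chunk size, the request should carry a proper Content-Length header instead of falling back to chunked transfer encoding.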
Personally, I prefer my first suggestion, but either way (or neither, if these don't help), good luck!