Tags: java, performance, file, memory, io

What's the most efficient way to read in a massive log file and post it to an API endpoint in Java?


Currently I have a massive log file in my application that I need to post to an endpoint. I periodically run a method that reads the entire file into a list, performs some formatting so that the endpoint will accept it, converts the list to a string using StringBuilder, and then posts that string to my endpoint. I should also mention that I batch the data out in chunks of X characters. I am seeing some memory issues in my application and am trying to deal with them.

So this is how I am partitioning the data out into a temporary list:

    ArrayList<String> temp = new ArrayList<>();
    int tempCharCount = 0;

    if (logFile.exists()) {
        try (BufferedReader br = new BufferedReader(new FileReader(logFile))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (isJSONValid(line)) {
                    temp.add(line);
                    tempCharCount += line.length();
                }
                if (tempCharCount >= LOG_PARTITION_CHAR_COUNT) {
                    // Formatting for the backend
                    String tempString = postFormat(temp);

                    // Send
                    sendLogs(tempString);

                    // Refresh
                    temp = new ArrayList<>();
                    tempCharCount = 0;
                }
            }

            // Send "dangling" data, skipping the request entirely if the
            // final batch is empty
            if (!temp.isEmpty()) {
                String tempString = postFormat(temp);
                sendLogs(tempString);
            }
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException, so a single
            // catch covers both; no need to wrap e in a new Exception
            Timber.e(e);
        }
    }

So when we reach our partition limit for character count, you can see that we are running

String tempString = postFormat(temp);

This is where we make sure our data is formatted into a string of json data that the endpoint will accept.

private String postFormat(List<String> list) {
    StringBuilder sb = new StringBuilder(LOG_ARRAY_START);

    for (int i = 0; i < list.size(); i++) {
        sb.append(list.get(i));

        // Comma separators go between elements only: never after the final
        // element and never next to the array delimiters, to match the
        // expected backend input
        if (i < list.size() - 1) {
            sb.append(",");
        }
    }

    sb.append(LOG_ARRAY_END);
    return sb.toString();
}
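
For example, assuming LOG_ARRAY_START is "[" and LOG_ARRAY_END is "]" (the actual constants aren't shown above), a batch of two valid JSON lines comes out as a single JSON array:

    ArrayList<String> batch = new ArrayList<>(Arrays.asList("{\"a\":1}", "{\"b\":2}"));
    String payload = postFormat(batch);
    // payload is now: [{"a":1},{"b":2}]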

As you might imagine, if you have a large log file and these requests are going out async, we end up using a lot of memory. Once our StringBuilder is done, we return its contents as a string that will eventually be gzip compressed and posted to an endpoint.
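
For context, the send step presumably looks something like this (a minimal sketch; sendLogs is not shown in the question, so the LOG_ENDPOINT_URL constant and the use of HttpURLConnection with GZIPOutputStream are assumptions):

    // imports: java.io.OutputStream, java.net.HttpURLConnection, java.net.URL,
    //          java.nio.charset.StandardCharsets, java.util.zip.GZIPOutputStream

    // Hypothetical reconstruction of sendLogs; LOG_ENDPOINT_URL is an assumed constant.
    private void sendLogs(String payload) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(LOG_ENDPOINT_URL).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Content-Encoding", "gzip");

        // The whole payload string (plus its UTF-8 byte copy) sits in memory
        // here before compression; this is the allocation showing up in the profiler
        try (OutputStream out = new GZIPOutputStream(conn.getOutputStream())) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode(); // forces the request to complete
        Timber.d("sendLogs: HTTP %d", status);
    }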

I am looking for ways to decrease the memory usage here. I profiled it a bit on the side and could see how obviously inefficient it is, but I am not sure how to do better. Any ideas are appreciated.


Solution

  • I have one suggestion for you.

    Formatted output in a temp file - You can write the formatted output to a temporary file instead of holding it in memory. Once the transformation is complete, read the temp file back and send it to the endpoint. If the sequence of entries doesn't matter, you can even use multiple threads to append to the same file. With this approach you are not storing any data in memory during the transformation, which will save a lot of memory.
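
    A rough sketch of that idea (logFile, isJSONValid, LOG_ARRAY_START and LOG_ARRAY_END are from the question; LOG_ENDPOINT_URL, the temp file, and the 8 KB buffer size are assumptions):

        // imports: java.io.*, java.net.HttpURLConnection, java.net.URL,
        //          java.util.zip.GZIPOutputStream

        // Pass 1: stream the log file into a formatted temp file one line at a
        // time, so no batch list or giant string is ever held in memory
        File formatted = File.createTempFile("logs", ".json");
        try (BufferedReader br = new BufferedReader(new FileReader(logFile));
             Writer w = new BufferedWriter(new FileWriter(formatted))) {
            w.write(LOG_ARRAY_START);
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (!isJSONValid(line)) continue;
                if (!first) w.write(",");
                w.write(line);
                first = false;
            }
            w.write(LOG_ARRAY_END);
        }

        // Pass 2: stream the temp file to the endpoint through a gzip stream,
        // copying a small fixed buffer instead of building one big String
        HttpURLConnection conn = (HttpURLConnection) new URL(LOG_ENDPOINT_URL).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setChunkedStreamingMode(8192); // don't buffer the whole body in memory
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Content-Encoding", "gzip");
        try (InputStream in = new BufferedInputStream(new FileInputStream(formatted));
             OutputStream out = new GZIPOutputStream(conn.getOutputStream())) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        Timber.d("upload: HTTP %d", conn.getResponseCode()); // complete the request
        formatted.delete();

    Peak memory is then bounded by a single log line plus the copy buffer, regardless of how large the log file grows.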