Tags: mysql, node.js, memory-leaks, node-mysql

nodejs mysql memory leak on a large volume of high-frequency queries


TL;DR

https://gist.github.com/anonymous/1e0faaa99b8bb4afe3749ff94e52c84f - Demonstrates memory consumption implying a leak. Is this in my code or in the mysql package?

Full version:

I am seeing a lot of memory being leaked (and an eventual crash every few hours) in my program. This program receives a lot of data over a UDP socket, stores the relevant bits in an in-memory hash, and writes the data in this hash to a MySQL DB (AWS RDS instance) once every 10 seconds. The node.js version is 6.9.4, running on RHEL 7.1.
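For context, the overall shape of the program is roughly the following (a minimal sketch; the host, schema, message format, and port here are hypothetical placeholders, not my real ones):

const dgram = require('dgram');
const mysql = require('mysql');

// Pool matching the production setup of 30 connections.
const pool = mysql.createPool({
  connectionLimit: 30,
  host: 'localhost', user: 'user', password: 'secret', database: 'mydb'
});

// In-memory hash holding the relevant bits of each datagram.
const cache = new Map();

const socket = dgram.createSocket('udp4');
socket.on('message', (msg) => {
  const [id, value] = msg.toString().split(','); // hypothetical message format
  cache.set(id, value);
});
socket.bind(41234);

// Flush the hash to MySQL once every 10 seconds.
setInterval(() => {
  for (const [id, value] of cache) {
    pool.query('UPDATE metrics SET value = ? WHERE id = ?', [value, id]);
  }
  cache.clear();
}, 10000);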

I tried to do some profiling using the "--inspect" option and Chrome DevTools, and my initial suspicion was the mysql package. To test that, I made a simple node.js program which just makes a lot of queries to a local DB, and observed that it indeed consumes a lot of memory very fast. Here is the program: https://gist.github.com/anonymous/1e0faaa99b8bb4afe3749ff94e52c84f

I tried a few variations of the program, and all of them consumed memory at a rate that clearly pointed towards an eventual out-of-memory crash. The variations are:

  1. Use a single connection
  2. Use a pool with 30 connections (This is the production setup)
  3. Use a valid query statement
  4. Use an invalid query statement that results in a parse error (the space before the string 123 on line 27 makes it a bad query; removing the space makes it a valid query)

The above program has nothing like the in-memory hash. It does one thing only: issue a lot of UPDATE queries to the MySQL DB at a high frequency.
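Condensed, the repro looks roughly like this (a sketch, not the gist verbatim; the table and column names are placeholders):

const mysql = require('mysql');

// Variation 1: a single connection (variation 2 swaps this for a 30-connection pool).
const connection = mysql.createConnection({
  host: 'localhost', user: 'user', password: 'secret', database: 'mydb'
});
connection.connect();

// Issue a large batch of UPDATE queries on every tick.
setInterval(() => {
  for (let i = 0; i < 10000; i++) {
    connection.query('UPDATE t SET val = 123 WHERE id = ?', [i], (err) => {
      // Both valid queries and ones failing with a parse error
      // showed the same memory growth.
    });
  }
}, 2000);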

I have set the interval to 2 seconds to demonstrate the memory consumption easily. Lowering the frequency slows down the memory consumption, but memory nevertheless keeps growing. It only delays the crash; the crash itself is inevitable.

The real usage requirement is an interval of 10 seconds, and the number of UPDATE queries during each run goes up to 10,000. So the numbers in the sample program are pretty close to the real-world scenario and are not just hypothetical simulation numbers.

Here is the output of "--trace-gc", which shows the memory consumption rising to about 400 MB within a minute:

[29766:0x36c5120]    52326 ms: Scavenge 324.9 (365.1) -> 314.7 (369.1) MB, 8.3 / 0.0 ms [allocation failure].
[29766:0x36c5120]    53292 ms: Scavenge 330.3 (370.1) -> 317.4 (372.1) MB, 3.3 / 0.0 ms [allocation failure].
[29766:0x36c5120]    53477 ms: Scavenge 333.4 (374.1) -> 329.0 (375.1) MB, 15.6 / 0.0 ms [allocation failure].
[29766:0x36c5120]    53765 ms: Scavenge 335.5 (375.1) -> 331.9 (385.1) MB, 20.8 / 0.0 ms [allocation failure].
[29766:0x36c5120]    54701 ms: Scavenge 346.4 (386.1) -> 334.4 (388.1) MB, 5.3 / 0.0 ms [allocation failure].
[29766:0x36c5120]    55519 ms: Scavenge 349.9 (389.1) -> 338.9 (390.1) MB, 5.7 / 0.0 ms [allocation failure].
[29766:0x36c5120]    55614 ms: Scavenge 353.1 (392.1) -> 350.0 (395.1) MB, 17.8 / 0.0 ms [allocation failure].
[29766:0x36c5120]    56081 ms: Scavenge 356.8 (395.1) -> 351.3 (405.1) MB, 18.5 / 0.0 ms [allocation failure].
[29766:0x36c5120]    57195 ms: Scavenge 367.5 (406.1) -> 354.2 (407.1) MB, 3.2 / 0.0 ms (+ 20.1 ms in 236 steps since last GC) [allocation failure].
[29766:0x36c5120]    57704 ms: Scavenge 369.9 (408.1) -> 362.8 (410.1) MB, 12.5 / 0.0 ms (+ 15.7 ms in 226 steps since last GC) [allocation failure].
[29766:0x36c5120]    57940 ms: Scavenge 372.6 (412.1) -> 369.2 (419.1) MB, 21.6 / 0.0 ms (+ 8.5 ms in 139 steps since last GC) [allocation failure].
[29766:0x36c5120]    58779 ms: Scavenge 380.4 (419.1) -> 371.1 (424.1) MB, 11.4 / 0.0 ms (+ 11.3 ms in 165 steps since last GC) [allocation failure].
[29766:0x36c5120]    59795 ms: Scavenge 387.0 (426.1) -> 375.3 (427.1) MB, 10.6 / 0.0 ms (+ 14.4 ms in 232 steps since last GC) [allocation failure].
[29766:0x36c5120]    59963 ms: Scavenge 392.0 (431.3) -> 388.8 (434.3) MB, 19.1 / 0.0 ms (+ 50.5 ms in 207 steps since last GC) [allocation failure].
[29766:0x36c5120]    60471 ms: Scavenge 395.4 (434.3) -> 390.3 (444.3) MB, 20.2 / 0.0 ms (+ 19.2 ms in 96 steps since last GC) [allocation failure].
[29766:0x36c5120]    61781 ms: Scavenge 406.2 (446.3) -> 393.0 (447.3) MB, 3.7 / 0.0 ms (+ 107.6 ms in 236 steps since last GC) [allocation failure].
[29766:0x36c5120]    62107 ms: Scavenge 409.0 (449.3) -> 404.1 (450.3) MB, 16.0 / 0.0 ms (+ 41.0 ms in 227 steps since last GC) [allocation failure].
[29766:0x36c5120]    62446 ms: Scavenge 411.3 (451.3) -> 407.7 (460.3) MB, 22.6 / 0.0 ms (+ 20.2 ms in 103 steps since last GC) [allocation failure].

Questions:

  1. Is this kind of memory consumption expected for so many queries, or is it indicative of a leak?
  2. Is my code leaking memory? Is there anything obvious that I have missed? Am I using the package in the wrong way?
  3. If this is indeed a leak in the package, are there any immediate workarounds until the leak is fixed?

I am more than happy to provide any other information that is needed to get to the root of this. Please let me know.


Solution

  • Putting an answer here just for the benefit of anyone facing a similar issue.

    In my case the problem was not a memory leak but throughput. The MySQL server(s) that I was running were simply not capable of handling so many queries in such a short time. At that frequency I was effectively choking the MySQL server.

    Node.js would keep creating a new connection and/or query object for each new query, and that object would be released from memory once the query completed. But the client was sending so many queries at such a high frequency that the MySQL server took a long time to complete each one.

    Simply put, the rate at which queries were being made was much higher than the rate at which they were being completed. As a result, the query/connection objects just kept piling up on the client side, resulting in ever-increasing memory usage.
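    One way to make the pile-up visible is to count in-flight queries. A hypothetical sketch (the pool settings and trackedQuery helper are placeholders I'm introducing here, not part of my original code):

    const mysql = require('mysql');
    const pool = mysql.createPool({
      connectionLimit: 30,
      host: 'localhost', user: 'user', password: 'secret', database: 'mydb'
    });

    let inFlight = 0; // queries issued but not yet completed

    function trackedQuery(sql, values) {
      inFlight++;
      pool.query(sql, values, () => { inFlight--; });
    }

    // If the server cannot keep up, inFlight climbs tick after tick, and every
    // pending query pins its query/connection objects in client memory.
    setInterval(() => {
      console.log('in-flight:', inFlight, 'heapUsed MB:',
                  (process.memoryUsage().heapUsed / 1048576).toFixed(1));
    }, 2000);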

    This looked like a leak. But it wasn't.

    One technique that I learnt for distinguishing between a leak and a throughput issue is to stop creating work (in this case, stop issuing new queries) and check whether the memory usage comes down. If it does, it is a throughput issue; if not, it could be a memory leak.
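    For example (issueBatchOfQueries is a stand-in name for whatever creates the work):

    function issueBatchOfQueries() { /* stand-in for the real workload */ }

    const workload = setInterval(issueBatchOfQueries, 2000);

    // After a minute, stop creating new work and watch the heap.
    setTimeout(() => {
      clearInterval(workload);
      setInterval(() => {
        // If this number drains back down over time, the growth was a
        // throughput backlog; if it stays high, suspect a real leak.
        console.log('heapUsed MB:',
                    (process.memoryUsage().heapUsed / 1048576).toFixed(1));
      }, 5000);
    }, 60000);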

    In my case, up to about 8,000 queries per second would work fine. At around 8,500 to 9,000 queries per second this throughput issue would appear, eventually resulting in a crash.