
Weaviate constantly running out of memory


I am trying to run an instance of Weaviate but keep running into a problem with memory consumption. Weaviate runs in a Docker container with 16GB of memory, which, according to the documentation, should be enough for over 1M records (I am using 384-dimensional vectors, just like in the example).

The application connecting to Weaviate is constantly inserting and querying data. Memory usage keeps climbing until the container runs out of memory and dies. This happens at only around 20k records.

Is this a problem with garbage collection never happening?

UPDATE:

The Weaviate version in question is 1.10.1, and I am not currently using any modules. Incoming records already have vectors, so no vectorizer is used. The application searches for records similar to the incoming record based on some metadata filters and nearVector, then inserts the incoming record. I will be upgrading to 1.12.1 to see if that helps, but in the meantime here are some of the suggested memory measurements.

7k records:
docker stats memory usage: 2.56GB / 16GB

gc 1859 @750.550s 0%: 0.33+33+0.058 ms clock, 26+1.2/599/1458+4.6 ms cpu, 2105->2107->1102 MB, 2159 MB goal, 80P
gc 1860 @754.322s 0%: 0.17+34+0.094 ms clock, 13+1.0/644/1460+7.5 ms cpu, 2150->2152->1126 MB, 2205 MB goal, 80P
gc 1861 @758.598s 0%: 0.39+35+0.085 ms clock, 31+1.4/649/1439+6.8 ms cpu, 2197->2199->1151 MB, 2253 MB goal, 80P

11k records:
docker stats memory usage: 5.46GB / 16GB

gc 1899 @991.964s 0%: 1.0+65+0.055 ms clock, 87+9.9/1238/3188+4.4 ms cpu, 4936->4939->2589 MB, 5062 MB goal, 80P
gc 1900 @999.496s 0%: 0.17+58+0.067 ms clock, 13+2.8/1117/3063+5.3 ms cpu, 5049->5052->2649 MB, 5178 MB goal, 80P
gc 1901 @1008.717s 0%: 0.38+65+0.072 ms clock, 30+2.7/1242/3360+5.7 ms cpu, 5167->5170->2710 MB, 5299 MB goal, 80P

17k records:
docker stats memory usage: 11.25GB / 16GB

gc 1932 @1392.757s 0%: 0.37+110+0.019 ms clock, 30+4.6/2130/6034+1.5 ms cpu, 10426->10432->5476 MB, 10694 MB goal, 80P
gc 1933 @1409.740s 0%: 0.14+108+0.052 ms clock, 11+0/2075/5666+4.2 ms cpu, 10679->10683->5609 MB, 10952 MB goal, 80P
gc 1934 @1427.611s 0%: 0.31+116+0.10 ms clock, 25+4.6/2249/6427+8.2 ms cpu, 10937->10942->5745 MB, 11218 MB goal, 80P

20k records:
docker stats memory usage: 15.22GB / 16GB

gc 1946 @1658.985s 0%: 0.13+136+0.077 ms clock, 10+1.1/2673/7618+6.1 ms cpu, 14495->14504->7600 MB, 14866 MB goal, 80P
gc 1947 @1681.090s 0%: 0.28+148+0.045 ms clock, 23+0/2866/8142+3.6 ms cpu, 14821->14829->7785 MB, 15201 MB goal, 80P
GC forced
gc 16 @1700.012s 0%: 0.11+2.0+0.055 ms clock, 8.8+0/20/5.3+4.4 ms cpu, 3->3->3 MB, 7MB goal, 80P
gc 1948 @1703.901s 0%: 0.41+147+0.044 ms clock, 33+0/2870/8153+3.5 ms cpu, 15181->15186->7973 MB, 15570 MB goal, 80P
gc 1949 @1728.327s 0%: 0.29+156+0.048 ms clock, 23+18/3028/8519+3.9 ms cpu, 15548->15553->8168 MB, 15946 MB goal, 80P

pprof

     flat  flat%  sum%          cum   cum%
7438.24MB 96.88% 96.88%   7438.74MB 96.88%  github.com/semi-technologies/weaviate/adapters/repos/db/inverted.(*Searcher).docPointersInvertedNoFrequency.func1
 130.83MB  1.70% 98.58%   7594.13MB 98.91%  github.com/semi-technologies/weaviate/adapters/repos/db/inverted.(*Searcher).DocIDs
      1MB 0.013% 98.59%     40.55MB  0.53%  github.com/semi-technologies/weaviate/adapters/repos/vector/hnsw.(*hnsw).Add
        0     0% 98.59%     65.83MB  0.86%  github.com/go-openapi/runtime/middleware.NewOperationExecutor.func1

UPDATE 2:

Problem still exists after upgrading to 1.12.1


Solution

  • Since it already crashes at around 20k records, the amount of data alone should not be a reason for running OOM. At 1M records, 16GB of memory should still be plenty, so I'm sure there must be another cause that we can spot.
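
    As a rough sanity check on that expectation, here is a minimal Python sketch of how much memory the raw vectors alone would need. It assumes the vectors are held as float32 (4 bytes per dimension) and deliberately ignores the HNSW index, the inverted index, and object storage, so it is only a lower bound and not a sizing formula. The function names are made up for this example.

```python
# Lower-bound estimate for raw vector memory.
# Assumption: each dimension is stored as a float32 (4 bytes).
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_records: int, dims: int) -> int:
    """Bytes needed for the raw vectors only (no index, no object data)."""
    return num_records * dims * BYTES_PER_FLOAT32


def to_mb(num_bytes: int) -> float:
    """Convert bytes to megabytes (decimal, 1 MB = 1,000,000 bytes)."""
    return num_bytes / 1_000_000


for n in (20_000, 1_000_000):
    size_mb = to_mb(raw_vector_bytes(n, 384))
    print(f"{n} records x 384 dims: {size_mb:.2f} MB of raw vectors")
```

    Under these assumptions, 20k vectors with 384 dimensions need roughly 31 MB, so a heap of several GB at that record count points at something other than the vectors themselves.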

    First we need some information about your setup:

    1. Which version of Weaviate are you running? At the time of writing this answer the latest version is v1.12.1. Please make sure to use the latest version to rule out that you are running into an issue that has already been fixed.
    2. How many and which - if any - modules are you running? This is relevant because the models in the modules take up a constant amount of memory. So in an extreme case with 15.5GB of memory usage from models in modules, you'd only have 500MB left for Weaviate. This is unlikely, but something we should still rule out to be sure.
    3. Are there any limits from Docker? For example, Docker for Mac applies a global memory limit to the Docker VM, which may be lower than the system-wide limit. Are any such limits set?
    4. You mention that you are both inserting and querying. This should not be a problem, but it might be helpful to be a bit more precise about what your usage looks like. Maybe give an example of requests that you run, the ratio of writes to reads, etc. While this is unlikely to be the cause of any issues, any additional information would be helpful in narrowing this down.

    Profiling

    Please update your original post with the profiling results.

    To investigate memory issues we need some profiles. We can take those from the outside (what does the OS see?) and from the inside (what does the Go runtime see?). The two typically differ, because memory that the GC has freed may not have been released to the OS yet.

    Preparations for Profiling

    1. On the Weaviate container, set an environment variable named GODEBUG to the value gctrace=1. This makes Go's garbage collector verbose, so it logs all GC activity to the console. Each log line shows when the GC ran, the heap size before and after, and the next goal size.
    2. Expose port 6060 of the Weaviate container. This will allow generating debug reports from Go's profiler from within.
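
    To make the GC log easier to compare across snapshots, the relevant numbers can be pulled out of each gctrace line. The following is a minimal Python sketch, assuming the standard Go gctrace layout in which the three values before "MB" are the heap size at GC start, the heap size at GC end, and the live heap, followed by the goal. GCLine and parse_gc_line are hypothetical names chosen for this example.

```python
import re
from typing import NamedTuple, Optional


class GCLine(NamedTuple):
    cycle: int                  # GC cycle number
    seconds_since_start: float  # time since process start
    heap_start_mb: int          # heap size when the GC cycle started
    heap_end_mb: int            # heap size when the GC cycle ended
    live_mb: int                # live heap after the cycle
    goal_mb: int                # heap size goal for the next cycle


# Matches lines such as:
# gc 1947 @1681.090s 0%: ... ms cpu, 14821->14829->7785 MB, 15201 MB goal, 80P
_GC_RE = re.compile(
    r"^gc (\d+) @([\d.]+)s .*?(\d+)->(\d+)->(\d+) MB, (\d+)\s*MB goal"
)


def parse_gc_line(line: str) -> Optional[GCLine]:
    """Parse one gctrace line; return None for anything else (e.g. 'GC forced')."""
    m = _GC_RE.match(line.strip())
    if m is None:
        return None
    return GCLine(
        cycle=int(m.group(1)),
        seconds_since_start=float(m.group(2)),
        heap_start_mb=int(m.group(3)),
        heap_end_mb=int(m.group(4)),
        live_mb=int(m.group(5)),
        goal_mb=int(m.group(6)),
    )


sample = ("gc 1947 @1681.090s 0%: 0.28+148+0.045 ms clock, "
          "23+0/2866/8142+3.6 ms cpu, 14821->14829->7785 MB, 15201 MB goal, 80P")
parsed = parse_gc_line(sample)
if parsed is not None:
    print(f"cycle {parsed.cycle}: live heap {parsed.live_mb} MB, goal {parsed.goal_mb} MB")
```

    The live heap value is the one to watch: if it keeps growing from snapshot to snapshot, the memory is still referenced and the GC cannot free it, no matter how often it runs.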

    Profiling cycle

    1. Immediately after startup (when APIs are ready), run docker stats to print the initial usage of memory of the entire docker setup. This will help us know how much memory is used by each container initially. Please add the result to your question.

    2. Start importing.

    3. At regular intervals (e.g. if you anticipate a crash at 20k elements, I would start at 10k elements imported and take a snapshot every 3k elements), save the output of the following commands into separate files:

      • docker stats so we see the OS' perspective
      • The last three lines of the gc profile in the Weaviate container logs

      The closer you can get to the moment it crashes with those profiles, the more meaning they will have.

    4. (Optional) If the previous steps confirm that the heap was indeed used up entirely, e.g. close to 16GB of heap usage, the interesting question becomes "What was on the heap for it to run out so early?". We can answer this with the go pprof tool and the port 6060 we exposed earlier.

      For this you need a local Go installation. Alternatively, if you don't want to install Go on your host machine, you can run the commands from within a Docker container that has Go installed. In that case make sure the container can reach the Weaviate container, e.g. by putting them in the same Docker network.

      Run the following command: go tool pprof -top http://localhost:6060/debug/pprof/heap

      As in step 3, the closer to the crash you run this command, the more meaningful the result will be. (Note that my example assumes you are running this from the host machine and that port 6060 on the host maps to the Weaviate container's port 6060. If you are running it from another container inside a Docker network, adjust the hostname accordingly, e.g. http://weaviate:6060/..., etc.)
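
    The profiling cycle above could be scripted roughly as follows, so that every snapshot is taken the same way. This is only a sketch in Python, not an official Weaviate tool: the container name weaviate, the output directory profiles, and all function names are assumptions for illustration. It relies on docker stats --no-stream, docker logs --tail, and the Go pprof heap endpoint on the port 6060 exposed earlier.

```python
import subprocess
import urllib.request
from pathlib import Path
from typing import List


def last_gc_lines(log_text: str, n: int = 3) -> List[str]:
    """Return the last n gctrace lines (they start with 'gc ') from the logs."""
    gc_lines = [line for line in log_text.splitlines() if line.startswith("gc ")]
    return gc_lines[-n:]


def heap_profile_url(host: str = "localhost", port: int = 6060) -> str:
    """URL of the Go pprof heap endpoint; host and port are assumptions."""
    return f"http://{host}:{port}/debug/pprof/heap"


def take_snapshot(label: str, container: str = "weaviate",
                  out_dir: Path = Path("profiles")) -> None:
    """Save docker stats, the last GC lines, and a heap profile under out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # The OS' perspective: one-shot docker stats for all containers.
    stats = subprocess.run(
        ["docker", "stats", "--no-stream"],
        capture_output=True, text=True, check=True,
    ).stdout
    (out_dir / f"{label}-docker-stats.txt").write_text(stats)

    # The Go runtime's perspective: gctrace goes to stderr, so merge the streams.
    logs = subprocess.run(
        ["docker", "logs", "--tail", "500", container],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, check=True,
    ).stdout
    (out_dir / f"{label}-gc.txt").write_text("\n".join(last_gc_lines(logs)) + "\n")

    # Optional step 4: keep the raw heap profile for later analysis.
    with urllib.request.urlopen(heap_profile_url(), timeout=30) as resp:
        (out_dir / f"{label}-heap.pb.gz").write_bytes(resp.read())


# Example call (needs a running container, so it is left commented out):
# take_snapshot("10k")
```

    A saved heap file can be inspected later with go tool pprof -top followed by the file path, which is handy when the container dies before you get to run the command against the live endpoint.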

    Once you have obtained all these profiles and edited your original post with the profiles, I'm happy to edit this post with some notes on how to interpret them.

    In summary, you should be providing the following artifacts:

    1. A docker stats output from after startup before importing
    2. From a moment that is close to the point it crashes, we need:
      • The output of docker stats
      • The last couple of lines of the GC logs from the Weaviate container
    3. If the results from (2) indeed confirm that the whole 16GB were used up by Weaviate, we need to know by what. We can obtain this info from the pprof heap profile.