Tags: caching, elixir, phoenix-framework, gen-server, ets

Elixir process taking up too much memory


I am reading postcodes from a csv file, taking that data and caching it with ets.

The postcode file is quite large (95MB) as it contains about 1.8 million entries.

I am only caching the postcodes that are needed for lookups at the moment (about 200k), so the amount of data stored in ets should not be an issue. However, no matter how few rows I insert into ets, the amount of memory taken up by the process is virtually unchanged. It doesn't seem to matter whether I insert 1 row or all 1.8 million.

# Not showing all function defs so it is not too long.
# Comment if more info is needed.
defmodule PostcodeCache do
  use GenServer

  def cache_postcodes do
    "path_to_postcode.csv"
    |> File.read!()
    |> function_to_parse()
    |> function_to_filter()
    |> function_to_format()
    |> Enum.map(&(:ets.insert_new(:cache, &1)))
  end
end

I am running this in the terminal with iex -S mix and then running :observer.start(). When I go to the Processes tab, the memory of my PostcodeCache process is massive (over 600MB).

Even if I filter the file so that I only end up storing 1 postcode in ets, it is still over 600MB.


Solution

  • I realised that my mistake was looking at the memory of the process and assuming that it was down to the cache. An ets table's memory is accounted for separately from the process's own memory, which is why the number of inserts made no difference.

    Because this is a GenServer, the process holds on to all the data from the csv file once it is read (File.read!), and it also appears to hold on to every intermediate copy of that data produced by the parsing, filtering and formatting steps.

    I solved this by changing File.read! to File.stream! and using Enum.each instead of mapping over the returned data, so the file is handled one line at a time.

    Inside the each, I check whether the postcode is one I want and, if it is, I insert it into ets.

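    def cache_postcodes do
      "path_to_postcode.csv"
      |> File.stream!()
      |> Enum.each(fn line ->
        # some_check_on_line/1 is a placeholder; it is assumed here to return
        # the tuple to store, or nil when the line should be skipped.
        value_to_store = some_check_on_line(line)
        if value_to_store, do: :ets.insert_new(:cache, value_to_store)
      end)
    end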
    

    With this approach my process's memory is now only about 2MB (down from 632MB) and my ets memory is about 30MB. That is about what I would expect.
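For anyone who wants to check the same two numbers without opening observer, here is a minimal sketch of the streaming approach together with the memory readings. It is not the original module: the table name :postcode_sketch, the three-column csv layout, the temp file and the wanted set are all made up for illustration. Process.info/2 reports the process's memory in bytes, while :ets.info/2 reports the table's memory in words, so the latter is multiplied by the word size.

```elixir
# Hypothetical stand-in for the real postcode file: "postcode,lat,long" per line.
path = Path.join(System.tmp_dir!(), "postcodes_sketch.csv")
File.write!(path, "AB1 0AA,57.10,-2.24\nAB1 0AB,57.10,-2.25\nZZ9 9ZZ,0.0,0.0\n")

# Hypothetical filter: only postcodes in this set get cached.
wanted = MapSet.new(["AB1 0AA", "ZZ9 9ZZ"])

# Drop any table left over from an earlier run in the same iex session.
if :ets.whereis(:postcode_sketch) != :undefined, do: :ets.delete(:postcode_sketch)
:ets.new(:postcode_sketch, [:named_table, :set, :public])

# Stream the file line by line, so the whole file is never held in memory at once.
path
|> File.stream!()
|> Enum.each(fn line ->
  [postcode | rest] = line |> String.trim() |> String.split(",")

  if MapSet.member?(wanted, postcode) do
    :ets.insert_new(:postcode_sketch, {postcode, rest})
  end
end)

# Process memory is reported in bytes.
{:memory, process_bytes} = Process.info(self(), :memory)

# ets memory is reported in words, so convert it to bytes.
ets_bytes = :ets.info(:postcode_sketch, :memory) * :erlang.system_info(:wordsize)

IO.puts("process: #{process_bytes} bytes, ets: #{ets_bytes} bytes")

# Remove the temporary file created for this sketch.
File.rm!(path)
```

In the GenServer case you would pass the server's pid instead of self(), for example the result of Process.whereis(PostcodeCache) if the server happens to be registered under that name (an assumption, since the original start_link is not shown).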