javascript ruby-on-rails ruby large-data

Ruby on Rails - Storing and accessing large data sets


I am having a hard time managing the storage and access of a large dataset within a Ruby on Rails application. Here is my application in a nutshell: I run Dijkstra's algorithm on a road network and then display the nodes it visits using the Google Maps API. I am using an open dataset of the US road network to construct the graph by iterating over two txt files given in the link, but I am having trouble storing this data in my app.

I am under the impression that a large dataset like this is not an ActiveRecord object. I don't need to modify the contents of this data; I just need to access it and cache it locally in a hash so I can run Ruby methods on it. I have tried a few things, but I am running into trouble.

  1. I figured that it would make the most sense to parse the txt files and store the graph in yml format. I would then be able to load the graph into a DB as seed data and grab the graph using Node.all, or something along those lines. Unfortunately, the yml file becomes too large for Rails to handle. Running the Rake task causes the system to run at 100% indefinitely.

  2. Next I figured that, since I don't need to modify the data, I could just create the graph every time the application loads, as part of its "initialization." But I don't know exactly where to put this code. I need to run some methods, or at least a block of code, and then store the result in some sort of global/session variable that I can access in all controllers/methods. I don't want to pass this large dataset around; I just want access to it from anywhere.

  3. This is the way I am currently doing it, but it is just not acceptable. I parse the text files that create the graph in a controller action and hope that it finishes computing before the server times out.

Ideally, I would store the graph in a database from which I could grab the entire contents to use locally. Or, at the least, I would parse the data only once as the application loads and then be able to access it from different page views, etc. I feel like this would be the most efficient approach, but I am running into hurdles at the moment.

Any ideas?


Solution

  • You're on the right path. There are a couple of ways to do this. One is to set up constants in your model class, outside of any method, as in these examples:

    # Lookup hash built from two columns with a single query
    MY_MAP = ActiveRecord::Base.connection
                               .select_all('SELECT thingone, thingtwo FROM table')
                               .map { |row| [row['thingone'], row['thingtwo']] }.to_h
    RAW_DATA = File.read('the_file') # However you read and parse your file
    # Individual records looked up once and kept in constants
    CA = State.find_by_name('California')
    NY = State.find_by_name('New York')
    

    These are evaluated once in a production app, when the model's class is loaded. Another option is to do this initialization in an initializer or other config file; see the config/initializers directory.
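As a rough sketch of the initializer approach, the parsing could live in a small module that an initializer calls once at boot. This assumes each line of an edge file looks like `from_id to_id weight`, which is only a guess; the real dataset's format may differ. `RoadGraph`, `ROAD_GRAPH`, and the file path in the comment are hypothetical names used for illustration only:

```ruby
# Hypothetical helper; the module name and the "from_id to_id weight"
# line format are assumptions, not taken from the original dataset.
module RoadGraph
  # Parses edge lines into a frozen adjacency hash:
  #   { from_id => { to_id => weight, ... }, ... }
  def self.parse(io)
    graph = Hash.new { |hash, key| hash[key] = {} }

    io.each_line do |line|
      from, to, weight = line.split
      next if weight.nil? # skip blank lines and lines with fewer than 3 fields

      graph[Integer(from)][Integer(to)] = Float(weight)
    end

    # Drop the auto-creating default so lookups of unknown nodes return nil,
    # then freeze everything because the data is read-only.
    graph.default_proc = nil
    graph.each_value(&:freeze)
    graph.freeze
  end
end

# In a Rails app, a file such as config/initializers/road_graph.rb
# (hypothetical name and path) could then build the graph at boot:
#
#   ROAD_GRAPH = File.open(Rails.root.join('db', 'edges.txt')) do |file|
#     RoadGraph.parse(file)
#   end
```

Because the result is frozen, controllers can read `ROAD_GRAPH` directly without copying it or passing it around. Note that each server process generally holds its own copy in memory.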