Search code examples
ruby-on-railsjsonpostgresqlsearchhstore

How can I restructure this large PostgreSQL 9.3 table to be indexed and searched more efficiently from a Rails application?


I introduced a design flaw into this application I'm working on. The problem domain involves matching a Product against what is essentially a sparse matrix of 50 possible attributes per Customer. There are millions and millions of rows. So to restate:

Product has_many ProductAttributes
Customer has_many ProductAttributes

A Product will have a different set of ProductAttributes for each Customer.

So I have this table of 50 million rows, and the vast, vast majority of the fields are empty. It is painful to look at. Here are some of my problems.

  • Each product averages only about 4 attributes.
  • In order to make the table case-insensitive searchable, I used citext in all of the attribute columns.
  • The first 3 attributes are indexed, and have wildcard searches performed on them.

I would much rather redo this whole situation by using either HStore or JSON fields. But I am concerned about my ability to perform searches against either. Can fulltext searches be done against either one of these? Would HStore key-value pairs limit me? Currently, each ProductAttribute can have only one value, but I can envision a scenario in which I could use an Array of values in at least one of them. Not a deal killer though.

If I can search on JSON and/or HStore, which is the best one to use considering I am using ActiveRecord from a Rails 3.2 application?

Another alternative would be to break out ProductAttributes into a detail table:

Product has_many CustomerProducts
Customer has_many CustomerProducts
CustomerProducts has_many ProductAttributes

So if a Product had only 3 or 4 attributes for a specific Customer, there would be 3 or 4 records in ProductAttributes. And I could just search the value column for the ProductAttribute, returning the CustomerProdct parent record.

So three possible approaches:

HStore
JSON
Detail table

Thank you for any insight you can offer.


Solution

  • You can absolutely query Hstore, and I think it might be a good solution short of going for a higher normal form in your Schema design, which can be hard if you aren't a Database person.

    You can do a query like:

    Product.where("attributes -> 'Color' = 'Blue'")
    Product.where("attributes -> 'Size' LIKE '%L%'") #finds 'Large' and 'Long' for ex.
    

    Check out this rails cast, full disclosure, its a pro cast (costs $)

    However, do consider trying to get you Schema into maybe 3NF to solve it in a more databasey way.