I introduced a design flaw into this application I'm working on. The problem domain involves matching a Product
against what is essentially a sparse matrix of 50 possible attributes per Customer
. There are millions and millions of rows. So to restate:
Product has_many ProductAttributes
Customer has_many ProductAttributes
A Product
will have a different set of ProductAttributes
for each Customer
.
So I have this table of 50 million rows, and the vast, vast majority of the fields are empty. It is painful to look at. Here are some of my problems.
citext
in all of the attribute columns.I would much rather redo this whole situation by using either HStore or JSON fields. But I am concerned about my ability to perform searches against either. Can fulltext searches be done against either one of these? Would HStore key-value pairs limit me? Currently, each ProductAttribute
can have only one value, but I can envision a scenario in which I could use an Array of values in at least one of them. Not a deal killer though.
If I can search on JSON and/or HStore, which is the best one to use considering I am using ActiveRecord from a Rails 3.2 application?
Another alternative would be to break out ProductAttributes
into a detail
table:
Product has_many CustomerProducts
Customer has_many CustomerProducts
CustomerProducts has_many ProductAttributes
So if a Product had only 3 or 4 attributes for a specific Customer
, there would be 3 or 4 records in ProductAttributes
. And I could just search the value column for the ProductAttribute
, returning the CustomerProdct
parent record.
So three possible approaches:
HStore
JSON
Detail table
Thank you for any insight you can offer.
You can absolutely query Hstore, and I think it might be a good solution short of going for a higher normal form in your Schema design, which can be hard if you aren't a Database person.
You can do a query like:
Product.where("attributes -> 'Color' = 'Blue'")
Product.where("attributes -> 'Size' LIKE '%L%'") #finds 'Large' and 'Long' for ex.
Check out this rails cast, full disclosure, its a pro cast (costs $)
However, do consider trying to get you Schema into maybe 3NF to solve it in a more databasey way.