Search code examples
databaseoracle-databasedatabase-designrelational-databaseentity-attribute-value

How to structure a table for fast search on a large number of columns


I have a table with a large number of columns (~60), which will eventually have a large number of rows (~10 000), and I'm going to need to be able to search efficiently on several column values at once. I'm not sure whether the searches will be exact-match (LIKE 'value', and not LIKE '%value%'), although LIKE 'value%' might be an acceptable compromise.

A few solutions have been proposed. I'm not very familiar with database design principles, so it's not obvious to me which is the best:

  1. Index on every column individually. The users will be able to search on any combination of columns, so no more complicated indexes will work. There will be a lot more reads than writes on the database, so the write-speed slowdown shouldn't be a problem.

  2. Make another table just for searching that looks like this:

    obj_id  col_num  col_name  col_value
    -------------------------------------    
    1       1        'name'    'joe'    
    1       2        'job'     'engineer'    
    2       1        'name'    'bill'
    

    etc. I think the col_num and col_name columns are redundant, but presumably one is better than another. I have no idea what this is called, although it sounds like the Entity-Attribute-Value model (see also this question). From what I can tell, the main difference from an EAV model is that this table would not be sparse; all entities will have most or all attributes.

  3. Make another table for an inverted index on the first table. I know how to do this in theory, but it would be a huge amount of work. Also, we'd probably lose information about which column each datum is from, which is not great. Also also, this feels like it would be redundant with solution 1, but I don't actually know how table indexes are created.

Those are the solutions that we have come up with so far. If it's relevant, we're using an Oracle db, which is not really optional, but I have the permissions to refactor the database in any way necessary. So, what is the best solution here? "None of the above" is a totally acceptable answer, of course. None of these tables actually exist yet, so there's nothing to wipe out and remake.

Thanks!


Solution

  • The examples you mention are indeed a better match for full text searching (as Bill Karwin suggests). Without seeing a (draft) table definition, it's hard to see if that's actually the case.

    The good news is that 10K records is a trivial amount for a well-tuned Oracle server. If that's the largest your table will grow, I would avoid any exotic solutions in favour of maintainability.

    EAV basically makes boolean operators into a huge pain in the backside, and makes supporting specific datatypes (text, dates, numbers etc.) into an equally big pain.

    I'd build a sample of your table with your best guess at the indexing scheme, populate it with representative dummy data, and run queries along the lines of the ones you expect to need. Measure performance, and see if you have a problem; optimize your indices and queries, and only go to a refactoring if you really need to.