Search code examples
sqlpostgresqlsql-like

optimizing PGSQL SQL search queries on big texts ('like', full text search, ... )


We have a software solution which is used by +200 customers. We recently switched to pgsql, because our former database was too slow handeling the search queries our customers use.

Our dabatabase looks like this:

TABLE A

 1. ID
(+ some other fields which aren't important here)

TABLE B

This table is used to store 'data' on the items in table A. This is different for every customer. For example 'Type' can be 'CLIENTNAME' and value 'AZERTY'. One record on TABLE A can have infinite records in TABLE B. Mostly 1 record in Table A has between 5 - 10 records on Table B.

1. ID TABLE A
 2. TYPE
 3. VALUE

TABLE C

 1. TABLE A ID
 2. VERSIONNR
 3. DESCRIPTION

This file has the different verions of the records in TABLE A. Each of these versions has an extended description. This can range from 0 characters to infinite.

Our problem: our customers are used on 'google-like' searching. For example: they type 'AZERTY' and we show all the records from TABLE A where the ID of TABLE A:

  • 'AZERTY' is in the description of the most recent version of TABLE C
  • 'AZERTY' is in one of the values of TABLE B

Additional problem: this search is a 'contains'. If they search 'ZER', they should also find the records with 'AZERTY' in it. Multiple arguments are an 'AND', if they search for 'ZER 123', we need to show all records where the description matches 'ZER' and '123' or the values match 'ZER' and '123'.

What we have done so far:

  • There is an option a user can check in/out whether they want to search the description or not. We mosty advice them to only search for the values and only use the description in case of need.
  • We make several search threads to the database for one search query, because searching all documents at once would take too much time.
  • Some time ago, on our former slow database engine, a collegue of mine made 'search tables', basically this is a table which contains all values on a TABLE A ID so there isn't need for any join in the SQL query when searching. It looks like this:

TABLE D

  • TABLE A ID
  • VALUES (all values from TABLE B for this TABLE A ID, seperated by a ' ')
  • DESCRIPTION (the description of the most recent version for this TABLE A ID)

Example record:

- 1
- ZER 123 CLIENT NAME NUMBER 7856 jsdfjklf 4556423
- DESCRIPTION CAN BE VERY LONG.

If a customer searches for 'ZER 123' this becomes: "select TABLE_A_ID from TABLE_D where values like '%ZER%' and values like '%123%'"

Important: Some of our customers have alot of records in TABLE A. +5.000.000, which means there are alot of records in TABLE B (+/- 50.000.000). Most of our customers have between 300.000 and 500.000 records in TABLE A.

My questions:

  • Is there a better / faster way to search through all the values then that search table? Without the search table i would have to make a join for every ' ' in the search argument of the customer, which will work too slow (i think?) if they have alot of records in TABLE A. For example:

    select ID from TABLE_A INNER JOIN TABLE_B Sub1 ON TABLE_A.ID = Sub1.TABLE_A_ID and Sub1.VALUE like '%ZER%' INNER JOIN TABLE_B Sub2 on FILE_A.ID = Sub2.TABLE_A_ID and Sub2.VALUE like '%123%'

  • I have taken a look at the full text search in PGSQL. I don't think i can use it since you can't use it as like (= 'contains') ?

  • Is there any index I can use on the values (FILE B or search file) and description (FILE C or search file) to make the searches faster? I've read on it and i don't think there is any, because indexes aren't used when searching with "like '%ZER%'" ?

I hope i've explained this cleary. Thanks in advance!


Solution

  • Your terminology is confusing, but I assume you mean "tables" when you write "files".

    You cannot reasonably search in several tables with a single query, but you can search in several columns of a single table at the same time.

    Based on your description, I would say that you need a trigram index on the concatenation of the relevant string columns in the table.