Search code examples
sphinx

How to delete row from Index


I know that I can't to delete the row from Index, I can to delete row only from real time index. But I must delete row from Index but I don't now how to do it. So, it's my table and records:

+------+------+--------+
| id   | name | status |
+------+------+--------+
|    1 | aaa  |      1 |
|    2 | bbb  |      1 |
|    3 | ccc  |      1 |
+------+------+--------+

it's my sphinx config:

source mainSourse : mainConfSourse 
{
    sql_query = \
        SELECT id, name, status \
        from test_table

    sql_field_string = name
    sql_attr_uint = status

}
index testIndex
{
    source          = mainSourse
    path            = C:/sphinx/data/test/testIndex
    morphology      = stem_enru

    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+0435, U+451->U+0435
    min_prefix_len = 3
    index_exact_words = 1
    expand_keywords = 1
}

index testIndexRT
{
    type            = rt
    path            = C:/sphinx/data/test/testIndexRT

    rt_field = name
    rt_attr_string = name
    rt_attr_uint = status

    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+0435, U+451->U+0435
    min_prefix_len = 3
    index_exact_words = 1
    expand_keywords = 1
}

After sphinx server starting, when if I want update record from testIndex I just write the new record to testIndexRT example:

insert into testIndexRT (id,name,status) values (1,'aaa_updated',1);

then that request select * from testIndex,testIndexRT where status=1; show me:

+------+-------------+--------+
| id   | name        | status |
+------+-------------+--------+
|    1 | aaa_updated |      1 |
|    2 | bbb         |      1 |
|    3 | ccc         |      1 |
+------+-------------+--------+

it's worked, awesome! But the problem begins when I want to delete record from Index. I thought I would make it easy just like with the update but after this code update testIndexRT set status=2 where id=1 I see:

+------+------+--------+
| id   | name | status |
+------+------+--------+
|    1 | aaa  |      1 |
|    2 | bbb  |      1 |
|    3 | ccc  |      1 |
+------+------+--------+

sphinx just showed me records from testIndex although row with id 1 was updated in testIndexRT select * from testIndexRT;:

+------+--------+-------------+
| id   | status | name        |
+------+--------+-------------+
|    1 |      2 | aaa_updated |
+------+--------+-------------+

I realized it method not working:( I can't save all records from DB to testIndexRT, because my realy table is a large and it's size ~60 Gb. Some body tell me please, maybe there are other methods that I don't know?


Solution

  • 60G should not be a problem for an RT index, but if you want to stick with plain indexes you can use main+delta technique to achieve what you want. Here is an interactive course about that - https://play.manticoresearch.com/maindelta/ (it's based on Manticore Search which is a fork of Sphinx, but in Sphinx should be all the same, just killlist_target is named differently in Sphinx 3).

    Here's another example:

    MySQL:

    mysql> desc data;
    +---------+------------+------+-----+-------------------+-----------------------------+
    | Field   | Type       | Null | Key | Default           | Extra                       |
    +---------+------------+------+-----+-------------------+-----------------------------+
    | id      | bigint(20) | NO   | PRI | 0                 |                             |
    | body    | text       | YES  |     | NULL              |                             |
    | updated | timestamp  | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
    +---------+------------+------+-----+-------------------+-----------------------------+
    3 rows in set (0.00 sec)
    
    mysql> desc helper;
    +----------+--------------+------+-----+-------------------+-----------------------------+
    | Field    | Type         | Null | Key | Default           | Extra                       |
    +----------+--------------+------+-----+-------------------+-----------------------------+
    | chunk_id | varchar(255) | NO   | PRI |                   |                             |
    | built    | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
    +----------+--------------+------+-----+-------------------+-----------------------------+
    2 rows in set (0.00 sec)
    

    Config:

    source main
    {
            type = mysql
            sql_host = localhost
            sql_user = root
            sql_pass =
            sql_db = test
            sql_query_pre = replace into helper set chunk_id = '1_tmp', built = now()
            sql_query = select id, body, unix_timestamp(updated) updated from data where updated >= from_unixtime($start) and updated <= from_unixtime($end)
            sql_query_range = select (select unix_timestamp(min(updated)) from data) min, (select unix_timestamp(built) - 1 from helper where chunk_id = '1_tmp') max
            sql_query_post_index = replace into helper set chunk_id = '1', built = (select built from helper t where chunk_id = '1_tmp')
            sql_range_step = 100
            sql_field_string = body
            sql_attr_timestamp = updated
    }
    
    source delta : main
    {
            sql_query_pre =
            sql_query_range = select (select unix_timestamp(built) from helper where chunk_id = '1') min, unix_timestamp() max
            sql_query_killlist = select id from data where updated >= (select built from helper where chunk_id = '1')
            killlist_target = idx_main:kl
    }
    
    index idx
    {
            type = distributed
            local = idx_main
            local = idx_delta
    }