Is it possible to update one real-time Sphinx index in parallel?
To clarify, I have one RT index, named e.g. RT1. To update RT1, I want to have two or even more updaters.
For instance, if I have 100 files in the queue, I want to add 2 files in parallel to the index. Is Sphinx capable of multi-threading or is Sphinx not thread-safe?
The main question is, will Sphinx corrupt itself when multiple files are being added to the same index at the same time? I wasn't able to find the answer in the documentation.
Keep in mind that I multithread my script, which adds files to the Sphinx RT index. Therefore, multiple files will be added to one index at the same time (in parallel).
Version:
Sphinx 2.2.9-id64-release (rel22-r5006)
Config:
index_name
{
    type = rt
    path = /mnt/data001/index_name
    rt_field = FileName
    rt_field = FileExtension
    rt_field = FileContent
    rt_field = FileTags
    rt_attr_uint = FileReference
    rt_attr_uint = FileSize
    rt_attr_uint = LastModified
    rt_attr_uint = LastModifiedYear
    rt_attr_uint = LastModifiedMonth
    rt_attr_uint = LastModifiedDay
    rt_attr_string = FileContent
    rt_mem_limit = 1024M
    charset_table = A..Z, a..z, 0..9, U+E1, U+E9, U+FA
    ondisk_attrs = pool
}
searchd
{
    listen = 9306:mysql41
    log = /var/log/sphinxsearch/searchd.log
    read_timeout = 5
    max_children = 30
    pid_file = /var/run/searchd.pid
    max_packet_size = 128M
    binlog_path = /mnt/data001
}
Note that the inserted strings consist only of the characters A..Z, a..z, 0..9, U+E1, U+E9 and U+FA. (I have verified this.)
Test: for the test I used a C++ application on Ubuntu that communicates with Sphinx through the MySQL connectors.
I have verified this issue, and be warned: updating the index in parallel is not possible! My index partially corrupted itself (neither the index nor the daemon crashed). You will not see this issue at first glance. I verified it by inserting a value and checking it (selecting it directly after insertion); the returned value didn't always match the inserted value, as outlined below.
As an example, for clarification: I inserted test but got back t^463t from the select (directly after the insertion was performed).
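The read-back check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the C++ test application that was actually used. The names find_mismatches, fake_insert and fake_select are hypothetical, and the in-memory dictionary only stands in for the index so the sketch runs on its own. Against a real daemon, the two callables would presumably issue an INSERT and a SELECT over the MySQL protocol listener from the config.

```python
def find_mismatches(documents, insert, select):
    """Insert each document, read it back immediately, and collect mismatches.

    documents: iterable of (doc_id, text) pairs
    insert:    callable(doc_id, text) that stores the document
    select:    callable(doc_id) that returns the stored text
    """
    mismatches = []
    for doc_id, text in documents:
        insert(doc_id, text)
        returned = select(doc_id)
        if returned != text:
            # Keep both values so corrupted results can be inspected later.
            mismatches.append((doc_id, text, returned))
    return mismatches


# In-memory stand-in for the index (an assumption, used only for this sketch).
store = {}


def fake_insert(doc_id, text):
    store[doc_id] = text


def fake_select(doc_id):
    return store.get(doc_id)


documents = [(1, "test"), (2, "example")]
print(find_mismatches(documents, fake_insert, fake_select))
```

Running the same function from several threads at once, each with its own connection, would be one way to reproduce the kind of count reported in the test.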
For this test I inserted 1,000,000 documents spread over a two-threaded application, of which 43,372 documents had the issue outlined above. This of course depends on the exact rate at which documents are inserted in parallel, but Sphinx doesn't seem to be thread-safe. (I assume that even more documents will get corrupted when more threads are used to insert documents in parallel.)
Sometimes I also noticed that words from multiple documents were concatenated (those documents were inserted at the exact same moment).
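If parallel inserts into one RT index are unsafe, one possible workaround is to keep the file processing multithreaded but funnel every insert through a single writer thread. The Python sketch below illustrates that pattern only. The function run_single_writer and the insert callable are hypothetical, and whether serializing inserts fully avoids the corruption on Sphinx 2.2.9 has not been verified here; it is an assumption based on the problem appearing only with parallel inserts.

```python
import queue
import threading


def run_single_writer(documents, insert, producers=2):
    """Prepare documents on several threads, but insert from one thread only.

    documents: list of (doc_id, text) pairs
    insert:    callable(doc_id, text); in practice it would send the INSERT
    producers: number of threads that feed the queue
    """
    pending = queue.Queue()
    stop = object()  # sentinel that tells the writer to finish

    def writer():
        # The only thread that ever calls insert().
        while True:
            item = pending.get()
            if item is stop:
                break
            insert(*item)

    def producer(chunk):
        # Stands in for reading and parsing files in parallel.
        for doc in chunk:
            pending.put(doc)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()

    chunks = [documents[i::producers] for i in range(producers)]
    threads = [threading.Thread(target=producer, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    pending.put(stop)  # all producers are done, so the writer can stop
    writer_thread.join()
```

The expensive part (reading and parsing files) still runs in parallel, while the index only ever sees one insert at a time.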