
Manticore Search fails to index an empty set with csvpipe/tsvpipe


I am using the Manticore search engine (forked from Sphinx) and am setting up a pair of indexes that implement the main+delta approach. The delta index is built with tsvpipe:

source postings_source_delta
{
  type = tsvpipe
  tsvpipe_command = bash /opt/get-delta.sh 2>/var/log/manticore/delta_index_error.log
  tsvpipe_field = content
  tsvpipe_attr_string = mongoId
}

The get-delta.sh script outputs a TSV with the items recently added to the database. The problem is that when there are no new items, the TSV is empty, and in this case the indexer fails with this error:

ERROR: index 'postings_index_delta': source 'postings_source_delta': read error 'Inappropriate ioctl for device'.

This makes indexing with TSV/CSV unreliable. Is there a way to solve this problem?


Solution

  • In general, Manticore doesn't allow creating empty plain indexes, but there's a trick: you can do it with a mysql source whose query returns no rows but still defines the columns:

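    source min {
        type = mysql
        sql_host = localhost
        sql_user = test
        sql_pass =
        sql_db = test
        # The first column is the document ID. WHERE 1=0 returns no rows, but the
        # column list still defines the schema (table t must exist in database test).
        sql_query = select 1, 'dog' doc, 1 group_id, 'red' color, 3.5 size from t where 1=0
        sql_field_string = doc
        sql_attr_uint = group_id
        sql_attr_string = color
        sql_attr_float = size
    }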
    

    will give you:

    Manticore 5.0.2 348514c@220530 dev (columnar 1.15.4 2fef34e@220522)
    Copyright (c) 2001-2016, Andrew Aksyonoff
    Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
    Copyright (c) 2017-2022, Manticore Software LTD (https://manticoresearch.com)
    
    using config file 'min_sql_empty.conf'...
    indexing index 'idx'...
    collected 0 docs, 0.0 MB
    total 0 docs, 0 bytes
    total 0.137 sec, 0 bytes/sec, 0.00 docs/sec
    total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
    total 10 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
    rotating indices: successfully sent SIGHUP to searchd (pid=1742606).
    

    So you could check whether your TSV command returns anything and, if it doesn't, use this trick instead.
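    A minimal sketch of that check as a bash wrapper that you would run from cron instead of calling indexer directly. It assumes two hypothetical configs that both define postings_index_delta with the same path: delta_tsv.conf, whose tsvpipe_command simply replays the cached file (for example cat /tmp/postings_delta.tsv), and delta_empty.conf, whose delta source is an empty mysql source like the one above. The file paths, config names and the "run" argument are placeholders, not anything Manticore provides; only indexer's -c and --rotate options are real.

```bash
#!/usr/bin/env bash
# Sketch: run the delta query once, then choose which config to index with.
# All paths and config names below are hypothetical placeholders.
DELTA_TSV=${DELTA_TSV:-/tmp/postings_delta.tsv}
# Config whose tsvpipe_command just replays the cached file (e.g. cat "$DELTA_TSV").
TSV_CONF=${TSV_CONF:-/etc/manticoresearch/delta_tsv.conf}
# Config whose delta source is the empty mysql source (WHERE 1=0).
EMPTY_CONF=${EMPTY_CONF:-/etc/manticoresearch/delta_empty.conf}

# Print the config to use for the TSV file in $1: the tsvpipe config if the file
# has at least one non-empty line, otherwise the empty-index config.
choose_config() {
  if grep -q . "$1" 2>/dev/null; then
    printf '%s\n' "$TSV_CONF"
  else
    printf '%s\n' "$EMPTY_CONF"
  fi
}

# Only do the real work when called as "delta-wrapper.sh run".
if [[ "${1:-}" == "run" ]]; then
  set -euo pipefail
  bash /opt/get-delta.sh >"$DELTA_TSV" 2>>/var/log/manticore/delta_index_error.log
  indexer -c "$(choose_config "$DELTA_TSV")" --rotate postings_index_delta
fi
```

    Running get-delta.sh only once and replaying the saved file keeps the check and the indexed data consistent, since a second query could return different rows.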

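    That said, a real-time (RT) index is the recommended approach: it can be created empty and updated directly with INSERT/REPLACE, so there is no delta to rebuild.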

    UPDATE

    xmlpipe2 can also build an empty plain index, for example with the schema declared inline in the XML:

    snikolaev@dev:~$ cat xml_empty.conf
    source min {
      type = xmlpipe2
      xmlpipe_command = cat xml_empty
    }
    
    index idx {
      path = idx/xml_empty
      source = min
    }
    
    searchd {
        listen = 9315:mysql41
        log = manticore.log
        pid_file = 9315.pid
        binlog_path =
    }
    
    snikolaev@dev:~$ cat xml_empty
    <?xml version="1.0" encoding="utf-8"?>
    <sphinx:docset xmlns:sphinx="http://sphinxsearch.com/">
    <sphinx:schema>
        <sphinx:attr name="a" type="int" />
        <sphinx:field name="f" />
    </sphinx:schema>
    </sphinx:docset>
    

    will give:

    snikolaev@dev:~$ indexer -c xml_empty.conf --all
    Manticore 5.0.2 348514c@220530 dev (columnar 1.15.4 2fef34e@220522)
    Copyright (c) 2001-2016, Andrew Aksyonoff
    Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
    Copyright (c) 2017-2022, Manticore Software LTD (https://manticoresearch.com)
    
    using config file 'xml_empty.conf'...
    indexing index 'idx'...
    collected 0 docs, 0.0 MB
    total 0 docs, 0 bytes
    total 0.112 sec, 0 bytes/sec, 0.00 docs/sec
    total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
    total 8 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
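    Applied to the question, one way to use this without a second config is to keep get-delta.sh unchanged and wrap its TSV output in xmlpipe2, so the schema is always emitted and an empty delta becomes an empty docset. The sketch below assumes the TSV columns are id, content and mongoId (as in the tsvpipe source above). The script name is hypothetical, and the attribute is spelled mongoid here only as a precaution about name case; adjust it to your schema.

```bash
#!/usr/bin/env bash
# Sketch: convert TSV rows (id<TAB>content<TAB>mongoId) into an xmlpipe2 docset.
# The schema is always printed, so empty input still gives a valid, empty docset.
# Column layout and names are assumptions based on the question's tsvpipe source.

tsv_to_xmlpipe2() {
  awk -F '\t' '
    # Escape the characters that are special in XML text.
    function esc(s) {
      gsub(/&/, "\\&amp;", s)
      gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s)
      return s
    }
    BEGIN {
      print "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
      print "<sphinx:docset xmlns:sphinx=\"http://sphinxsearch.com/\">"
      print "<sphinx:schema>"
      print "  <sphinx:field name=\"content\" />"
      print "  <sphinx:attr name=\"mongoid\" type=\"string\" />"
      print "</sphinx:schema>"
    }
    # Skip blank lines; every other row becomes one document.
    NF > 0 {
      printf "<sphinx:document id=\"%s\">\n", $1
      printf "  <content>%s</content>\n", esc($2)
      printf "  <mongoid>%s</mongoid>\n", esc($3)
      print "</sphinx:document>"
    }
    END { print "</sphinx:docset>" }
  '
}

# Usage: tsv-to-xmlpipe2.sh COMMAND [ARGS...]
# Runs the delta command and converts its stdout; does nothing without arguments.
if [[ $# -gt 0 ]]; then
  set -o pipefail
  "$@" | tsv_to_xmlpipe2
fi
```

    It could then be wired in as xmlpipe_command = bash /opt/tsv-to-xmlpipe2.sh bash /opt/get-delta.sh, with type = xmlpipe2 in postings_source_delta and the tsvpipe_* lines removed. Note that if get-delta.sh itself fails, the converter still prints a valid (empty) docset, so indexer may not notice the error; keep an eye on the error log.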