python whoosh

Does Whoosh require all strings to be Unicode?


I am porting my search app from Solr to Whoosh and am working through the quick start, but I keep running into problems every time I have to deal with strings:

>>> writer.add_document(iden=fil, content=F2T.file_to_text(fil_path))
ValueError: 'File Name.doc' is not unicode or sequence

and then:

>>> query = QueryParser("content", ix.schema).parse("first")
AssertionError: 'first' is not unicode

And THAT line comes straight from the quick-start tutorial! Does Whoosh require all fields to be Unicode? It would be a lot of work to make my app Unicode-aware (and it's not even worth it). As for "not unicode or sequence": as I understand it, a string is also a sequence data type.


Solution

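  • Yes, Whoosh requires strings to be Unicode. In Python 2, a plain literal such as "first" is a byte string, so this line fails:

    query = QueryParser("content", ix.schema).parse("first")

    Change that to the following, where the u prefix makes the literal a Unicode string: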

    query = QueryParser("content", ix.schema).parse(u"first")
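
The u prefix only works for literals. For strings that come from elsewhere (file names, extracted document text), the value has to be decoded first. Here is a minimal sketch of one way to do that, assuming the incoming byte strings are UTF-8 encoded. to_text is a hypothetical helper name, not part of Whoosh:

```python
def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string.

    Byte strings are decoded with the given encoding (UTF-8 is an
    assumption here); anything else is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical inputs standing in for the file name and query from the question.
name = to_text(b"File Name.doc")
term = to_text(b"first")
```

With this, the call from the question could become writer.add_document(iden=to_text(fil), content=to_text(F2T.file_to_text(fil_path))), assuming F2T.file_to_text returns a byte string. If your files use a different encoding, pass it as the second argument.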