Tags: android, sqlite, syntax, fts4

SQLite FTS4 search with HTML-encoded characters


I am coding a dictionary project. It can translate English to Arabic or Arabic to English. The words are stored in an SQLite FTS4 database. Arabic letters are stored in the database HTML-encoded, like this:

&#1594;&#1610;&#1585;

When I use the FTS4 query syntax for English to Arabic, for example to search for "stor":

SELECT * FROM fts_dic WHERE english MATCH '"^stor*"';

the returned results are what I want, for example:

store
stored
storage

But when I search Arabic to English, for example for &#1594;&#1610;&#1585;:

SELECT * FROM fts_dic WHERE english MATCH '"^&#1594;&#1610;&#1585;*"';

the returned results are:

ظغير׾
֎׾غيرظ

But I want to see only results that start with the HTML-encoded text I searched for, like:

غيرخ
غيرٗ

As you can see, I put "^" at the beginning of the word to get this result. This works fine for English to Arabic, but it does not work properly for Arabic to English.


Solution

  • The FTS documentation says:

    A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters and all characters with Unicode codepoint values greater than or equal to 128. All other characters are discarded when splitting a document into terms. Their only contribution is to separate adjacent terms.

    In other words, punctuation characters like &#; are completely ignored; what FTS sees are the three words 1594, 1610, and 1585.
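
To make the quoted rule concrete, here is a minimal Java sketch that splits a string the way the documentation describes. It is only an illustration: the class and method names are made up for this example, it is not SQLite's actual tokenizer code, and it leaves out details such as case folding.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the rule quoted from the FTS documentation.
// SimpleTokenizerSketch is a hypothetical name, not part of SQLite or Android.
class SimpleTokenizerSketch {

    // Eligible characters: ASCII letters and digits, plus any code point >= 128.
    static boolean isEligible(int cp) {
        return cp >= 128
                || (cp >= '0' && cp <= '9')
                || (cp >= 'a' && cp <= 'z')
                || (cp >= 'A' && cp <= 'Z');
    }

    // Splits text into terms; every non-eligible character only acts as a separator.
    static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isEligible(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // HTML-encoded text falls apart into three numeric terms.
        System.out.println(terms("&#1594;&#1610;&#1585;")); // [1594, 1610, 1585]
        // The same word as plain Unicode stays a single term.
        System.out.println(terms("\u063A\u064A\u0631").size()); // 1
    }
}
```

Because the encoded word becomes three separate terms, a prefix query on it behaves like a phrase search for those numbers rather than a search for words beginning with the Arabic letters.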

    In the FTS table, you should not HTML-encode anything; just use the plain Unicode characters.

    Furthermore, ^ works only in FTS4 tables (which may not be available in all Android versions).
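
Following the advice above to store plain Unicode, one possible way to convert existing data is to decode the decimal entities before inserting rows (and before building the MATCH term). The sketch below is an assumption about how that could look: EntityDecoderSketch and decode are hypothetical names, and it handles only decimal entities of the form &#NNNN;, which is the form shown in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns decimal HTML entities such as &#1594; into
// the corresponding Unicode characters. Named and hex entities are left as-is.
class EntityDecoderSketch {

    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,7});");

    static String decode(String encoded) {
        Matcher m = DECIMAL_ENTITY.matcher(encoded);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous entity and this one unchanged.
            out.append(encoded, last, m.start());
            int cp = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(cp)) {
                out.appendCodePoint(cp);
            } else {
                // Not a real code point: keep the original text.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(encoded, last, encoded.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the three Arabic letters as plain Unicode text.
        System.out.println(decode("&#1594;&#1610;&#1585;"));
    }
}
```

With the decoded text stored in the table, the search term should be passed in the same plain Unicode form, so that the word is indexed and queried as a single term.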