Search code examples
sqlsql-serversql-server-2008sql-server-2012freetext

FREETEXT not working with thesaurus file SQL Server 2012


As per MSDN I did this after I edited the thesaurus file, which loads the English file (tseng.xml) into the thesaurus:

EXEC sys.sp_fulltext_load_thesaurus_file 1033;
GO

Mini sample part of thesaurus File

<XML ID="Microsoft Search Thesaurus">
    <thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
        <expansion>
            <sub>car</sub>
            <sub>coche</sub>
            <sub>automobile</sub>
            <sub>ride</sub>
        </expansion>
    </thesaurus>
</XML>

The indexed column c.keywords field consists of 'bike,car' A quick check with the query:

DECLARE @strsearch varchar(200)
SET @strsearch = 'automobile'
SELECT adid from ads INNER JOIN campaign c ON ads.campid=c.campaignid 
WHERE 
FREETEXT(c.keywords, @strsearch)

RESULTS
========
ad NULL

Freetext looking for 'car' does work

DECLARE @strsearch varchar(200)
SET @strsearch = 'car'
SELECT adid from ads INNER JOIN campaign c ON ads.campid=c.campaignid 
WHERE 
FREETEXT(c.keywords, @strsearch)

RESULTS
========
ad 41

So FREETEXT using the thesaurus doesn't return any results. I thought by default FREETEXT used the thesaurus file. What am I doing wrong?


Solution

  • Directly from MSDN (http://msdn.microsoft.com/en-us/library/cc280598.aspx):

    A. Load a thesaurus file even if it is already loaded

    The following example parses and loads the English thesaurus file.

    EXEC sys.sp_fulltext_load_thesaurus_file 1033;
    GO
    

    B. Load a thesaurus file only if it is not yet loaded

    The following example parses and loads the Arabic thesaurus file, unless it is already loaded.

    EXEC sys.sp_fulltext_load_thesaurus_file 1025, @loadOnlyIfNotLoaded = 1;
    GO
    

    I was using the tseng.xml file. Well, there's two English thesaurus files: tseng.xml and tsenu.xml

    tseng.xml is for English (United Kingdom) while tsenu.xml is for English (United States). The LCID (1033) corresponds to the tsenu.xml file

    The XML file when saved must be in Unicode format and Byte Order Marks must be specified, not ansi or UTF-8. I used Microsoft notepad and saved the tsenu.xml file as Unicode.