I have a full-text index with an empty stoplist.
The table contains a row with the value 'A. Beta'. If I search for 'A. Beta' or 'A. beta' (using full-text search) the row shows up correctly. But if I search for 'a. Beta' or 'a. beta' the row does not show up.
What is happening? Is there a good workaround to this? I could remove full stops from the search string before sending it to the database, but is there anything else I should remove?
Example:
CREATE TABLE mytable (
id BIGINT IDENTITY NOT NULL,
name VARCHAR(256) NOT NULL,
CONSTRAINT mytable_pk PRIMARY KEY (id)
);
CREATE FULLTEXT STOPLIST empty_stoplist;
CREATE FULLTEXT CATALOG mytable_catalog;
CREATE FULLTEXT INDEX ON mytable (
name Language 1033
)
KEY INDEX mytable_pk
ON mytable_catalog;
ALTER FULLTEXT INDEX ON mytable SET STOPLIST empty_stoplist;
ALTER FULLTEXT INDEX ON mytable START UPDATE POPULATION;
INSERT INTO mytable (name) VALUES ('A. Beta');
-- Wait until indexing is complete
SELECT * FROM mytable mt WHERE CONTAINS(mt.*, '"A. Beta"');
-- 1 result
SELECT * FROM mytable mt WHERE CONTAINS(mt.*, '"A. beta"');
-- 1 result
SELECT * FROM mytable mt WHERE CONTAINS(mt.*, '"a. Beta"');
-- 0 results
SELECT * FROM mytable mt WHERE CONTAINS(mt.*, '"a. beta"');
-- 0 results
Running the fts parser on 'A. Beta' and 'a. Beta' gives different results. This might be related:
-- Replace 7 with the id of 'empty_stoplist' or use NULL instead (same result).
select * from sys.dm_fts_parser('"A. Beta"', 1033, 7, 0)
-- keyword group_id phrase_id occurrence special_term display_term expansion_type source_term
-- 0x0061 1 0 1 Exact Match a 0 A. Beta
-- 0x0062006500740061 1 0 2 Exact Match beta 0 A. Beta
vs.
-- Replace 7 with the id of 'empty_stoplist' or use NULL instead (same result).
select * from sys.dm_fts_parser('"a. Beta"', 1033, 7, 0)
-- keyword group_id phrase_id occurrence special_term display_term expansion_type source_term
-- 0x0061 1 0 1 Exact Match a 0 a. Beta
-- 0xFF 1 0 9 End Of Sentence END OF FILE 0 a. Beta
-- 0x0062006500740061 1 0 10 Exact Match beta 0 a. Beta
The problem is actually not with case sensitivity but with the period in a. Beta
which the full text parser is interpreting as "end of sentence". In other words, the parser correctly interprets A.
(uppercase) as an abbreviation and ignores the period but it thinks a.
(lowercase) is the end of a sentence instead of an abbreviation since abbreviations are typically capitalized in English. The full text engine uses end-of-sentence markers as well as end-of-paragraph markers when counting word occurrences to avoid false positives when matching quoted phrases. (I don't know much else about how occurrence counting works other than this vague statement from How Search Query Results Are Ranked: In order to avoid false positives in phrase and proximity queries, end-of-sentence and end-of-paragraph introduce larger occurrence gaps.)
The easiest way to work around this problem is to remove periods from the search string. You will find that both of these searches will return the 1 result you are looking for.
SELECT * FROM mytable mt WHERE CONTAINS(mt.*, '"A Beta"');
SELECT * FROM mytable mt WHERE CONTAINS(mt.*, '"a Beta"');