I'm using FTS4 in my Android application to implement full-text search. The data in the app, which comes from an API, contains diacritics and accents. I've created two columns in the database: one stores the original data and the other stores the data without diacritics or accents (stripped using Normalizer). The search works when I search for words without diacritics or accents. The problem arises when I want to highlight the search query where it is found in the text.
For example, take this sentence, which I got from SO:
James asked, “’Tis Renée’s and Noël’s great‐grandparents’ 1970's-ish summer‐house, t'isn’t it?” Receiving no answer, he shook his head--and walked away.
If I search for Renee, it highlights Renée. But when I search for Renees, it successfully finds text containing the word Renée’s, yet because of the apostrophe it does not highlight it.
Search Term: Renee
Highlighted Output: Renée
Search Term: Renees
Highlighted Output: <whitespace>Renée’ <-- doesn't show the expected output
Expected Output: Renée’s
If I use replaceAll to remove all the apostrophes before highlighting, it does highlight the word Renée’s, but only up to the apostrophe (Renée’), and it also highlights the whitespace before the word. The highlight is pushed back even further when more apostrophes have been stripped from the paragraph.
Basically I want to show Renée’s in the paragraph displayed to the user and highlight the whole word even if the user searches for Renees.
Here's my code to highlight searched text:
if (searchQuery != null) {
    String paragraph = data.getParagraph();
    SpannableStringBuilder sb = new SpannableStringBuilder(paragraph);
    String normalizedText = Normalizer.normalize(paragraph, Normalizer.Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "").toLowerCase();
    //String normalizedText = Normalizer.normalize(paragraph, Normalizer.Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "").replaceAll("'", "").toLowerCase(); //remove all apostrophes -- this works but pushes back the highlighted text color because it doesn't count all stripped apostrophes in the original paragraph.
    Pattern word = Pattern.compile(searchQuery, Pattern.CASE_INSENSITIVE);
    Matcher match = word.matcher(normalizedText);
    while (match.find()) {
        BackgroundColorSpan fcs = new BackgroundColorSpan(Color.YELLOW);
        sb.setSpan(fcs, match.start(), match.end(), Spannable.SPAN_EXCLUSIVE_EXCLUSIVE);
    }
    text.setText(sb);
}
How do I highlight the searched word even when it contains an apostrophe?
You can add a ['’]? pattern (which matches an optional ' or ’ char) between each char in the searchQuery:
Pattern word = Pattern.compile(TextUtils.join("['’]?", searchQuery.split("")), Pattern.CASE_INSENSITIVE);
This way, you will make sure the search phrase will match even if there is a single apostrophe anywhere inside it.
See a regex demo.
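For reference, here is a minimal plain-Java sketch of the same idea without TextUtils. It is only an illustration under some assumptions: the class and method names (ApostropheTolerantHighlight, normalize, buildPattern, findSpans) are hypothetical, each query character is quoted with Pattern.quote so that regex metacharacters typed by the user are treated literally, and the paragraph is assumed to store accented letters in precomposed form, so that offsets in the normalized text line up with the original. Building the pattern in a loop also avoids relying on split(""), which returns a leading empty string on Java runtimes older than 8.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApostropheTolerantHighlight {

    // An optional straight (') or curly (U+2019) apostrophe, allowed between query characters.
    private static final String OPTIONAL_APOSTROPHE = "['\u2019]?";

    /** Strips combining diacritical marks and lowercases, as in the question's code. */
    static String normalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
                .toLowerCase();
    }

    /** Joins the query's characters with the optional-apostrophe pattern, quoting each one. */
    static Pattern buildPattern(String query) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < query.length(); ) {
            int cp = query.codePointAt(i);
            if (i > 0) {
                regex.append(OPTIONAL_APOSTROPHE);
            }
            // Quote so that characters such as '.' or '(' are matched literally.
            regex.append(Pattern.quote(new String(Character.toChars(cp))));
            i += Character.charCount(cp);
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Returns {start, end} offsets of every match, measured in the normalized text. */
    static List<int[]> findSpans(String paragraph, String query) {
        List<int[]> spans = new ArrayList<>();
        if (query == null || query.isEmpty()) {
            return spans;
        }
        Matcher m = buildPattern(query).matcher(normalize(paragraph));
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }
}
```

In the highlighting loop from the question, each returned pair would be passed to sb.setSpan(fcs, span[0], span[1], Spannable.SPAN_EXCLUSIVE_EXCLUSIVE). Because no apostrophes are removed from the normalized text, the offsets are not shifted the way they are with the replaceAll("'", "") approach.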