I have been working on adding a feature to my multilingual website that highlights matching tag keywords.
This works for the English version but does not work for the Arabic version.
I have set up a sample on JSFiddle.
Sample code:
function HighlightKeywords(keywords)
{
    var el = $("#article-detail-desc");
    var language = "ar-AE";
    var pid = 32;
    var issueID = 18;
    $(keywords).each(function()
    {
        // var pattern = new RegExp("("+this+")", "gi"); //breaks html
        var pattern = new RegExp("(\\b"+this+"\\b)(?![^<]*?>)", "gi"); //looks for match outside html tags
        var rs = "<a class='ad-keyword-selected' href='http://www.alshindagah.com/ar/search.aspx?Language="+language+"&PageId="+pid+"&issue="+issueID+"&search=$1' title='Search website for: $1'><span style='color:#990044; text-decoration:none;'>$1</span></a>";
        el.html(el.html().replace(pattern, rs));
    });
}
HighlightKeywords(["you","الهدف","طهران","سيما","حاليا","Hello","34","english"]);
//Popup Tooltip for article keywords
$(function() {
    $("#article-detail-desc").tooltip({
        position: {
            my: "center bottom-20",
            at: "center top",
            using: function( position, feedback ) {
                $( this ).css( position );
                $( "<div>" )
                    .addClass( "arrow" )
                    .addClass( feedback.vertical )
                    .addClass( feedback.horizontal )
                    .appendTo( this );
            }
        }
    });
});
I store the keywords in an array and then match them against the text in a particular div.
I am not sure whether the problem is due to Unicode or something else. Any help is appreciated.
- Why it's not working
- An example of how you could approach it in English (meant to be adapted to Arabic by someone with a clue about Arabic)
- A stab at doing the Arabic version by someone (me) who hasn't a clue about Arabic :-)
At least part of the problem is that you're relying on the \b assertion, which (like its counterparts \B, \w, and \W) is English-centric. You can't rely on it in other languages (or even, really, in English; see below).
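A quick way to see the effect is to try \b on both kinds of word. This is only a minimal sketch: the sample sentences and the variable names are made up for illustration and are not taken from the question's page.

```javascript
// Builds the same kind of pattern the question uses: \bWORD\b
function wholeWord(word) {
    return new RegExp("\\b" + word + "\\b");
}

var arabicWord = "طهران";

// \b only treats a-z, A-Z, 0-9 and _ as word characters, so this works.
var english = wholeWord("you").test("are you ok");

// Every Arabic letter is a non-word character to \b, so there is no
// boundary between a space and an Arabic letter and the match fails.
var arabic = wholeWord(arabicWord).test("في " + arabicWord + " اليوم");

// The same pattern does match when ASCII letters touch the Arabic word,
// because that is the only place \b sees a word/non-word transition.
var glued = wholeWord(arabicWord).test("a" + arabicWord + "b");

console.log(english, arabic, glued); // true false true
```

So the pattern is not silently skipped for Arabic; it simply never finds a boundary where a reader would expect one.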
Here's the definition of \b in the spec:
The production Assertion :: \ b evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following:
- Let e be x's endIndex.
- Call IsWordChar(e–1) and let a be the Boolean result.
- Call IsWordChar(e) and let b be the Boolean result.
- If a is true and b is false, return true.
- If a is false and b is true, return true.
- Return false.
...where IsWordChar is defined further down as basically meaning one of these 63 characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _
I.e., the 26 English letters a to z in upper or lower case, the digits 0 to 9, and _. (This means you can't even rely on \b, \B, \w, or \W in English, because English has loan words like "Voilà", but that's another story.)
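The steps quoted from the spec can be mirrored in a few lines of plain JavaScript, which makes the "Voilà" problem easy to see. This is a rough sketch rather than how any engine implements it; isWordChar and isWordBoundary are hypothetical helper names.

```javascript
// The 63 characters that count as "word characters" for \b, \B, \w and \W
var WORD_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";

// Hypothetical helper mirroring IsWordChar: positions before the start
// or at the end of the input are not word characters.
function isWordChar(str, index) {
    if (index < 0 || index >= str.length) {
        return false;
    }
    return WORD_CHARS.indexOf(str.charAt(index)) !== -1;
}

// Hypothetical helper mirroring the \b steps: there is a boundary at
// position e when exactly one of its two neighbours is a word character.
function isWordBoundary(str, e) {
    var a = isWordChar(str, e - 1);
    var b = isWordChar(str, e);
    return a !== b;
}

var voila = "Voil\u00E0"; // "Voilà" with a precomposed à

console.log(isWordBoundary(voila, 4)); // true: between "l" and "à"
console.log(isWordBoundary(voila, 5)); // false: between "à" and the end
console.log(voila.match(/\w+/)[0]);    // "Voil"
```

As far as \b is concerned, the word ends after the "l", which is the same reason no boundary is found around Arabic words.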
You'll have to use a different mechanism for detecting word boundaries in Arabic. If you can come up with a character class that includes all of the Arabic "code points" (as Unicode puts it) that make up words, you could use code a bit like this:
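var keywords = {
    "laboris": true,
    "laborum": true,
    "pariatur": true
    // ...and so on...
};
var text = ""; // ...get the text to work on...
text = text.replace(
    /([abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)([^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)?/g,
    replacer);
function replacer(m, c0, c1) {
    // Check own properties only, so that words such as "constructor"
    // aren't treated as keywords by accident
    if (Object.prototype.hasOwnProperty.call(keywords, c0)) {
        c0 = '<a href="#">' + c0 + '</a>';
    }
    // c1 is undefined when no non-word text follows the word
    return c0 + (c1 || "");
}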
Notes on that:
- I've used the class [abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_] to mean "a word character". Obviously you'd have to change this (markedly) for Arabic.
- I've used the class [^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_] to mean "not a word character". This is just the same as the previous class with the negation (^) at the outset.
- The regular expression matches a run of word characters followed by an optional run of non-word characters, using capture groups ((...)) for both.
- String#replace calls the replacer function with the full text that matched followed by each capture group as arguments.
- The replacer function looks up the first capture group (the word) in the keywords map to see if it's a keyword. If so, it wraps it in an anchor.
- The replacer function returns that possibly-wrapped word plus the non-word text that followed it.
- String#replace uses the return value from replacer to replace the matched text.
Here's a full example of doing that:
<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8 />
<title>Replacing Keywords</title>
</head>
<body>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
<script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
<script>
(function() {
    // Our keywords. There are lots of ways you can produce
    // this map, here I've just done it literally
    var keywords = {
        "laboris": true,
        "laborum": true,
        "pariatur": true
    };
    // Loop through all our paragraphs (okay, so we only have one)
    $("p").each(function() {
        var $this, text;
        // We'll use jQuery on `this` more than once,
        // so grab the wrapper
        $this = $(this);
        // Get the text of the paragraph
        // Note that this strips off HTML tags, a
        // real-world solution might need to loop
        // through the text nodes rather than act
        // on the full text all at once
        text = $this.text();
        // Do the replacements
        // These character classes match JavaScript's
        // definition of a "word" character and so are
        // English-centric, obviously you'd change that
        text = text.replace(
            /([abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)([^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)?/g,
            replacer);
        // Update the paragraph
        $this.html(text);
    });
    // Our replacer. We define it separately rather than
    // inline because we use it more than once
    function replacer(m, c0, c1) {
        // Is the word in our keywords map? (An own-property check keeps
        // words like "constructor" from matching by accident)
        if (Object.prototype.hasOwnProperty.call(keywords, c0)) {
            // Yes, wrap it
            c0 = '<a href="#">' + c0 + '</a>';
        }
        // c1 is undefined when no non-word text follows the word
        return c0 + (c1 || "");
    }
})();
</script>
</body>
</html>
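To see what String#replace actually hands to the replacer, it can help to record the arguments for a short string. A minimal sketch; the sample string, the trace array and the shortened character class ([a-zA-Z0-9_], equivalent to the long one used above) are there only for illustration.

```javascript
var wordThenRest = /([a-zA-Z0-9_]+)([^a-zA-Z0-9_]+)?/g;
var trace = [];

"sit amet, laborum".replace(wordThenRest, function(m, c0, c1) {
    // m is the whole match, c0 the word, c1 the non-word run after it
    trace.push([m, c0, c1]);
    return m; // leave the text unchanged, we only want the trace
});

// trace[0] is ["sit ", "sit", " "]
// trace[1] is ["amet, ", "amet", ", "]
// trace[2] is ["laborum", "laborum", undefined]
console.log(trace);
```

The last entry has undefined for c1 because the string ends with a word, so any replacer that concatenates c0 and c1 should substitute an empty string in that case.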
I took a stab at the Arabic version. According to the Arabic script in Unicode page on Wikipedia, there are several code ranges used, but all of the text in your example fell into the primary range of U+0600 to U+06FF.
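If you want to check your own keywords against that range, something along these lines should do. It is a small sketch; inPrimaryArabicRange is a hypothetical helper name, and the keyword list is the Arabic subset of the array from the question.

```javascript
// Arabic keywords from the question
var arabicKeywords = ["الهدف", "طهران", "سيما", "حاليا"];

// True when every character of the word lies in U+0600 to U+06FF
function inPrimaryArabicRange(word) {
    return /^[\u0600-\u06ff]+$/.test(word);
}

var allInRange = arabicKeywords.every(inPrimaryArabicRange);

console.log(allInRange);                    // true
console.log(inPrimaryArabicRange("Hello")); // false
```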
Here's what I came up with: Fiddle (I prefer JSBin, which I used above, but I couldn't get the text to come out the right way around.)
(function() {
    // Our keywords. There are lots of ways you can produce
    // this map, here I've just done it literally
    var keywords = {
        "الهدف": true,
        "طهران": true,
        "سيما": true,
        "حاليا": true
    };
    // Loop through all our paragraphs (okay, so we only have two)
    $("p").each(function() {
        var $this, text;
        // We'll use jQuery on `this` more than once,
        // so grab the wrapper
        $this = $(this);
        // Get the text of the paragraph
        // Note that this strips off HTML tags, a
        // real-world solution might need to loop
        // through the text nodes rather than act
        // on the full text all at once
        text = $this.text();
        // Do the replacements
        // These character classes just use the primary
        // Arabic range of U+0600 to U+06FF, you may
        // need to add others.
        text = text.replace(
            /([\u0600-\u06ff]+)([^\u0600-\u06ff]+)?/g,
            replacer);
        // Update the paragraph
        $this.html(text);
    });
    // Our replacer. We define it separately rather than
    // inline because we use it more than once
    function replacer(m, c0, c1) {
        // Is the word in our keywords map? (own properties only)
        if (Object.prototype.hasOwnProperty.call(keywords, c0)) {
            // Yes, wrap it
            c0 = '<a href="#">' + c0 + '</a>';
        }
        // c1 is undefined when no non-Arabic text follows the word
        return c0 + (c1 || "");
    }
})();
All I did to my English function above was use [\u0600-\u06ff] to be "a word character" and [^\u0600-\u06ff] to be "not a word character". You may need to add some of the other ranges used for Arabic script (such as the appropriate style of numerals), but again, all of the text in your example fell into that range.
To my very non-Arabic-reading eyes, it seems to work.
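One thing to watch for with a plain range: U+0600 to U+06FF also contains Arabic punctuation such as the Arabic comma (U+060C), so a keyword followed directly by that comma is treated as part of a longer "word" and gets missed. On engines that support ES2018 Unicode property escapes (an assumption on my part; the code above doesn't use them), a class built from letters, marks and numbers may be a better fit. A minimal sketch, with highlight as a hypothetical helper name and a made-up sample string:

```javascript
var keywords = { "طهران": true, "حاليا": true };

// Hypothetical helper: wraps keywords using whichever word/non-word
// pattern it is given (two capture groups and the g flag are expected).
function highlight(text, pattern) {
    return text.replace(pattern, function(m, word, rest) {
        if (Object.prototype.hasOwnProperty.call(keywords, word)) {
            word = '<a href="#">' + word + '</a>';
        }
        return word + (rest || "");
    });
}

// Range-based classes, as in the answer
var byRange = /([\u0600-\u06ff]+)([^\u0600-\u06ff]+)?/g;
// Letters, combining marks, numbers and underscore, in any script
var byProperty = /([\p{L}\p{M}\p{N}_]+)([^\p{L}\p{M}\p{N}_]+)?/gu;

// The first keyword is followed directly by the Arabic comma (U+060C)
var sample = "طهران" + "\u060C " + "حاليا";

console.log(highlight(sample, byRange));    // only the second keyword is wrapped
console.log(highlight(sample, byProperty)); // both keywords are wrapped
```

The property-based class also matches Latin words, so a single pattern could serve both language versions of the site; whether that is desirable depends on your keywords.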