I have the fields `name` and `author`, which are in Arabic, and I want to search on them with a keyword:
keyword = request.POST.get('search', '')
books = Book.objects.filter(Q(name__icontains=keyword)).order_by('category', 'code')
The problem is that Django treats the two letters 'أ' and 'ا' as different letters, while users expect them to be interchangeable in search, so that both give the same results. Is there a way to treat them as one letter without replacing them in variations of the keyword in code, like this?
keyword_var1 = keyword.replace('أ', 'ا')
The same problem occurs with the letters 'ى' and 'ي'.
This is a known problem, but it is not specific to Django. Django does not define how case-insensitive search works; that is the job of the database. The set of rules for how characters are treated when ordering or testing for equivalence is called a collation.
This Medium article by Ahmed Essam explains the problems with the plain utf8_unicode_ci collation. If I understand the article correctly, that Unicode collation has some shortcomings. Depending on the database, you can define a custom collation that looks, for example, like the one in the article:
<collation name="utf8_arabic_ci" id="1029">
  <rules>
    <reset>\u0627</reset> <!-- Alef 'ا' -->
    <i>\u0623</i> <!-- Alef With Hamza Above 'أ' -->
    <i>\u0625</i> <!-- Alef With Hamza Below 'إ' -->
    <i>\u0622</i> <!-- Alef With Madda Above 'آ' -->
  </rules>
  <rules>
    <reset>\u0629</reset> <!-- Teh Marbuta 'ة' -->
    <i>\u0647</i> <!-- Heh 'ه' -->
  </rules>
  <rules>
    <reset>\u0000</reset> <!-- Unicode value of NULL -->
    <i>\u064E</i> <!-- Fatha 'َ' -->
    <i>\u064F</i> <!-- Damma 'ُ' -->
    <i>\u0650</i> <!-- Kasra 'ِ' -->
    <i>\u0651</i> <!-- Shadda 'ّ' -->
    <i>\u0652</i> <!-- Sukun 'ْ' -->
    <i>\u064B</i> <!-- Fathatan 'ً' -->
    <i>\u064C</i> <!-- Dammatan 'ٌ' -->
    <i>\u064D</i> <!-- Kasratan 'ٍ' -->
  </rules>
</collation>
It shows that \u0623 (an Alef with Hamza above) is treated as identical to a plain Alef (\u0627), so the two compare as equal when matching.
You can then set this collation on these specific columns, and probably on any other columns that store Arabic text. In Django 3.2 and later, the db_collation argument of CharField and TextField sets the collation of a column.
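To make the folding rules in that collation concrete, here is a minimal Python sketch that applies the same mapping to a string. It is only an illustration: `fold_arabic`, `FOLD_MAP` and `DIACRITICS` are hypothetical names, and the 'ى' to 'ي' rule comes from the question rather than from the article's collation. Folding in Python only helps if the same function is applied to both the keyword and the stored text (for example in a separate, pre-normalized search column), which is an assumption about your schema.

```python
# Hypothetical helper that mirrors the collation rules quoted above.
# Letters that should compare as equal are folded to one representative.
FOLD_MAP = {
    "\u0623": "\u0627",  # Alef with Hamza above -> Alef
    "\u0625": "\u0627",  # Alef with Hamza below -> Alef
    "\u0622": "\u0627",  # Alef with Madda above -> Alef
    "\u0629": "\u0647",  # Teh Marbuta -> Heh
    "\u0649": "\u064A",  # Alef Maksura -> Yeh (from the question, not the article)
}

# Fathatan, Dammatan, Kasratan, Fatha, Damma, Kasra, Shadda, Sukun
DIACRITICS = "\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652"

# str.translate accepts a dict of code point -> replacement (None deletes).
_TABLE = {ord(src): dst for src, dst in FOLD_MAP.items()}
_TABLE.update({ord(mark): None for mark in DIACRITICS})


def fold_arabic(text: str) -> str:
    """Return text with letter variants folded and diacritics removed."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # Both spellings fold to the same string, so they would match each other.
    print(fold_arabic("\u0623\u062D\u0645\u062F") == fold_arabic("\u0627\u062D\u0645\u062F"))
```

With a helper like this, the keyword would be folded once before filtering, instead of building several `keyword.replace(...)` variants. A database collation is still the cleaner option when you can change it, because the original column can then be searched directly.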