I want to develop a pattern-searching algorithm for a music system application: it searches for a given keyword and plays the music whose text file contains that keyword. There are many pattern-searching algorithms that can do this efficiently (e.g. KMP, or hashing, which may give errors). My main problem is that the whole database is in a language other than English (Hindi, to be specific). The user enters the keyword in Hindi, and I want to search a database that is also in Hindi. My main concern is how to search this database efficiently.
I think we can't use the KMP algorithm for a non-English language, because the ASCII characters we use contain only the English alphabet, digits, and a few symbols, and no letters of other languages. So please tell me how I can proceed, as I am not able to find a solution, or tell me where my thinking is wrong.
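The KMP algorithm does not depend on the alphabet; it only compares characters of the given pattern and text for equality. Moreover, in languages like Java, strings are Unicode (stored internally as UTF-16 code units, not ASCII), so you can use any language you like and the algorithm will work properly. Devanagari lies in the Basic Multilingual Plane, so each Hindi code point is a single Java char. In other languages you need to choose the encoding explicitly. Here is a link to an example on Ideone that uses KMP with a non-ASCII character set: KMP algorithm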
class Ideone {
    // f[i] = length of the longest proper prefix of pattern[0..i) that is also its suffix
    int[] f;

    // Builds the KMP failure function (despite the name, this is not a full DFA).
    public void dfa(String pattern) {
        int m = pattern.length();
        f = new int[m + 1]; // Java zero-initializes arrays, so f[0] and f[1] are already 0
        for (int i = 2; i <= m; i++) {
            int j = f[i - 1];
            for (;;) {
                if (pattern.charAt(j) == pattern.charAt(i - 1)) {
                    f[i] = j + 1;
                    break;
                }
                if (j == 0) {
                    f[i] = 0;
                    break;
                }
                j = f[j]; // fall back to the next shorter border
            }
        }
    }

    // Returns the index just past the end of the first match, or -1 if there is none.
    public int match(String text, String pattern) {
        dfa(pattern);
        int n = text.length();
        int m = pattern.length();
        if (m == 0) return 0; // an empty pattern matches at the start
        int i = 0;
        int j = 0;
        while (i < n) {
            if (text.charAt(i) == pattern.charAt(j)) {
                j++;
                i++;
                if (j == m) return i;
            } else if (j > 0) {
                j = f[j]; // mismatch: reuse the longest border instead of rescanning the text
            } else {
                i++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Ideone kmp = new Ideone();
        String text = "AĄĘĆABA";
        String pattern = "ĄĘĆ";
        System.out.println(kmp.match(text, pattern)); // 4: the match covers indices 1..3
    }
}
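For Hindi specifically, two practical details may matter more than the matching algorithm itself: decoding the text files with an explicit charset, and normalizing text so that equivalent Devanagari sequences compare equal (for example, the precomposed letter U+0958 versus U+0915 followed by the nukta U+093C). The following is only a minimal sketch under those assumptions. The class `HindiSearch`, the helper `find`, and the sample lyrics are hypothetical names made up for illustration, UTF-8 is assumed as the file encoding, and `String.indexOf` stands in for the KMP matcher above.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class HindiSearch {
    // Hypothetical helper: normalize both strings to NFC, then search.
    // The returned index refers to the normalized text, or -1 if not found.
    static int find(String text, String keyword) {
        String t = Normalizer.normalize(text, Normalizer.Form.NFC);
        String k = Normalizer.normalize(keyword, Normalizer.Form.NFC);
        return t.indexOf(k); // stand-in for a KMP matcher
    }

    public static void main(String[] args) {
        // Simulates reading a lyrics file: the bytes are decoded with an
        // explicit charset (assumed UTF-8) instead of the platform default.
        byte[] fileBytes = "यह गीत प्रेम के बारे में है".getBytes(StandardCharsets.UTF_8);
        String lyrics = new String(fileBytes, StandardCharsets.UTF_8);

        System.out.println(find(lyrics, "प्रेम"));  // found: non-negative index
        System.out.println(find(lyrics, "संगीत")); // not present: -1

        // Precomposed U+0958 vs. U+0915 + nukta U+093C: a raw comparison
        // misses the match, while the normalized search finds it.
        System.out.println("\u0958".indexOf("\u0915\u093C"));
        System.out.println(find("\u0958", "\u0915\u093C"));
    }
}
```

If the database is large, it would probably be better to normalize each file once when it is loaded rather than on every query.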