How to remove the emoticons from a string
My simple code is:
public static void main(String[] args) throws SQLException {
    String str = "My nam is ur -D ";
    getRefineCode(str);
}

private static void getRefineCode(String str) throws SQLException {
    List<String> smstypeWord = getshortWord();
    for (int i = 0; i < smstypeWord.size(); i++) {
        String string = smstypeWord.get(i);
        // Each entry has the form "<message>_<short_text>"
        String[] stringcon = string.split("_");
        String emessage = stringcon[0];
        String emoticon = stringcon[1].trim();
        if (str.contains(emoticon)) {
            str = str.replace(emoticon, emessage);
            System.out.println("=================>" + str);
        }
    }
    System.out.println("=======++==========>" + str);
}

private static List<String> getshortWord() throws SQLException {
    // conn (java.sql.Connection) and shortMessage (List<String>) are fields declared elsewhere in the class
    String query1 = "SELECT * FROM englishSmsText";
    PreparedStatement ps = conn.prepareStatement(query1);
    ResultSet rs = ps.executeQuery();
    String f_message = "";
    String s_message = "";
    while (rs.next()) {
        s_message = rs.getString("message");
        f_message = rs.getString("short_text");
        shortMessage.add(s_message + "_" + f_message);
        //fullMessage.add(f_message);
    }
    return shortMessage;
}
My database is based on the http://smsdictionary.co.uk/abbreviations site.
I am unable to work out how to replace multiple abbreviations or short messages correctly.
The output looks like this: My nam is You are SquintLaughtGrinisappGaspoooh!!shockedintedr, Big SmilGrinisappGaspoooh!!shockedinted, Grin
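First of all, note that replace treats its argument as literal text and replaces every occurrence of it, including occurrences inside other words. To control where a match is allowed to occur you need replaceAll, which takes a regular expression.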
Second, you can reduce the number of false positives by matching only whole words. replaceAll accepts regular expressions, so you can use replaceAll("\\b" + emoticon + "\\b", emessage) to replace only abbreviations that are surrounded by word boundaries (the transitions between word characters and whitespace, punctuation etc.).
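As a minimal sketch of the whole-word idea, the snippet below compares a plain replace with a \b-delimited replaceAll. The class name, the helper replaceWholeWord, and the sample sentence are made up for illustration; Pattern.quote and Matcher.quoteReplacement are added as a precaution so that characters such as . or $ in the dictionary entries are taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordReplace {
    // Hypothetical helper: replaces the abbreviation only where it stands as a whole word.
    static String replaceWholeWord(String text, String abbreviation, String meaning) {
        // \b matches at a transition between a word character and a non-word character.
        String regex = "\\b" + Pattern.quote(abbreviation) + "\\b";
        // quoteReplacement keeps '$' and '\' in the meaning from being read as group references.
        return text.replaceAll(regex, Matcher.quoteReplacement(meaning));
    }

    public static void main(String[] args) {
        String text = "ur hours are ur own";
        // Literal replace also rewrites the "ur" inside "hours".
        System.out.println(text.replace("ur", "you are"));
        // The whole-word version leaves "hours" untouched.
        System.out.println(replaceWholeWord(text, "ur", "you are"));
    }
}
```

The first line printed shows "hours" being mangled, which is the same effect that produces the garbled output in the question; the second line only expands the standalone "ur".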
However, with the dictionary you are using you will still replace KISS with Keep It Simple, Stupid. You will replace 86 with "out Of" Or "over" Or "to Get Rid Of"... Maybe you should be looking for a different approach.
Edit: I forgot you were looking for special characters. You should try something like this regex, which escapes special characters in the search string (and is more generous than the previous, too-strict \b pattern):
replaceAll("((?<=\\W)|^)\\Q" + emoticon + "\\E((?=\\W)|$)", emessage);
It should cover most cases; I doubt there is any way to perfectly identify what is intended as an acronym and what is not.
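For illustration, here is a rough sketch of how that lookaround pattern could be applied to a whole dictionary. The class EmoticonExpander, the method expand, and the two sample entries are hypothetical stand-ins for the database rows; Pattern.quote is used in place of a hand-written \Q...\E pair (it has the same effect and also copes with entries that contain \E), and Matcher.quoteReplacement protects replacement texts that contain $ or \.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmoticonExpander {
    // Hypothetical helper: expands every dictionary key that is delimited by
    // non-word characters (or by the start/end of the text).
    static String expand(String text, Map<String, String> dictionary) {
        String result = text;
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String regex = "((?<=\\W)|^)" + Pattern.quote(entry.getKey()) + "((?=\\W)|$)";
            result = result.replaceAll(regex, Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample entries only; the real ones would come from the englishSmsText table.
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("ur", "you are");
        dictionary.put(":-D", "Grin");
        System.out.println(expand("My name is ur :-D", dictionary));
    }
}
```

Because the entries are applied one after another, text inserted by an earlier entry can still be matched by a later one, so the order of the dictionary matters with this approach.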