java string locale

I want to convert any Language String to English in Java


I am reading a feed from a Hindi site and want to convert it to English.

import java.io.UnsupportedEncodingException;

// RSSFeedParser, Feed and FeedMessage are not JDK classes; their source is not shown here.
public class ReadTest {

    public static void main(String[] args) throws UnsupportedEncodingException {
        RSSFeedParser parser = new RSSFeedParser("http://aajtak.intoday.in.feedsportal.com/c/34152/f/618432/index.rss?option=com_rss&feed=RSS1.0&no_html=1&rsspage=home");
        Feed feed = parser.readFeed();

        // Print the feed header, then the title, description and date of each item.
        System.out.println(feed);
        for (FeedMessage message : feed.getMessages()) {
            System.out.println(message.getTitle());
            System.out.println(message.getDescription());
            System.out.println("Date : " + message.getPublishDate());
            System.out.println("-------------------------");
        }
    }
}

This is the code I am using, but it prints something like the following:

Feed [copyright=, description=?? ??, language=en, link=http://aajtak.intoday.in, pubDate=Sun, 14 Sep 2014 06:10:50 GMT, title=?? ??]
?? ??
??? ??????? ???? ?? ?? ???? ??????. ??????? ?????? ?????? ?? ?????? ????? ????? ?? ???? ?????? ?????????? ??? ????...
Date : Sun, 14 Sep 2014 05:42:56 GMT
-------------------------
?????? ?? ???? ? ???? ?? ???? ???? ????, ?????-???? ???
????? ???????? ?????? ?? ?????? ??????? ????? ???? ???? ?? ?????? ????? ?????? ?? ?? ????? ?? ???? ?????? ??????...
Date : Sun, 14 Sep 2014 04:56:24 GMT

A "?" is printed wherever a Hindi character should appear.


Solution

  • Your current problem has nothing to do with translation (not yet, at least), only with character sets. If the original feed correctly declares its own charset, Java can read it, because Java strings hold Unicode characters internally.

    But I suppose your system uses a character set other than UTF-8 (Latin-1, Windows-1252, CP850 or CP437) and so cannot display Hindi characters. When System.out encodes text in such a charset, every character it cannot represent is replaced by "?". If you use Linux or another Unix-like system, make sure your terminal is using UTF-8; if you use Windows, try sending the output to a graphical window (Swing), because I do not know how well a command-line window (CMD.exe) handles Unicode.

    Once you get past that step, translation is another, far more complex problem ...
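To illustrate the character-set point above, here is a minimal sketch, not a definitive fix. It assumes the "?" characters come from the console charset rather than from the feed parser. The class name Utf8OutputDemo, the helper render and the sample string are hypothetical and not part of the original code. It shows how a charset that lacks Devanagari turns each character into "?", and how System.out can be wrapped in a PrintStream that writes UTF-8 instead.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8OutputDemo {

    // Hypothetical helper: shows what text looks like after a round trip
    // through the given charset. String.getBytes replaces every character
    // the charset cannot represent with its replacement byte ('?' for US-ASCII).
    static String render(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Sample Hindi text (an assumption, standing in for feed content).
        // Unicode escapes keep this source file ASCII-only.
        String hindi = "\u0906\u091C \u0924\u0915";

        // The charset the JVM picked up from the operating system.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // prints "As US-ASCII: ?? ??"
        System.out.println("As US-ASCII: " + render(hindi, StandardCharsets.US_ASCII));

        // Wrap System.out so that text is written as UTF-8 bytes,
        // whatever the platform default is.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("As UTF-8: " + hindi);
    }
}
```

The last line is only readable if the terminal itself decodes UTF-8 and has a font with Devanagari glyphs. If it does not, the bytes are correct but the display will still be wrong. In the original program, the same wrapped PrintStream could be used in place of System.out for the title and description lines.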