We are using JavaMail to fetch emails from email accounts, and lately we have been receiving emails that contain Chinese and Japanese characters.
For example, here's some Japanese content:
限定クリエイティブツールのコレクションを含む高速写真編集ソフトウェア。
And it would probably come out like this:
<div>=1B$B$"=1B(B =1B$B$$=1B(B =1B$B$&=1B(B =1B$B$(=1B(B =1B$B$*=1B(B =1B$B=
$+=1B(B =1B$B$-=1B(B =1B$B$/=1B(B =1B$B$1=1B(B =1B$B$3=1B(B =1B$B$5=1B(B =
=1B$B$7=1B(B =1B$B$9=1B(B =1B$B$;=1B(B =1B$B$=3D=1B(B =1B$B$,=1B(B =1B$B$.=
=1B(B =1B$B$0=1B(B =1B$B$2=1B(B =1B$B$4=1B(B =1B$B$Q=1B(B =1B$B$T=1B(B =1B$=
B$W=1B(B =1B$B$Z=1B(B =1B$B$]=1B(B</div>
The Content-Type is usually text/html; charset=UTF-8.
We are using the writeTo method to get all the headers and content.
I tried doing the following but it didn't work:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
m.writeTo(baos);
pm.setUnProcessedMessage(baos.toString("UTF-8")); //Here I am explicitly stating the encoding
Also, I believe the issue might be because we are using an old version of JavaMail (1.5.0).
What can we do here to handle foreign characters?
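Using the writeTo method gives you the MIME-encoded content of the message, which is why you see quoted-printable sequences such as =1B$B instead of readable text. Converting those bytes to a String as UTF-8 does not help, because the transfer-encoded form is plain ASCII. It sounds like you want the decoded content, for which you should use the getContent or getInputStream method. For a text part, the getContent method returns a String of Unicode characters, which you can use directly. The getInputStream method returns a stream of decoded bytes in the character encoding specified by the charset parameter of the Content-Type; you'll need to wrap it with a Reader to get the Unicode characters.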
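As a minimal sketch of the Reader-wrapping step, here is a JDK-only helper. The class name StreamDecoding and the method readAll are hypothetical names chosen for illustration; with JavaMail, the stream would presumably come from the part's getInputStream method and the charset name from the charset parameter of its Content-Type. A ByteArrayInputStream stands in for that stream here so the example runs without a mail server.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of JavaMail.
final class StreamDecoding {

    // Reads every byte from `in` and decodes it using the named charset.
    // Assumption: with JavaMail, `in` would be the result of
    // part.getInputStream() and `charsetName` the charset parameter
    // of the part's Content-Type.
    static String readAll(InputStream in, String charsetName) throws IOException {
        Charset charset = Charset.forName(charsetName);
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "限定クリエイティブツール";
        // Stand-in for the decoded byte stream of a text/html; charset=UTF-8 part.
        InputStream in = new ByteArrayInputStream(original.getBytes(StandardCharsets.UTF_8));
        // Round-trip check: the decoded text should equal the original string.
        System.out.println(original.equals(readAll(in, "UTF-8")));
    }
}
```

If getContent already returns a String for the part, this step is unnecessary; the helper only matters when you read the bytes yourself.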
If you also want the headers, e.g., to display them along with the message content, you should use the getSubject, getRecipients, etc. methods, which also return decoded values. You can use the getHeader method to get other headers, but you'll need to decode the values yourself using the MimeUtility methods, such as decodeText.