I have an RTF file that is encoded in ANSI but contains Arabic phrases. I'm trying to read this file, but I can't get the text to come out in the right encoding.
RTF File:
{\rtf1\fbidis\ansi\deff0{\fonttbl{\f0\fnil\fcharset178 MS Sans Serif;}{\f1\fnil\fcharset0 MS Sans Serif;}}
\viewkind4\uc1\pard\ltrpar\lang12289\f0\rtlch\fs16\'ca\'d1\'cc\'e3\'c9: \'d3\'e3\'ed\'d1 \'c7\'e1\'e3\'cc\'d0\'e6\'c8\f1\ltrch\par
}
and my Java code is:
RTFEditorKit rtf = new RTFEditorKit();
Document doc = rtf.createDefaultDocument();
rtf.read(new InputStreamReader(new FileInputStream("Document.rtf"), "windows-1256"),doc,0);
System.out.println(doc.getText(0,doc.getLength()));
and the wrong output is:
ÊÑÌãÉ: ÓãíÑ ÇáãÌÐæÈ
Try RTFParserKit; it should correctly support encodings like the one you describe.
Here is the text it extracted from your example:
ترجمة: سمير المجذوب
I used the RtfDump class, which ships with RTFParserKit, to dump the RTF content into an XML file. The class invokes the StandardRtfParser on the supplied input file, while the RtfDumpListener class receives the events raised by the parser as the file is read, adding content to the XML file as it goes.
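To illustrate why the charset matters here, the following is a minimal sketch using only the JDK, not RTFParserKit's actual implementation. The sample file is plain ASCII: each Arabic letter is written as a `\'xx` hex escape, and `\fcharset178` marks the font as using the Arabic character set (Windows code page 1256). The class name `HexEscapeDemo` and the method `decodeEscapes` are made up for this example, and it assumes the JDK in use provides the `windows-1252` and `windows-1256` charsets.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it handles nothing but \'xx escapes.
class HexEscapeDemo {
    // Matches one RTF hex escape such as \'ca and captures the two hex digits.
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\'([0-9a-fA-F]{2})");

    /** Collects the bytes named by the \'xx escapes and decodes them with the given charset. */
    static String decodeEscapes(String rtfFragment, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Matcher m = HEX_ESCAPE.matcher(rtfFragment);
        while (m.find()) {
            bytes.write(Integer.parseInt(m.group(1), 16));
        }
        return new String(bytes.toByteArray(), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // First word of the sample document, as it appears in the RTF source.
        String word = "\\'ca\\'d1\\'cc\\'e3\\'c9";
        // Same bytes, two interpretations: Western code page, then Arabic code page.
        System.out.println(decodeEscapes(word, "windows-1252"));
        System.out.println(decodeEscapes(word, "windows-1256"));
    }
}
```

Decoding the bytes as windows-1252 gives the garbled first word from the question (ÊÑÌãÉ), while windows-1256 gives ترجمة. Because the escapes themselves are ASCII characters, the charset passed to `InputStreamReader` does not affect them; what matters is which code page the RTF parser applies when it expands each escape.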