I am trying to extract data from an XML file with xsltproc, but when I run xsltproc I get a list of parser errors like these:
new_news.xml:388407: parser error : PCDATA invalid Char value 26
new_news.xml:418521: parser error : PCDATA invalid Char value 26
new_news.xml:1490882: parser error : PCDATA invalid Char value 27 ultan'ın
The numbers in the error list are the line numbers in my XML file where the errors occur. On those lines I found some non-UTF-8 characters such as ESC and SUB (the .xml file declares UTF-8 at the beginning). So I need to remove or replace those non-UTF-8 characters. To do this:
I used the iconv command:
iconv -c -t UTF-8 < new.xml > new_news.xml
Then I used the diff command to see the difference:
diff new.xml new_news.xml
But there is no difference between them, so I get the same errors when I pass new_news.xml to xsltproc.
Could you please help me solve this? What am I doing wrong? By the way, I am using the OS X terminal. I don't know whether iconv behaves differently there, as sed and awk do.
Best Regards
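Your problem is not with UTF-8, but with XML. In XML 1.0, control characters below U+0020 such as ESC (27) or SUB (26) are not allowed; the only exceptions are tab, line feed, and carriage return. If your file contains them, then it's not a well-formed XML document.
You need to either remove the offending characters or change them to something else before your document can be parsed as XML and processed by an XSLT processor. Changing the document's encoding will not accomplish anything: ESC and SUB are valid ASCII and therefore valid UTF-8, so iconv passes them through unchanged, which is why diff shows no difference.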
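One way to remove them, as a minimal sketch rather than the only approach: delete the forbidden control bytes with tr. The function name strip_xml_controls is made up for this example, and the file names are the ones from the question. It assumes the file really is UTF-8, where every byte of a multibyte character is 0x80 or above, so deleting bytes below 0x20 cannot damage other text. LC_ALL=C makes tr work on raw bytes, which avoids locale-related complaints from tr on OS X.

```shell
# Delete the C0 control characters that XML 1.0 forbids.
# Tab (\011), line feed (\012) and carriage return (\015) are kept.
# strip_xml_controls is a hypothetical helper name for this sketch.
strip_xml_controls() {
    LC_ALL=C tr -d '\000-\010\013\014\016-\037'
}

# Usage with the file names from the question:
#   strip_xml_controls < new.xml > new_news.xml
#   xsltproc your-stylesheet.xsl new_news.xml
```

This drops the characters silently. If they stand for something meaningful in your data, replace them with a placeholder instead of deleting them.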