Tags: rss, pentaho, kettle, pentaho-spoon

Occasional "Premature end of file" error while running RSS Input in kettle?


In Pentaho Kettle, I configured the RSS Input step with some URLs. When I run the transformation, it runs perfectly most of the time, but sometimes it shows the following error:

2016/06/29 13:10:48 - RSS Input.0 - ERROR (version 6.0.1.0-386, build 1 from 2015-12-03 11.37.25 by buildguy) : Unexpected Exception : it.sauronsoftware.feed4j.FeedXMLParseException: org.dom4j.DocumentException: Error on line -1 of document  : Premature end of file. Nested exception: Premature end of file.
2016/06/29 13:10:48 - RSS Input.0 - ERROR (version 6.0.1.0-386, build 1 from 2015-12-03 11.37.25 by buildguy) : it.sauronsoftware.feed4j.FeedXMLParseException: org.dom4j.DocumentException: Error on line -1 of document  : Premature end of file. Nested exception: Premature end of file.
2016/06/29 13:10:48 - RSS Input.0 -     at it.sauronsoftware.feed4j.FeedParser.parse(FeedParser.java:53)
2016/06/29 13:10:48 - RSS Input.0 -     at org.pentaho.di.trans.steps.rssinput.RssInput.readNextUrl(RssInput.java:168)
2016/06/29 13:10:48 - RSS Input.0 -     at org.pentaho.di.trans.steps.rssinput.RssInput.getOneRow(RssInput.java:198)
2016/06/29 13:10:48 - RSS Input.0 -     at org.pentaho.di.trans.steps.rssinput.RssInput.processRow(RssInput.java:312)
2016/06/29 13:10:48 - RSS Input.0 -     at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2016/06/29 13:10:48 - RSS Input.0 -     at java.lang.Thread.run(Thread.java:745)
2016/06/29 13:10:48 - RSS Input.0 - Caused by: org.dom4j.DocumentException: Error on line -1 of document  : Premature end of file. Nested exception: Premature end of file.
2016/06/29 13:10:48 - RSS Input.0 -     at org.dom4j.io.SAXReader.read(SAXReader.java:482)
2016/06/29 13:10:48 - RSS Input.0 -     at org.dom4j.io.SAXReader.read(SAXReader.java:291)
2016/06/29 13:10:48 - RSS Input.0 -     at it.sauronsoftware.feed4j.FeedParser.parse(FeedParser.java:37)
2016/06/29 13:10:48 - RSS Input.0 -     ... 5 more

I have used the default RSS Input step that comes with kettle, and here is the screenshot:

[Screenshot of the RSS Input step configuration; image not available]

And the links that I have configured in RSS feed are:

[Screenshot of the configured feed URLs; image not available]

How can I resolve this issue? Even when I run the RSS Input step on just one of the links, it occasionally shows the same error. Is there some problem with this plugin?


Solution

  • If it is really necessary, you can patch the feed4j source code yourself.

    Get the source of feed4j. It is quite old, so there is only a single version.

    Open the source file for it.sauronsoftware.feed4j.FeedParser (FeedParser.java) in an editor.

    It has a single method, parse:

    public static Feed parse(URL url){
        SAXReader saxReader = new SAXReader();
        Document document = saxReader.read(url);
        ...
    

    Fortunately, SAXReader has several overloaded read methods, and one of them is what you need:

       saxReader.read(InputStream in)
    

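    Instead of passing the URL to read, write code that fetches the data from the URL using httpclient. The good news is that it is bundled with Kettle (PDI); to check the version, look at $KETTLE-HOME/lib/commons-httpclient-x.x.jar. Note that commons-httpclient is the older 3.x API, while the CloseableHttpClient class used below belongs to HttpClient 4.x (org.apache.http), so make sure an httpclient 4.x jar is available as well.

    Then wrap the data that httpclient received from the server in a ByteArrayInputStream and pass it to SAXReader.

    Build the library and replace feed4j-1.0.jar with your build.

    And you are done.

    The code will look something like this: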

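    // needs imports from java.io, org.apache.http, org.apache.http.client.methods and org.apache.http.impl.client
    public static Feed parse(URL url){
        SAXReader saxReader = new SAXReader();
        CloseableHttpClient client = HttpClients.createDefault();
        // HttpGet accepts a String or a java.net.URI, not a java.net.URL
        HttpGet get = new HttpGet(url.toString());
        CloseableHttpResponse response = client.execute(get);
        HttpEntity entity = response.getEntity();
        byte[] b = new byte[(int)entity.getContentLength()];
        // a single read(b) may return before the buffer is full, so read until it is
        new DataInputStream(entity.getContent()).readFully(b);
        response.close();
        client.close();
        InputStream is = new ByteArrayInputStream(b);
    
        Document document = saxReader.read(is);
        ...
    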

    Extra details

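    • You might need to add code that wraps a possible IOException in a FeedXMLParseException
    • This code assumes that the server sends a Content-Length header in the response; without it, getContentLength() returns a negative value. EntityUtils.toByteArray(entity) is an alternative that reads the whole body without that assumption
    • Build with a JDK version that matches the one Kettle runs on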
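
The first two points can be sketched with the JDK alone. This is only an illustration, not feed4j or Kettle code: `FeedFetchSketch`, `readFully`, `fetch` and `FetchException` are hypothetical names (the exception stands in for feed4j's FeedXMLParseException), and `URL.openStream()` is used instead of httpclient so that the snippet has no dependencies. It reads until the end of the stream instead of trusting Content-Length, treats an empty body as an error before the XML parser sees it, and wraps any IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

class FeedFetchSketch {

    // Hypothetical stand-in for feed4j's FeedXMLParseException.
    static class FetchException extends Exception {
        FetchException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Reads the stream until EOF; does not rely on a Content-Length header
    // and copes with read() returning fewer bytes than requested.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Downloads the whole body first, then returns an in-memory stream
    // that could be handed to SAXReader.read(InputStream).
    static InputStream fetch(URL url) throws FetchException {
        try (InputStream in = url.openStream()) {
            byte[] body = readFully(in);
            if (body.length == 0) {
                // Fail here with a clear message rather than inside the XML parser.
                throw new FetchException("Empty response from " + url, null);
            }
            return new ByteArrayInputStream(body);
        } catch (IOException e) {
            throw new FetchException("Could not read " + url, e);
        }
    }
}
```

Inside a patched parse method, the call would then look like `saxReader.read(FeedFetchSketch.fetch(url))`, with FetchException replaced by whichever feed4j exception fits.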