StormCrawler SQL error for column 'nextfetchdate'


My setup is identical to this. When I run the crawler in crawl mode, I get the following error:

[Thread-130-status-executor[109 109]] ERROR c.d.s.p.AbstractStatusUpdaterBolt - Exception caught when storing
com.mysql.jdbc.MysqlDataTruncation: Data truncation: Incorrect datetime value: '2099-12-31 00:00:00' for column 'nextfetchdate' at row 1
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3964) ~[stromcrawler-1.0-SNAPSHOT.jar:?]
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3902) ~[stromcrawler-1.0-SNAPSHOT.jar:?]
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2526) ~[stromcrawler-1.0-SNAPSHOT.jar:?]
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2673) ~[stromcrawler-1.0-SNAPSHOT.jar:?]
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549) ~[stromcrawler-1.0-SNAPSHOT.jar:?]
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861) ~[stromcrawler-1.0-SNAPSHOT.jar:?]
    at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1192) ~[stromcrawler-1.0-SNAPSHOT.jar:?]
    at com.digitalpebble.stormcrawler.sql.StatusUpdaterBolt.store(StatusUpdaterBolt.java:132) ~[stromcrawler-1.0-SNAPSHOT.jar:?]
    at com.digitalpebble.stormcrawler.persistence.AbstractStatusUpdaterBolt.execute(AbstractStatusUpdaterBolt.java:196) [stromcrawler-1.0-SNAPSHOT.jar:?]
    at org.apache.storm.daemon.executor$fn__5043$tuple_action_fn__5045.invoke(executor.clj:739) [storm-core-1.2.1.jar:1.2.1]
    at org.apache.storm.daemon.executor$mk_task_receiver$fn__4964.invoke(executor.clj:468) [storm-core-1.2.1.jar:1.2.1]
    at org.apache.storm.disruptor$clojure_handler$reify__4475.onEvent(disruptor.clj:41) [storm-core-1.2.1.jar:1.2.1]
    at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:509) [storm-core-1.2.1.jar:1.2.1]
    at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:487) [storm-core-1.2.1.jar:1.2.1]
    at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:74) [storm-core-1.2.1.jar:1.2.1]
    at org.apache.storm.daemon.executor$fn__5043$fn__5056$fn__5109.invoke(executor.clj:861) [storm-core-1.2.1.jar:1.2.1]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:484) [storm-core-1.2.1.jar:1.2.1]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

Please advise where to look to fix it.


Solution

  • This happens because the fetch interval for errors is set to -1, which means 'never revisit'. The DefaultScheduler translates that into a date far in the future (here 2099-12-31). It is not clear why MySQL chokes on it. You could try setting a more reasonable value, e.g. 43200 (the interval is in minutes, so 30 days, roughly a month), and see if that works.
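
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the 'nextfetchdate' column is declared as a MySQL TIMESTAMP, it cannot hold values later than 2038-01-19 03:14:07 UTC, so 2099-12-31 would be rejected, while a date one month ahead would be accepted. The sketch below only illustrates that arithmetic. The class FetchDateCheck and its methods are hypothetical names invented for this example and are not part of StormCrawler; the 43200-minute value comes from the answer above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of StormCrawler: checks whether a next-fetch
// date would fit into a MySQL TIMESTAMP column.
class FetchDateCheck {

    // Upper bound of MySQL's TIMESTAMP type (2038-01-19 03:14:07 UTC).
    static final Instant MYSQL_TIMESTAMP_MAX = Instant.parse("2038-01-19T03:14:07Z");

    // Computes a next-fetch date from an interval expressed in minutes.
    static Instant nextFetchDate(Instant now, long intervalMinutes) {
        return now.plus(Duration.ofMinutes(intervalMinutes));
    }

    // True if the date is not later than the TIMESTAMP upper bound.
    static boolean fitsMysqlTimestamp(Instant date) {
        return !date.isAfter(MYSQL_TIMESTAMP_MAX);
    }

    public static void main(String[] args) {
        // The date from the error message, produced for an interval of -1.
        Instant never = Instant.parse("2099-12-31T00:00:00Z");
        // 43200 minutes = 30 days, counted from an arbitrary example date.
        Instant inAMonth = nextFetchDate(Instant.parse("2018-06-01T00:00:00Z"), 43200);

        System.out.println(fitsMysqlTimestamp(never));    // false
        System.out.println(fitsMysqlTimestamp(inAMonth)); // true
        System.out.println(inAMonth);                     // 2018-07-01T00:00:00Z
    }
}
```

If the column type does turn out to be the cause, the alternatives would be the smaller interval suggested above (in StormCrawler's configuration the key should be fetchInterval.error, but check it against your own crawler-conf.yaml) or changing the column to a type with a wider range such as DATETIME.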