After upgrading our crawler from StormCrawler 1.8 to 1.14, we noticed that the record type of our WARC entries changed from "WARC-Type: response" to "WARC-Type: resource". Any suggestions on how to switch back to "WARC-Type: response"?
Nothing has changed in WARCRecordFormat between 1.8 and 1.14: if a verbatim HTTP response header is available, a response record is written. If there is no HTTP header, a WARC resource record is written instead.
In order to store the HTTP headers, the following configuration is required:
http.store.headers: true
http.protocol.implementation: com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol
https.protocol.implementation: com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol
More information can be found in the README of the WARC module.
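
To check whether the configuration took effect, one option is to count the record types in a freshly written WARC file. The following is a minimal sketch, not part of StormCrawler: the function names are hypothetical, and it simply scans for lines starting with "WARC-Type:", so a payload that happens to contain such a line would be miscounted.

```python
import gzip
from collections import Counter


def count_warc_types(lines):
    """Count WARC-Type header values in an iterable of byte lines."""
    counts = Counter()
    for line in lines:
        # WARC field names are case-insensitive, so compare in lower case.
        if line.lower().startswith(b"warc-type:"):
            value = line.split(b":", 1)[1].strip().decode("ascii", "replace")
            counts[value] += 1
    return counts


def count_warc_types_in_file(path):
    """Open a plain or gzip-compressed WARC file and count its record types."""
    # Assumption: compressed files carry a ".gz" suffix.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        return count_warc_types(fh)
```

Calling `count_warc_types_in_file` on one of your own output files (for example a hypothetical `crawl.warc.gz`) should report mostly `response` records once the HTTP headers are being stored; a count dominated by `resource` suggests the headers are still missing.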