I have the contents of what a feed is sending to the search appliance for indexing, but one XML node is base64compressed. According to the GSA docs, custom feeds are constructed by compressing the content (zlib) and then base64-encoding it. I tried to reverse the process by decoding it and then opening the result with 7-Zip, but that did not work.
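To make the encoding side concrete, here is a minimal sketch of "compress with zlib, then base64-encode" in Java. It assumes the feed uses the standard zlib format (a 2-byte header and Adler-32 trailer around raw deflate data), which is what new Inflater(false) in the answer below expects. A zlib stream like this is not a .zip or .7z archive, which may explain why 7-Zip did not recognize the decoded bytes. The class name FeedEncodeSketch is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.Deflater;

public class FeedEncodeSketch {
    // Compress with zlib (Deflater with its default zlib header/trailer), then base64-encode.
    static String compressEncode(byte[] content) {
        Deflater deflater = new Deflater(); // nowrap=false: zlib format, not a ZIP/7z archive
        try {
            deflater.setInput(content);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return Base64.getEncoder().encodeToString(out.toByteArray());
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        byte[] html = "<html><body>Hello GSA</body></html>".getBytes(StandardCharsets.UTF_8);
        String encoded = compressEncode(html);
        System.out.println(encoded);
        // zlib output normally begins with byte 0x78, so the base64 text begins with 'e'
        System.out.println(encoded.startsWith("e"));
    }
}
```

With the default compression level the zlib header is 0x78 0x9C, so the encoded text starts with "eJ". Seeing that prefix in the feed is a quick hint that the node holds a zlib stream rather than an archive.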
Rationale: I am looking at this because GSA is EOL. We are moving to Solr but will keep using some GSA Connectors for the time being (they are open source), so I need to see the text content that gets indexed to the search appliance in order to construct a proper Solr schema.
My experience with GSA is very minimal, so I may be thinking about this all wrong. I would appreciate any suggestions on how to tackle this.
Thanks!
This code will base64-decode and then uncompress (inflate) the base64compressed item in a GSA feed.
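import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.zip.Inflater;
import java.util.zip.InflaterOutputStream;

private byte[] decodeUncompress(byte[] data) throws IOException {
    // Decode: base64 text -> zlib-compressed bytes.
    // The MIME decoder also tolerates line breaks/whitespace inside the encoded text.
    byte[] decodedBytes = Base64.getMimeDecoder().decode(data);

    // Uncompress: nowrap=false means the data has a zlib header and trailer (not raw deflate)
    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    Inflater decompresser = new Inflater(false);
    try (InflaterOutputStream inflaterOutputStream = new InflaterOutputStream(stream, decompresser)) {
        inflaterOutputStream.write(decodedBytes);
    } finally {
        // close() finishes inflating but does not end a caller-supplied Inflater,
        // so release its native resources here. Exceptions from write() and close() propagate.
        decompresser.end();
    }
    return stream.toByteArray();
}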
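To dump everything a feed would index, the decoding can be applied to each compressed node in the feed XML. The sketch below assumes the GSA feed layout in which documents sit in record elements whose body is a content element carrying encoding="base64compressed"; it also assumes the JDK's built-in XML parser (for the load-external-dtd feature) and that the decoded content is UTF-8 text. Binary documents such as PDFs would decode to bytes that are not readable text. The class FeedDump is hypothetical, and its decodeUncompress performs the same decode-then-inflate steps as the answer's method, written with InflaterInputStream (Java 9+ for readAllBytes).

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.zip.InflaterInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FeedDump {
    // Base64-decode (ignoring whitespace), then inflate the zlib stream
    static byte[] decodeUncompress(String base64Text) throws IOException {
        byte[] compressed = Base64.getMimeDecoder().decode(base64Text);
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    // Returns the decoded text of every <content encoding="base64compressed"> element
    static List<String> dumpContent(InputStream feedXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not try to fetch an external DTD referenced by the feed's DOCTYPE
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        NodeList contents = builder.parse(feedXml).getElementsByTagName("content");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < contents.getLength(); i++) {
            Element content = (Element) contents.item(i);
            if ("base64compressed".equals(content.getAttribute("encoding"))) {
                byte[] raw = decodeUncompress(content.getTextContent());
                result.add(new String(raw, StandardCharsets.UTF_8));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java FeedDump feed.xml
        try (InputStream in = new FileInputStream(args[0])) {
            for (String text : dumpContent(in)) {
                System.out.println(text);
            }
        }
    }
}
```

Printing the decoded records this way shows which text and metadata the connector actually sends, which is the input needed for designing the Solr schema.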