I am on Oracle Commerce 11.1, on an application running with CAS only (without Forge).
Baseline update works fine. I have an issue with partial updates.
We have an extract file containing the subset of records that need to be updated. However, this file only lists a small subset of properties for each record (i.e. it only provides the properties that have actually changed).
When I do a partial update (using the default mechanism that comes with the CAS-only deployment template), it completes successfully but the records that were updated have only the subset of fields provided in the file - all of the fields that haven't changed are simply missing. It's as if CAS simply replaced the existing record (with the full set of properties) with a new record only containing the few properties in the extract file.
For example, say one of the records looks like this:
Record 23
---------
id 23
name Test
inventoryCount 23
buyable 1
imageUrl test.jpg
And say the partial extract file has an entry like this:
Record 23
---------
id 23
inventoryCount 10
The result that I am getting after a partial update is this:
Record 23
---------
id 23
inventoryCount 10
I want to know how I can get CAS to preserve those properties instead of removing them. I know this was possible with Forge.
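To make the difference concrete, here is a minimal sketch in plain Java of the two behaviours, using the example record above. It does not use any CAS or Endeca API; the class and method names (PartialUpdateSemantics, replace, update) are made up for illustration, and it assumes single-valued properties only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: models a record as a map of single-valued properties.
class PartialUpdateSemantics {

    // What I am seeing: the partial record becomes the whole record.
    static Map<String, String> replace(Map<String, String> existing, Map<String, String> partial) {
        return new LinkedHashMap<>(partial);
    }

    // What I want: properties in the partial record overwrite, everything else is kept.
    static Map<String, String> update(Map<String, String> existing, Map<String, String> partial) {
        Map<String, String> merged = new LinkedHashMap<>(existing);
        merged.putAll(partial);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("id", "23");
        existing.put("name", "Test");
        existing.put("inventoryCount", "23");
        existing.put("buyable", "1");
        existing.put("imageUrl", "test.jpg");

        Map<String, String> partial = new LinkedHashMap<>();
        partial.put("id", "23");
        partial.put("inventoryCount", "10");

        // prints: {id=23, inventoryCount=10}
        System.out.println(replace(existing, partial));
        // prints: {id=23, name=Test, inventoryCount=10, buyable=1, imageUrl=test.jpg}
        System.out.println(update(existing, partial));
    }
}
```

The second output is the result I would expect from a partial update.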
I've confirmed that there's not really an explicit mechanism to do this, so I invented my own.
To summarize how it works: I customized the PartialUpdate BeanShell script so that, right after the last-mile crawl runs, it invokes a custom component I created called DGIDXTransformer (it extends CustomComponent). This class unzips and parses the file that the last-mile crawl creates, which is the file that is supposed to be fed into DGIDX, and writes out a modified version of it. Specifically, it rewrites every RECORD_ADD_OR_REPLACE entry as a RECORD_UPDATE and turns each property other than the record spec into a PVAL_DELETE followed by a PVAL_ADD, so that the listed properties are updated on the existing record instead of the whole record being replaced. This is a bit hacky because the format of the DGIDX input file is not documented, but according to my research that format is unlikely to change very drastically in future versions of Endeca.
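As a rough illustration of that rewrite, here is a small self-contained sketch that applies the same line-by-line rules to an in-memory sample. The sample layout (one element per line, a PROP line followed by a PVAL line and a closing line) is an assumption inferred from what my parser looks for, not a documented format, and the names UpdateRewriteSketch and rewrite are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the rewrite rules, operating on a list of lines instead of a file.
class UpdateRewriteSketch {
    static final String RECORD_SPEC = "record.spec";

    static List<String> rewrite(List<String> lines) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = lines.iterator();
        while (it.hasNext()) {
            String line = it.next();
            if (line.contains("RECORD_ADD_OR_REPLACE")) {
                // Replace-the-record becomes update-the-record (opening and closing tags).
                out.add(line.replace("RECORD_ADD_OR_REPLACE", "RECORD_UPDATE"));
            } else if (line.contains("<PROP NAME=")) {
                String name = line.split("\"")[1];
                if (name.equals(RECORD_SPEC)) {
                    // The record spec identifies the record, so it is passed through untouched.
                    out.add(line);
                } else {
                    // Assumed layout: the value is on the next line, then a closing line.
                    String value = it.next().replace("<PVAL>", "").replace("</PVAL>", "").trim();
                    it.next(); // discard the closing </PROP> line
                    out.add("<PVAL_DELETE><PROPERTY_NAME NAME=\"" + name + "\"/></PVAL_DELETE>");
                    out.add("<PVAL_ADD><PROP NAME=\"" + name + "\"><PVAL>" + value
                            + "</PVAL></PROP></PVAL_ADD>");
                }
            } else {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample input; the real file is produced by the last-mile crawl.
        List<String> input = Arrays.asList(
                "<RECORD_ADD_OR_REPLACE>",
                "<PROP NAME=\"record.spec\">",
                "<PVAL>23</PVAL>",
                "</PROP>",
                "<PROP NAME=\"inventoryCount\">",
                "<PVAL>10</PVAL>",
                "</PROP>",
                "</RECORD_ADD_OR_REPLACE>");
        for (String line : rewrite(input)) {
            System.out.println(line);
        }
    }
}
```

For the sample, the inventoryCount property comes out as a PVAL_DELETE plus a PVAL_ADD inside a RECORD_UPDATE, while the record.spec lines are unchanged.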
Here's DGIDXTransformer:
package com.chairbender;

import com.endeca.soleng.eac.toolkit.component.*;
import java.io.*;
import java.nio.file.Files;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
/**
 * Custom component which runs during the PartialUpdate beanshell script. It transforms the DGIDX-compatible input file
 * that CAS produces so that records will be updated instead of replaced.
 *
 * Expects only one property entry called "dgidxInputFileDirectory", specifying the directory to look in to
 * find the file to transform (relative to the config directory).
 *
 * @author chairbender
 */
public class DGIDXTransformer extends CustomComponent {
    private static final String DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME = "dgidxInputFileDirectory";
    private static final String RECORD_SPEC_PROPERTY_NAME = "record.spec";
    /**
     * Does the transformation as specified in the class javadoc.
     */
    public void transformDGIDXInputFileToUpdateInsteadOfReplace() throws Exception {
        // Find the file in the directory
        Map<String, String> properties = getProperties();
        if (null == properties || !properties.containsKey(DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME)) {
            throw new IllegalStateException(
                    "Missing required property: " + DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME);
        }
        File directory = new File(properties.get(DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME));
        File[] gzipFiles = directory.listFiles(new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                return name.endsWith(".xml.gz");
            }
        });
        if (gzipFiles == null || gzipFiles.length == 0) {
            throw new FileNotFoundException("No .xml.gz file found in " + directory.getAbsolutePath());
        }
        File gzipFile = gzipFiles[0];
        File unzippedFile = unzipFile(gzipFile);
        transformInputFile(unzippedFile, unzippedFile.getAbsolutePath().replace(".xml", "transformed.xml"));
        // Delete the extra files in a way that throws an exception if deletion fails
        Files.delete(gzipFile.toPath());
        Files.delete(unzippedFile.toPath());
    }
    /**
     * Gzips the passed file and saves it at the specified location.
     *
     * @param toGzip file to gzip
     * @param outputPath where to output the gzipped file
     */
    private void gzipFile(File toGzip, String outputPath) throws IOException {
        byte[] buffer = new byte[1024];
        // try-with-resources closes both streams even if a read or write fails;
        // closing the GZIPOutputStream also finishes the compressed stream.
        try (FileInputStream inputStream = new FileInputStream(toGzip);
             GZIPOutputStream gzipOutputStream =
                     new GZIPOutputStream(new FileOutputStream(outputPath, false))) {
            int len;
            while ((len = inputStream.read(buffer)) != -1) {
                gzipOutputStream.write(buffer, 0, len);
            }
        }
    }
    /**
     * Rewrites the DGIDX input data so that records are updated instead of replaced.
     *
     * @param unzippedFile file representing DGIDX input data to transform
     * @param transformedFilePath path where transformed file should go.
     * @return the transformed file
     */
    private File transformInputFile(File unzippedFile, String transformedFilePath) throws IOException {
        File outputFile = new File(transformedFilePath);
        // Since the XML and the transformation aren't very complicated, we just go through the
        // unzipped file line by line and write the output as we go.
        try (BufferedReader unzippedFileReader = new BufferedReader(new FileReader(unzippedFile));
             BufferedWriter outputFileWriter = new BufferedWriter(new FileWriter(outputFile))) {
            String nextLine;
            while ((nextLine = unzippedFileReader.readLine()) != null) {
                if (nextLine.contains("RECORD_ADD_OR_REPLACE")) {
                    // If the line contains RECORD_ADD_OR_REPLACE, need to change it to RECORD_UPDATE
                    outputFileWriter.write(nextLine.replace("RECORD_ADD_OR_REPLACE", "RECORD_UPDATE"));
                } else if (nextLine.contains("<PROP NAME=")) {
                    // This line contains <PROP NAME="...">. We transform the element
                    // only if the property is not the record spec.
                    String propertyName = nextLine.split("\"")[1];
                    if (!propertyName.equals(RECORD_SPEC_PROPERTY_NAME)) {
                        // Read the property value from the next line
                        String propertyValueLine = unzippedFileReader.readLine();
                        String propertyValue = propertyValueLine.replace("<PVAL>", "").replace("</PVAL>", "").trim();
                        // Now write the PVAL_DELETE and PVAL_ADD entries
                        outputFileWriter.write("<PVAL_DELETE><PROPERTY_NAME NAME=\"" + propertyName + "\"/></PVAL_DELETE>");
                        outputFileWriter.write("<PVAL_ADD><PROP NAME=\"" + propertyName + "\"><PVAL>" + propertyValue + "</PVAL></PROP></PVAL_ADD>");
                        // Discard the closing element line of the input file
                        unzippedFileReader.readLine();
                    } else {
                        // It's the record spec, so don't transform it.
                        outputFileWriter.write(nextLine);
                    }
                } else {
                    // Just output the line
                    outputFileWriter.write(nextLine);
                }
            }
        }
        return outputFile;
    }
    /**
     * Un-gzips the passed file.
     *
     * @param gzipFile file to un-gzip. Will create the un-gzipped version in the same directory as gzipFile,
     * but without the ".gz" ending.
     * @return the unzipped version of the file.
     */
    private File unzipFile(File gzipFile) throws IOException {
        File outputFile = new File(gzipFile.getAbsolutePath().replace(".gz", ""));
        byte[] buffer = new byte[1024];
        // Un-gzip the file in one pass; try-with-resources closes both streams even on failure
        try (GZIPInputStream gzipInputStream = new GZIPInputStream(new FileInputStream(gzipFile));
             FileOutputStream outputStream = new FileOutputStream(outputFile)) {
            int len;
            while ((len = gzipInputStream.read(buffer)) != -1) {
                outputStream.write(buffer, 0, len);
            }
        }
        return outputFile;
    }
}
This is compiled into a JAR which goes in config/lib/java.
Here's the custom component definition in DataIngest.xml:
<custom-component id="DGIDXTransformer" host-id="ITLHost" class="com.chairbender.DGIDXTransformer">
    <properties>
        <property name="dgidxInputFileDirectory" value="../data/cas_output" />
    </properties>
</custom-component>
And here's the relevant part of the custom PartialUpdate script:
CAS.runIncrementalCasCrawl("${lastMileCrawlName}");
DGIDXTransformer.transformDGIDXInputFileToUpdateInsteadOfReplace();
CAS.archiveDvalIdMappingsForCrawlIfChanged("${lastMileCrawlName}");