I am indexing files (PDF, DOCX, etc.) with SolrJ, and afterwards I want to attach a listener to the directory to watch for changes. The problem is that once the files have been indexed (which seems to work fine), I can no longer delete or move any of them.
I am running the code on an Azure Function (microservice).
There seems to be a difference between adding all files to a single request and sending that to Solr, and sending a separate request for every file. Sending all files in one batch seems to work most of the time, and I can delete and move the local files afterwards, but when I send one request per file the problem always occurs.
This code is working most of the time:
public String indexFiles(String folderPath) throws IOException, SolrServerException, NullPointerException {
    File folder = new File(folderPath);
    // One request for the whole folder tree
    ContentStreamUpdateRequest request = new ContentStreamUpdateRequest("/update/extract");
    request = addFilesToRequest(folder, request);
    NamedList<Object> resp = client.request(request);
    client.commit();
    return resp.toString();
}

// Recursively adds every regular file below 'folder' to the same request
private ContentStreamUpdateRequest addFilesToRequest(final File folder, ContentStreamUpdateRequest request) throws IOException {
    File[] listOfFiles = folder.listFiles();
    if (listOfFiles != null) {
        for (File file : listOfFiles) {
            if (file.isFile()) {
                System.out.println("file is: " + file.getName());
                request.addFile(file, getContentType(file));
            } else {
                request = addFilesToRequest(file, request);
            }
        }
    }
    return request;
}
This code, by contrast, always seems to lock the files:
public void indexFiles(String folderPath) throws IOException, SolrServerException, NullPointerException {
    File folder = new File(folderPath);
    System.out.println("Starting indexing.");
    addFilesToSolr(folder);
    System.out.println("add files to Solr done.");
}

// Recursively sends one request (and one commit) per regular file below 'folder'
private void addFilesToSolr(final File folder) throws IOException, SolrServerException, NullPointerException {
    File[] listOfFiles = folder.listFiles();
    if (listOfFiles != null) {
        for (File file : listOfFiles) {
            if (file.isFile()) {
                ContentStreamUpdateRequest request = new ContentStreamUpdateRequest("/update/extract");
                System.out.println("file is: " + file.getName());
                request.addFile(file, getContentType(file));
                System.out.println("Sending request ...");
                NamedList<Object> response = client.request(request);
                System.out.println("Request sent. Response is: " + response);
                UpdateResponse resp = client.commit();
                System.out.println("Committed. Response is: " + resp.getResponse().toString());
            } else if (file.isDirectory()) {
                addFilesToSolr(file);
            }
        }
    }
    System.out.println("Requests for all files created. End of method.");
}
If anyone else encounters this problem, have a look at the question "Files locked after indexing". Apparently it is a bug in the SolrJ library where the file stream is not closed properly. The bug has reportedly been fixed, but the issue still occurs for me. The workaround described in that link solved it.
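For illustration, here is a minimal sketch of one way to avoid leaving a file handle open. It is not necessarily the workaround from the linked question. The idea is to read each file fully into memory with the JDK, so the handle is already closed before anything is sent to Solr. The class name `DetachedFileContent` and its `read` method are hypothetical. The SolrJ calls in the comment (`ContentStreamBase.ByteArrayStream`, `setContentType`, `addContentStream`) are what I would expect to use, but they are an assumption and are left out of the compiled code so the sketch has no dependencies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of SolrJ): loads a file into memory so that
// no handle on the file stays open once the method returns.
final class DetachedFileContent {

    private DetachedFileContent() {
    }

    // Files.readAllBytes opens the file, reads it completely and closes it
    // again before returning, also when an exception is thrown.
    static byte[] read(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Assumption: the bytes could then be handed to SolrJ roughly like this
    // (kept as a comment so the sketch compiles without SolrJ):
    //
    //   ContentStreamBase.ByteArrayStream stream =
    //           new ContentStreamBase.ByteArrayStream(bytes, file.toString());
    //   stream.setContentType(contentType);
    //   request.addContentStream(stream);
}
```

The trade-off is memory: every file is held in a byte array while its request is in flight, so this only suits files that comfortably fit in the heap.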