I am loading lots of PDF documents into a Retrieve and Rank service, but I do not know how to tell Solr or the IBM Retrieve and Rank service that a specific part of my PDF document should be treated as a field for later queries, for example a name or a document process id.
You can't do this when uploading documents using the web-based UI, as this only populates some default fields like body and title.
But you can programmatically add the contents of your PDF documents to the R&R collection. And when you do this, you're free to add any fields you want.
E.g. from the documentation at https://www.ibm.com/watson/developercloud/retrieve-and-rank/api/v1/?java#index_doc
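import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;
import com.ibm.watson.developer_cloud.retrieve_and_rank.v1.RetrieveAndRank;

RetrieveAndRank service = new RetrieveAndRank();
service.setUsernameAndPassword("{username}","{password}");
// solrClient is assumed to be a SolrJ client already set up for your Solr cluster (see the linked documentation)
SolrInputDocument newdoc = new SolrInputDocument();
// Every field is added to the same document object that is indexed below
newdoc.addField("id", 1);
newdoc.addField("author", "brenckman,m.");
newdoc.addField("bibliography", "j. ae. scs. 25, 1958, 324.");
// etc...
UpdateResponse addResponse = solrClient.add("example_collection", newdoc);
solrClient.commit("example_collection");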
In the same way that this example is using author and bibliography as additional field names, you can add new ones such as a process id.
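To decide what value goes into such a field, you have to pull it out of the PDF text yourself before indexing. Below is a minimal sketch of that step, assuming the text has already been extracted from the PDF (for example with a PDF parsing library) and that the process id appears after a label like "Process ID:". The class name `PdfFieldExtractor`, the field name `process_id`, and the label format are all hypothetical and only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfFieldExtractor {
    // Hypothetical label format; adjust the pattern to match your own documents.
    private static final Pattern PROCESS_ID =
            Pattern.compile("Process ID:\\s*([A-Za-z0-9-]+)");

    /**
     * Builds a field-name to value map from text that was already
     * extracted from a PDF. The "process_id" entry is only present
     * when the pattern is found in the text.
     */
    public static Map<String, Object> extractFields(String id, String pdfText) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", id);
        fields.put("body", pdfText);
        Matcher m = PROCESS_ID.matcher(pdfText);
        if (m.find()) {
            fields.put("process_id", m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = "Process ID: PR-2016-0042\nThis document describes ...";
        Map<String, Object> fields = extractFields("1", text);
        System.out.println(fields.get("process_id")); // prints PR-2016-0042
    }
}
```

Each entry of the returned map can then be copied into the SolrInputDocument with addField(name, value) before the document is added to the collection.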
You'll need to update the schema for your R&R collection to declare these new fields. You can use the schema at https://github.com/IBM-Watson/kale/blob/master/solr/knowledge-expansion-en.xml#L36 as an example of how to specify additional fields.