We'd like to index and store a group of Word documents in Solr and have them appear as elements of a multivalued text field, with the content of each appearing as an element under that one entry in the index. In other words, it looks like this
We don't want each indexed document to have its own unique ID; instead, a group of documents will be children of one particular ID, and there can be any number of documents for that ID. How can we do this?
UPDATE: Here's my C# code. How would I read multiple documents into this for the unique ID being set with (++_count).ToString()?
using (FileStream fileStream = File.OpenRead(path))
{
    solr.Extract(
        // The second argument is the unique ID assigned to the extracted document.
        new ExtractParameters(fileStream, (++_count).ToString())
        {
            ExtractFormat = ExtractFormat.Text,
            ExtractOnly = false,
            Fields = new List<ExtractField>()
            {
                new ExtractField("action", actionTo),
                new ExtractField("actiondate", actionDate),
                // "abstract" is a reserved word in C#, so the variable needs the @ prefix.
                new ExtractField("abstract", @abstract),
                new ExtractField("docval", docval),
                new ExtractField("documentgeo", documentgeo),
                new ExtractField("filename", filename),
                new ExtractField("isprimary", IsPrimary.ToString())
            },
            AutoCommit = true
        }
    );
}
In your Solr schema, define two fields: id and text. The text field should be multivalued. Then, for each id, aggregate the text of all its documents into a single SolrInputDocument and index that.
<field name="id" type="int" multiValued="false" stored="true" indexed="true" />
<field name="text" type="text" multiValued="true" stored="true" indexed="true" />
I don't know the C# API, but with SolrJ it is fairly easy to aggregate the values by calling SolrInputDocument.addField("fieldname", "value") once per value.
Example update
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", 1);
// Each call adds one more value to the multivalued "text" field.
for (String docText : documents) {
    doc.addField("text", docText);
}
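When the input files belong to several different IDs, the aggregation step can be done before any Solr call is made. The following is only a minimal sketch in plain Java with no Solr dependency; the GroupTexts and ExtractedFile names and the sample strings are hypothetical, and it assumes the text of each Word file has already been extracted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupTexts {

    // Hypothetical holder for one extracted Word document and the ID it belongs to.
    static final class ExtractedFile {
        final int parentId;
        final String text;

        ExtractedFile(int parentId, String text) {
            this.parentId = parentId;
            this.text = text;
        }
    }

    // Collects the text of every file under its parent ID, preserving input order.
    static Map<Integer, List<String>> groupByParentId(List<ExtractedFile> files) {
        Map<Integer, List<String>> grouped = new LinkedHashMap<>();
        for (ExtractedFile file : files) {
            grouped.computeIfAbsent(file.parentId, id -> new ArrayList<>()).add(file.text);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<ExtractedFile> files = new ArrayList<>();
        files.add(new ExtractedFile(1, "text of first.docx"));
        files.add(new ExtractedFile(2, "text of other.docx"));
        files.add(new ExtractedFile(1, "text of second.docx"));

        // Each map entry corresponds to one Solr document: the key becomes "id",
        // and each list element becomes one value of the multivalued "text" field.
        for (Map.Entry<Integer, List<String>> entry : groupByParentId(files).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Each map entry would then be turned into one SolrInputDocument, with one addField("text", ...) call per list element as in the example above.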
Example .NET update
I would define my class in the following way:
public class Document
{
    [SolrUniqueKey("id")]
    public int Id { get; set; }

    // Maps to the multivalued "text" field; one element per Word document.
    [SolrField("text")]
    public ICollection<string> texts { get; set; }
}
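Then I would populate it and submit it with something like this pseudo-.NET code (GetGroupId and ExtractText are hypothetical helpers, not SolrNet calls):
// One Document per group ID; the value 1 is only illustrative.
var doc = new Document { Id = 1, texts = new List<string>() };
foreach (string path in paths)
{
    // GetGroupId (hypothetical) returns the ID that this file belongs to.
    if (GetGroupId(path) == doc.Id)
    {
        // ExtractText (hypothetical) returns the plain text of the Word file.
        doc.texts.Add(ExtractText(path));
    }
}
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Document>>();
solr.Add(doc);
solr.Commit();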