I've noticed that Solr 4.0 has a join feature, and I would like to use it to join subdocuments.
Something like this:
<book>
<bookid>1</bookid>
<title>This book is epic</title>
</book>
<page>
<bookid>1</bookid>
<number>1</number>
<pagecontent>this is the first page of the epic book</pagecontent>
</page>
<page>
<bookid>1</bookid>
<number>2</number>
<pagecontent>this is the second page of the epic book</pagecontent>
</page>
How can I join these subdocuments?
I would like to run a query like q=text:second, where text is a copyField that contains all the other fields.
The result should be the second page together with its book. My schema is more complex than just book and page: there are also other types of subdocuments whose parent id points to the book.
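A minimal sketch of the Solr 4.0 `{!join}` query for this example, building the request parameters in plain Java (no SolrJ). It assumes a hypothetical `type` field that marks books; the example documents above do not have one. Because the book and its pages share `bookid`, `{!join from=bookid to=bookid}text:second` matches every document whose `bookid` is 1 (the book and both pages), so the `fq` narrows the result to the book. The matching page itself still has to come from a plain `text:second` query.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class JoinQuery {
    // Solr's {!join} local-params syntax: documents matching `inner` are
    // collected, their `from` values gathered, and every document whose
    // `to` field holds one of those values is returned.
    static String join(String from, String to, String inner) {
        return "{!join from=" + from + " to=" + to + "}" + inner;
    }

    // URL-encodes request parameters for a /select request, in insertion order.
    static String encode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        // Books that own a page matching text:second.
        params.put("q", join("bookid", "bookid", "text:second"));
        // Hypothetical "type" field: keep only the book documents.
        params.put("fq", "type:book");
        System.out.println("/select?" + encode(params));
    }
}
```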
In Solr 3.6 I'm storing all these subdocuments as multivalued fields and checking whether a combination exists by using a concatenation field. This isn't a good approach: it requires a lot of code and relies on Java's String.contains. I hope the Solr 4.0 join can help, but I don't understand how to write the correct query or how to retrieve results such as a book with its list of pages.
I've also read about using a separate index for each subdocument type, but I don't know how that would affect document scoring, etc.
Edit:
Here they say that only the results of the inner query are put in the final result. Should I run two queries with the ids changed and then combine the results? That also feels wrong to me...
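If you do take the two-query route, the merge is simple client-side grouping. A sketch, assuming hypothetical `Book` and `Page` records already filled from the two responses: the first query (`q=text:second`) returns the matching pages, and the second fetches their books with a filter built from the pages' `bookid` values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical client-side views of the Solr documents.
record Book(String bookId, String title) {}
record Page(String bookId, int number, String content) {}

class CombineResults {
    // Filter for the second query: the books that own the matched pages.
    static String booksFilter(List<Page> pages) {
        return "bookid:(" + pages.stream()
                .map(Page::bookId)
                .distinct()
                .collect(Collectors.joining(" OR ")) + ")";
    }

    // Groups the pages from query 1 under the books from query 2,
    // keeping the books in the order the second query returned them.
    static Map<Book, List<Page>> attach(List<Book> books, List<Page> pages) {
        Map<String, Book> byId = new LinkedHashMap<>();
        Map<Book, List<Page>> result = new LinkedHashMap<>();
        for (Book b : books) {
            byId.put(b.bookId(), b);
            result.put(b, new ArrayList<>());
        }
        for (Page p : pages) {
            Book owner = byId.get(p.bookId());
            if (owner != null) { // ignore pages whose book was not returned
                result.get(owner).add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(new Page("1", 2, "this is the second page of the epic book"));
        System.out.println(booksFilter(pages)); // bookid:(1)
        List<Book> books = List.of(new Book("1", "This book is epic"));
        System.out.println(attach(books, pages));
    }
}
```

The cost is a second round trip, and each half is scored independently, which is exactly the concern raised above.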
New answer: index each parent together with its children as one block and use a block join query. See blockjoin info
The answer below is old. Newer Solr versions support block join out of the box, without writing plugins.
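A sketch of the block-join approach, assuming hypothetical `id` and `type` fields: block join needs a unique id on every document, a parent filter that matches all parents and no children, and the `_root_` field in the schema. The code builds a Solr XML update in which the pages are nested `<doc>` elements inside their book, plus the two block-join query strings: `{!parent which=...}` returns books whose pages match, and `{!child of=...}` returns the pages of matching books.

```java
import java.util.List;

class BlockJoin {
    // Minimal XML escaping for field values (& must be replaced first).
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String field(String name, String value) {
        return "<field name=\"" + name + "\">" + esc(value) + "</field>";
    }

    // One block: the book <doc> with its pages nested inside it, so Solr
    // indexes them together as a single contiguous block.
    static String bookBlock(String bookId, String title, List<String> pages) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        sb.append(field("id", "book-" + bookId));
        sb.append(field("type", "book"));
        sb.append(field("bookid", bookId));
        sb.append(field("title", title));
        for (int i = 0; i < pages.size(); i++) {
            sb.append("<doc>")
              .append(field("id", "book-" + bookId + "-page-" + (i + 1)))
              .append(field("type", "page"))
              .append(field("bookid", bookId))
              .append(field("number", String.valueOf(i + 1)))
              .append(field("pagecontent", pages.get(i)))
              .append("</doc>");
        }
        return sb.append("</doc></add>").toString();
    }

    // Parents (books) that have at least one child matching childQuery.
    static String parentQuery(String parentFilter, String childQuery) {
        return "{!parent which=\"" + parentFilter + "\"}" + childQuery;
    }

    // Children (pages) of the parents matching parentQuery.
    static String childQuery(String parentFilter, String parentQuery) {
        return "{!child of=\"" + parentFilter + "\"}" + parentQuery;
    }

    public static void main(String[] args) {
        System.out.println(bookBlock("1", "This book is epic", List.of(
                "this is the first page of the epic book",
                "this is the second page of the epic book")));
        System.out.println(parentQuery("type:book", "text:second"));
        System.out.println(childQuery("type:book", "title:epic"));
    }
}
```

To get a book back together with its pages in one response, Solr versions that ship the `[child]` document transformer accept `fl=*,[child parentFilter=type:book]`; check that your version supports it.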
I used Lucene's query-time join (which has scoring options for the subdocuments) by writing a QParserPlugin for Solr.
This link explains a bit what I've done: Querytimejoin Solr
Here the query-time join is explained by one of the Lucene developers: Blog QueryTimeJoin
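To illustrate what a query-time join with score modes does, here is an in-memory sketch in plain Java. It is not Lucene's API (Lucene exposes this through `JoinUtil.createJoinQuery`); `Doc` and `JoinScoreMode` are hypothetical names. Documents matching the inner query contribute their join key and score, and each document on the other side whose key matches is returned with those scores aggregated by the chosen mode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document: id, join key, and its score for the inner query.
record Doc(String id, String key, double score) {}

// Hypothetical aggregation modes, modeled on the idea of Lucene's join scoring.
enum JoinScoreMode { NONE, AVG, MAX, TOTAL }

class QueryTimeJoin {
    // fromHits: documents that matched the inner query (e.g. pages).
    // toDocs: candidates on the other side (e.g. books).
    // Returns the ids of toDocs whose key occurs among fromHits, with the
    // matching fromHits scores aggregated according to mode.
    static Map<String, Double> join(List<Doc> fromHits, List<Doc> toDocs, JoinScoreMode mode) {
        Map<String, List<Double>> scoresByKey = new HashMap<>();
        for (Doc d : fromHits) {
            scoresByKey.computeIfAbsent(d.key(), k -> new ArrayList<>()).add(d.score());
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Doc d : toDocs) {
            List<Double> s = scoresByKey.get(d.key());
            if (s == null) continue; // no inner hit shares this key
            double score = switch (mode) {
                case NONE -> 1.0; // constant score
                case AVG -> s.stream().mapToDouble(Double::doubleValue).average().orElse(0);
                case MAX -> s.stream().mapToDouble(Double::doubleValue).max().orElse(0);
                case TOTAL -> s.stream().mapToDouble(Double::doubleValue).sum();
            };
            result.put(d.id(), score);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> pageHits = List.of(new Doc("p1", "1", 0.5), new Doc("p2", "1", 1.5));
        List<Doc> books = List.of(new Doc("b1", "1", 0), new Doc("b2", "2", 0));
        System.out.println(join(pageHits, books, JoinScoreMode.MAX));
    }
}
```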
This solution does not support multiple cores (the Solr join on trunk does).
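For the multi-core case, the Solr join query parser accepts a `fromIndex` local parameter naming the core the inner query runs against. A sketch, assuming a hypothetical `pages` core on the same Solr instance as the core that receives the request:

```java
class CrossCoreJoin {
    // {!join} with fromIndex: the inner query runs against the core named
    // by fromIndex, and from/to link its documents to the documents of the
    // core receiving the request.
    static String join(String fromIndex, String from, String to, String inner) {
        return "{!join fromIndex=" + fromIndex + " from=" + from + " to=" + to + "}" + inner;
    }

    public static void main(String[] args) {
        // Hypothetical setup: pages live in a "pages" core, books in the
        // core this query is sent to.
        System.out.println(join("pages", "bookid", "bookid", "text:second"));
    }
}
```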