
Preserve association between multivalued fields in Solr


I have multivalued fields in my Solr data source. A sample document:

<doc>
<str name="id">23606</str>
<arr name="institution">
    <str>Harvard University</str>
    <str>Yale University</str>
    <str>Cornell University</str>
    <str>TUFTS University</str>
    <str>University of Arizona</str>
</arr>
<arr name="degree_level">
    <str>Bachelors</str>
    <str>Diploma</str>
    <str>Master</str>
    <str>Master</str>
    <str>PhD</str>
</arr>
</doc>

In the example above, this user has a Bachelors degree from Harvard, a Diploma from Yale, a Master from Cornell, a Master from TUFTS, and a PhD from Arizona. Now, if I search for users who have a Bachelors degree and graduated from Harvard, I get this user, which is correct:
MyDomain:8888/solr/mycol/select?facet=true&q=*:*&fq=degree_level:Bachelors&fq=institution:Harvard+University

But if I want those who have a Bachelors from Cornell, I get this user as well, which is incorrect:
MyDomain:8888/solr/mycol/select?facet=true&q=*:*&fq=degree_level:Bachelors&fq=institution:Cornell+University
The question is: how can I preserve the ordering/mapping between multivalued fields in Solr?
Edit:
By the way, I know that I can solve my problem by creating a new field that contains the concatenation of the degree with the university (i.e. "Bachelors_Harvard University", "Diploma_Yale University", and so on), but I need a solution based on Solr core itself, as I have a lot of multivalued fields with a lot of combinations.
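
To make the mismatch concrete, here is a minimal Python sketch of what I think is happening. It does not talk to Solr at all: plain lists stand in for the two multivalued fields, and the helper names `matches_independent` and `matches_paired` are made up for illustration.

```python
# Plain-Python stand-in for the sample document; no Solr involved.
doc = {
    "id": "23606",
    "institution": [
        "Harvard University",
        "Yale University",
        "Cornell University",
        "TUFTS University",
        "University of Arizona",
    ],
    "degree_level": ["Bachelors", "Diploma", "Master", "Master", "PhD"],
}


def matches_independent(doc, degree, institution):
    """Mimics two separate fq filters: each field is tested on its own."""
    return degree in doc["degree_level"] and institution in doc["institution"]


def matches_paired(doc, degree, institution):
    """Tests the values position by position, i.e. the association I want."""
    return (degree, institution) in zip(doc["degree_level"], doc["institution"])


print(matches_independent(doc, "Bachelors", "Cornell University"))  # True
print(matches_paired(doc, "Bachelors", "Cornell University"))       # False
```

The first check behaves like my two `fq` parameters and matches the user; the second is the behaviour I am looking for.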


Solution

  • Here are some suggestions:

    • Try using dynamic fields:
      <dynamicField name="degree_level_*" type="string" indexed="true" stored="true" />
      and create the fields dynamically while indexing, e.g. degree_level_Bachelors with the value Harvard University, and so on. When you want to filter on the Bachelors degree, filter on the field degree_level_Bachelors. Similarly, if you want to allow filtering on institutions, create a dynamic field for institutions.
    • You can define in advance how you will store the data: <year><separator><degree><separator><institution><separator><major> and so on,
      and then filter on the required pattern (wildcard or regex).
      e.g.:
      fq=educationDetails:2009@Bachelors@Harvard@*
      This will give you all records with a Bachelors from Harvard in 2009. You will have to come up with the pattern for each of the different filters.
    • Use two collections to correctly model the one-to-many relationship between user and degree, queried using {!join}.
    • Use one collection at a "user-degree" level of granularity, deduplicated via Solr's field collapsing support.
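
The dynamic-field, composite-field and "user-degree" suggestions all come down to reshaping the document at index time so that each degree stays attached to its institution. A rough Python sketch of that reshaping follows. It only builds dictionaries and never calls Solr. The function names and the `user_id` field are hypothetical, and the composite value holds only degree and institution because the sample has no year or major. Note also that `degree_level_*` would need `multiValued="true"` in the schema, since this user holds a Master from two institutions.

```python
SEPARATOR = "@"  # same separator as in the fq example above

user = {
    "id": "23606",
    "institution": [
        "Harvard University",
        "Yale University",
        "Cornell University",
        "TUFTS University",
        "University of Arizona",
    ],
    "degree_level": ["Bachelors", "Diploma", "Master", "Master", "PhD"],
}


def pairs(doc):
    """Pair each degree with the institution at the same position."""
    return list(zip(doc["degree_level"], doc["institution"]))


def to_dynamic_fields(doc):
    """One degree_level_<degree> field per degree, holding its institutions."""
    out = {"id": doc["id"]}
    for degree, institution in pairs(doc):
        out.setdefault(f"degree_level_{degree}", []).append(institution)
    return out


def to_composite_field(doc):
    """One multivalued field whose values are <degree><separator><institution>."""
    values = [f"{degree}{SEPARATOR}{institution}" for degree, institution in pairs(doc)]
    return {"id": doc["id"], "educationDetails": values}


def to_user_degree_docs(doc):
    """One document per (user, degree) pair; user_id is a made-up grouping key."""
    return [
        {
            "id": f"{doc['id']}_{n}",
            "user_id": doc["id"],
            "degree_level": degree,
            "institution": institution,
        }
        for n, (degree, institution) in enumerate(pairs(doc))
    ]


print(to_dynamic_fields(user))
print(to_composite_field(user))
print(to_user_degree_docs(user))
```

With the per-pair documents, both filters apply to the same small document, so a Bachelors/Cornell query would no longer match this user. Duplicate users in the result could then be collapsed on the grouping key, for example with something like `fq={!collapse field=user_id}`, assuming that field exists in your schema.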