Tags: python, graphql, weaviate, vector-database

How to include a cross-reference property in a Weaviate query?


The problem is that I don't know how to include the cross-reference property wrotePublications in the query result.

I have two collections in a Weaviate database: Researcher and Publication.

Schemas:

publication_class_schema = {
    "class": publication_class_name,
    "description": "Publication information",
    "properties": [
        {
            "name": "content",
            "dataType": ["text"],
            "description": "Content of the publication",
        },
        {
            "name": "title",
            "dataType": ["text"],
            "description": "Title of the publication",
        },
        . . .
        {
            "name": "citedby",
            "dataType": ["int"],
            "description": "Total number of citations of the publication",
        },
    ],
}

researcher_class_schema = {
    "class": researcher_class_name,
    "description": "Researcher profile information and statistics",
    "properties": [
        {
            "name": "name",
            "dataType": ["text"],
            "description": "The name of the researcher",
        },
        . . .
        {
            "name": "wrotePublications",
            "dataType": [publication_class_name],
            "description": "Cross-reference objects of publications written by the researcher",
        },
    ],
}

I imported data by using weaviate-python-client as follows:

with client.batch as batch:
    researcher_obj_uuid = batch.add_data_object(researcher_obj, class_name=researcher_class_name)
    pub_obj_uuid = batch.add_data_object(publication_obj, class_name=publication_class_name)
    batch.add_reference(
        from_object_uuid=researcher_obj_uuid,
        from_object_class_name=researcher_class_name,
        from_property_name="wrotePublications",
        to_object_uuid=pub_obj_uuid,
        to_object_class_name=publication_class_name,
    )

Sample Researcher record:

{'class': 'Researcher',
 'creationTimeUnix': 1683744762093,
 'id': '9f1f09f7-dc73-4a08-a914-0f6df5c8fc3f',
 'lastUpdateTimeUnix': 1683744762298,
 'properties': {
   'affiliation': 'res univ',
   'citedby': 485,
   'name': 'res name',
   'wrotePublications': [
    {'beacon': 'weaviate://localhost/Publication/8884d468-2a3a-4344-ada9-07e1876c5364',
     'href': '/v1/objects/Publication/8884d468-2a3a-4344-ada9-07e1876c5364'}
  ]
},
 'vectorWeights': None}

The Publication object referenced by the beacon in the record above:

{'class': 'Publication',
 'creationTimeUnix': 1683744762092,
 'id': '8884d468-2a3a-4344-ada9-07e1876c5364',
 'lastUpdateTimeUnix': 1683744762092,
 'properties': {'citedby': 2,
  'content': '',
  'pub_year': 1910,
  'title': 'pub title'},
 'vectorWeights': None}

I tried the code below, but it returns an error message:

client.query \
    .get(researcher_class_name, ["name", "wrotePublications"]) \
    .with_near_text(nearText) \
    .with_limit(3) \
    .with_additional(['certainty']) \
    .do()

{
  "errors": [
    {
      "locations": [
        {
          "column": 217,
          "line": 1
        }
      ],
      "message": "Field \"wrotePublications\" of type \"[ResearcherWrotePublicationsObj]\" must have a sub selection.",
      "path": null
    }
  ]
}

Solution

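  • You should be able to do it with syntax like this (assuming I understood your structure correctly). As the error message says, a cross-reference property needs a sub-selection that lists the fields to return from the referenced class.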

    client.query \
        .get(researcher_class_name, ["name wrotePublications {... on Publication {title} }"]) \
        .with_near_text(nearText) \
        .with_limit(3) \
        .with_additional(['certainty']) \
        .do()
    

    Take a look at this: https://weaviate.io/developers/weaviate/api/graphql/get#query-beacon-references

    If you want to filter by references, follow the examples shown here: https://weaviate.io/developers/weaviate/api/graphql/filters#beacon-reference-filters

    I hope that helps!
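
For illustration, here is a minimal sketch of the two ideas in the answer: the sub-selection string for a cross-reference, and a where filter whose path goes through the reference. The helpers reference_selection and reference_filter are hypothetical names made up for this example and are not part of the Weaviate client. The filtered field (citedby) and the threshold of 10 are also just assumptions. The commented-out query at the end assumes the v3 Python client used in the question.

```python
def reference_selection(ref_property, target_class, fields):
    """Build the GraphQL sub-selection for a cross-reference property.

    Hypothetical helper: it only formats a string, nothing Weaviate-specific runs here.
    """
    if not fields:
        raise ValueError("a cross-reference needs at least one field in its sub-selection")
    return f"{ref_property} {{ ... on {target_class} {{ {' '.join(fields)} }} }}"


def reference_filter(ref_property, target_class, field, operator, value_key, value):
    """Build a where-filter dict whose path walks through the cross-reference.

    Hypothetical helper: the path has the form [reference property, target class, field].
    """
    return {
        "path": [ref_property, target_class, field],
        "operator": operator,
        value_key: value,
    }


# Properties to request: a plain field plus the cross-reference with its sub-selection.
properties = [
    "name",
    reference_selection("wrotePublications", "Publication", ["title", "citedby"]),
]

# Illustrative filter: researchers with a publication cited more than 10 times.
where = reference_filter(
    "wrotePublications", "Publication", "citedby", "GreaterThan", "valueInt", 10
)

print(properties[1])
print(where)

# With a connected v3 client, these could be passed along roughly like this:
# result = (
#     client.query
#     .get("Researcher", properties)
#     .with_where(where)
#     .with_near_text(nearText)
#     .with_limit(3)
#     .do()
# )
```

Building the selection from a list of field names keeps the braces balanced when more Publication fields are added later, which is easy to get wrong when editing the string by hand.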