Choosing the right name for properties in schema for Weaviate


When loading my schema into Weaviate, I get an error message saying that the property name cannot be found in the contextionary. Some of the properties I need are abbreviations.

This is the schema item it is complaining about:

{
    "cardinality": "atMostOne",
    "dataType": [
       "boolean"
    ],
    "description": "Is this a BLWS elbow yes or no",
    "keywords": [
        {
            "keyword": "BLWS",
            "weight": 1
        }
    ],
    "name": "blws"
}

This is the error message I get:

2019-09-04T11:47:07.202646 ERROR: {'error': [{'message': "Could not find the word 'blws' from the property 'blws' in the class name 'Elbow' in the contextionary. Consider using keywords to define the semantic meaning of this class."}]}


Solution

  • The misleading error

    The error message

    Consider using keywords to define the semantic meaning of this class
    

    is outdated, and the recommendation is in fact not helpful. There is already a GitHub issue to clean this up: https://github.com/semi-technologies/weaviate/issues/929

    Prior to https://github.com/semi-technologies/weaviate/issues/856 it was possible to replace an unknown property word with known keywords, but #856 removed that possibility.

    However, even prior to that change your schema would not have been accepted; see below.

    About property names which are not in the contextionary

    A property name must consist of one or more parts, each of which is known to the contextionary. By "part" I mean that if you combine multiple words using camelCasing, each word is one part. For example:

    • drivesVehicle would be valid as it consists of two known words: drives, vehicle
    • drivesAVehicle would also be valid, as it contains two known words and a stop word (a). Note: stop words are fine as long as your property name contains at least one word that is not a stop word.
    • drivesBlws would be invalid, as blws is not a known word
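
    The rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not Weaviate's actual validation code: KNOWN_WORDS and STOP_WORDS are tiny hypothetical stand-ins for the contextionary, and the function names are made up for this example.

```python
import re

# Hypothetical stand-ins for the contextionary; the real word list lives inside Weaviate.
KNOWN_WORDS = {"drives", "vehicle"}
STOP_WORDS = {"a"}


def split_camel_case(name):
    """Split a camelCase name into lowercase parts, e.g. drivesAVehicle -> drives, a, vehicle."""
    return [part.lower() for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]", name)]


def is_valid_property_name(name):
    """Return True if every part is a known word or stop word and at least one part is not a stop word."""
    parts = split_camel_case(name)
    if not parts:
        return False
    if any(p not in KNOWN_WORDS and p not in STOP_WORDS for p in parts):
        return False
    return any(p not in STOP_WORDS for p in parts)


for candidate in ["drivesVehicle", "drivesAVehicle", "drivesBlws"]:
    print(candidate, is_valid_property_name(candidate))
```

    With the sample word sets, the first two names pass and drivesBlws fails because blws is in neither set.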

    We have discussed adding the ability to define custom words. The proposal can be considered accepted, but at the time of this writing it is not an immediate priority.

    Why so strict about known words?

    One of the core functionalities of Weaviate is concept searching ("vector-based searching"), so Weaviate must be able to calculate a vector position for each property. It can only do that if it recognizes the words.

    How to solve this?

    Try describing a "blws" with known words. For example, if "blws" were an acronym for "bold long wide short", you could name the property boldLongWideShort. As mentioned above, we will add the ability to define custom words in the future, but that is not supported yet.
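
    As a rough sketch of that renaming, the snippet below turns a spelled-out phrase into a camelCase property name and swaps it into a copy of the schema item from the question. The expansion "bold long wide short" is just the made-up example from above, to_camel_case is a hypothetical helper, and the keywords entry is left out for brevity.

```python
import json


def to_camel_case(phrase):
    """Join the words of a phrase into a camelCase name, e.g. 'bold long wide short' -> boldLongWideShort."""
    words = phrase.lower().split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    return words[0] + "".join(word.capitalize() for word in words[1:])


# Property definition from the question (keywords omitted for brevity).
original = {
    "cardinality": "atMostOne",
    "dataType": ["boolean"],
    "description": "Is this a BLWS elbow yes or no",
    "name": "blws",
}

# Copy the item and replace only the name; the expansion is an assumed example.
fixed = dict(original)
fixed["name"] = to_camel_case("bold long wide short")

print(json.dumps(fixed, indent=4))
```

    Whether the renamed property is accepted still depends on each of its parts being known to the contextionary.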