I'm training a few machine learning models that represent words as vectors, using Freebase as training data. Since the API has been deprecated, I'm working with the raw Freebase dump, which is a list of 3.1 billion triples containing more than 500 million distinct entities (subjects/objects), and I'd like to reduce this number.
I would like to remove all triples which simply denote names of subjects so that only triples containing MIDs remain. However, I've found multiple possible predicates that define the 'name' of an entity.
i) common.notable_for.display_name
ii) type.object.name
iii) /rdf-schema#label
I have three questions:
a) Is there any difference between the above predicates?
b) Are there any additional predicates which also describe the names of entities?
c) Apart from the triple where a name is defined, does the name ever appear in other triples, instead of the MID?
Thank you for your help!
You should concentrate only on type.object.name; that is the schema property holding the topic's name.
The /rdf-schema#label predicate is just an equivalent of type.object.name; it is not part of the Freebase schema itself.
The common.notable_for.display_name property is described as "Localized/gender appropriate display name for the notable object." It is a property within a CVT (compound value type) and holds a different kind of information: of all the types a topic has, which one is most "important". As far as I remember, "Larry Page" was an "entrepreneur". So you don't need this property; concentrate on type.object.name.
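As a rough sketch of the filtering step, the following streams the gzipped dump and drops every triple whose predicate is one of the name predicates discussed above. It assumes the dump is tab-separated with one triple per line and that the predicates appear as the full URIs listed in NAME_PREDICATES; both are assumptions to verify against your copy of the dump. The function names and file paths are hypothetical.

```python
import gzip

# Assumed full URIs for the predicates discussed above; check them against your dump.
NAME_PREDICATES = {
    "<http://rdf.freebase.com/ns/type.object.name>",
    "<http://www.w3.org/2000/01/rdf-schema#label>",
    "<http://rdf.freebase.com/ns/common.notable_for.display_name>",
}


def is_name_triple(line):
    """Return True if the line's predicate (second tab-separated field) is a name predicate."""
    parts = line.split("\t", 2)
    return len(parts) == 3 and parts[1] in NAME_PREDICATES


def drop_name_triples(lines):
    """Lazily yield only the lines that are not name triples."""
    return (line for line in lines if not is_name_triple(line))


def filter_dump(src_path, dst_path):
    """Stream a gzipped dump from src_path to dst_path, skipping name triples."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        dst.writelines(drop_name_triples(src))
```

Calling filter_dump("freebase-rdf-latest.gz", "freebase-no-names.gz") (hypothetical file names) processes the dump line by line, so memory use stays flat regardless of dump size. If you only want to drop type.object.name, shrink NAME_PREDICATES to that single entry.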