machine-learning, nlp, data-science, named-entity-recognition

Distant Supervision: a rule-based labelling approach?


I am currently working on entity relation extraction, and I found that many papers use distant supervision to label their data. My understanding of distant supervision is that we take an established Knowledge Base (KB) and do a kind of "rule-based labeling": for each extracted entity pair, we check whether it exists in the KB. If the pair exists in the KB, it is labelled positive; otherwise it is labelled negative.

My questions are:

  1. Do I understand this distant supervision concept correctly?
  2. If yes, why do we train neural networks to reproduce a rule-based system? For example, if we later get new sentences that contain entities and want to check whether those entities are related, why don't we just refer back to the KB? Why do we train a relation classifier instead?

Thank you


Solution

  • Distant supervision is the approach of using rule-based heuristics to produce labeled data; the labeled data is then used to train a model (generally a neural network).

    The Knowledge Base (KB) can be used as a rule-based heuristic. As stated by Nathan McCoy, the KB will generally not be complete, and the trained model lets you detect a relation between two entities that are not present in the knowledge base.

    Snorkel is an example of a tool that was developed for this kind of weak or distant supervision.
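
To make the difference between "looking up the KB" and "training on KB-derived labels" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy `KB`, the three-sentence `corpus`, the helper names, and the word-counting "model", which stands in for the neural network a real system would train. It also assumes the entities are already extracted and that the head entity appears before the tail entity in the sentence.

```python
from collections import Counter

# Hypothetical toy knowledge base: (head, tail) pairs for one relation ("founded").
KB = {("Bill Gates", "Microsoft"), ("Steve Jobs", "Apple")}

# Toy corpus: (sentence, head entity, tail entity). Entities are assumed to be
# already extracted, with the head mentioned before the tail.
corpus = [
    ("Bill Gates founded Microsoft in 1975.", "Bill Gates", "Microsoft"),
    ("Steve Jobs founded Apple in a garage.", "Steve Jobs", "Apple"),
    ("Bill Gates visited Apple last week.", "Bill Gates", "Apple"),
]


def distant_label(head, tail, kb=KB):
    """Rule-based label: 1 if the pair is in the KB, else 0."""
    return 1 if (head, tail) in kb else 0


def context_words(sentence, head, tail):
    """Lowercased words that appear between the two entity mentions."""
    start = sentence.index(head) + len(head)
    end = sentence.index(tail)
    return sentence[start:end].lower().split()


# Step 1: distant supervision produces (noisy) training labels from the KB.
labeled = [(s, h, t, distant_label(h, t)) for s, h, t in corpus]

# Step 2: "train" a stand-in model on the text, not on the entity names:
# count which context words co-occur with positive and negative labels.
pos, neg = Counter(), Counter()
for s, h, t, y in labeled:
    (pos if y else neg).update(context_words(s, h, t))


def predict(sentence, head, tail):
    """Predict 1 if the context words lean towards the positive class."""
    score = sum(pos[w] - neg[w] for w in context_words(sentence, head, tail))
    return 1 if score > 0 else 0


# Step 3: a new sentence whose entity pair is NOT in the KB.
new_example = ("Jeff Bezos founded Amazon in 1994.", "Jeff Bezos", "Amazon")
kb_answer = distant_label(new_example[1], new_example[2])
model_answer = predict(*new_example)

print("KB lookup:", kb_answer)
print("Model prediction:", model_answer)
```

The KB lookup cannot say anything useful about the new pair because the KB is incomplete, while the stand-in model has picked up the textual pattern ("founded") from the distantly labeled sentences and can apply it to unseen entities. The labels themselves are noisy (a KB pair can appear in a sentence that does not express the relation), which is one reason tools such as Snorkel combine several labeling heuristics.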