I have a set of URLs in a text file. For each URL in that file, I want to tag the entities and relationships in the text of the page it points to.
I am aware of entity taggers such as Stanford NER, NLTK, and GATE, which can perform entity tagging. However, I am more interested in relationship extraction.
To extract relationships, I am thinking of annotating the text from those URLs to use as training data. I do not want to do this annotation manually. I could write a few regexes to extract the relationships I want, but that approach would be difficult to scale up.
Is there a tool in which I can specify what I want to annotate?
For example:
"Rob is working as the Director of ABC organization. He graduated from XYZ University."
Here I want to extract the affiliation relationship, so intuitively I would like to annotate the words that describe affiliations, such as "working" and "graduated".
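To make the desired annotation concrete, here is a minimal sketch of the pattern-based approach mentioned above: it marks affiliation trigger phrases inline with an <AFF> tag. The trigger list, the tag name, and the function names are hypothetical choices for illustration. As noted, a hand-written list like this will not scale on its own, but output of this shape can serve as seed annotations to review and extend.

```python
import re
from typing import List, Tuple

# Hypothetical trigger phrases signalling an affiliation; an assumption, not a curated lexicon.
AFFILIATION_TRIGGERS = ["working as", "works at", "graduated from", "joined"]


def find_triggers(text: str) -> List[Tuple[str, int, int]]:
    """Return (matched text, start, end) for every trigger occurrence, ordered by position."""
    spans = []
    for trigger in AFFILIATION_TRIGGERS:
        pattern = r"\b" + re.escape(trigger) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.group(0), m.start(), m.end()))
    return sorted(spans, key=lambda s: s[1])


def annotate(text: str) -> str:
    """Wrap each non-overlapping trigger match in <AFF>...</AFF> tags."""
    out, last = [], 0
    for _, start, end in find_triggers(text):
        if start < last:  # skip a match that overlaps one already tagged
            continue
        out.append(text[last:start])
        out.append(f"<AFF>{text[start:end]}</AFF>")
        last = end
    out.append(text[last:])
    return "".join(out)
```

Applied to the example sentence, annotate returns "Rob is <AFF>working as</AFF> the Director of ABC organization. He <AFF>graduated from</AFF> XYZ University."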
Edit: By "a set of URLs in the text file", I mean that the file contains about 200 links to webpages, each of which contains some text. I want to analyse (annotate) that text.
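Getting the text out of those ~200 pages is a separate first step from annotation. Below is a minimal standard-library sketch. It assumes one URL per line in the file and uses a deliberately simple HTML-to-text conversion that drops <script> and <style> contents and joins the remaining text nodes. All function names and the file name "urls.txt" are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def read_urls(path: str) -> List[str]:
    """Read one URL per line, ignoring blank lines and lines starting with '#'."""
    with open(path, encoding="utf-8") as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith("#")]


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its visible text (decoding errors are replaced)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))


# Usage (assumes a file "urls.txt" with one URL per line):
# texts = {url: fetch_text(url) for url in read_urls("urls.txt")}
```

For real pages, a dedicated boilerplate-removal library would give cleaner text. This sketch only aims to produce plain text that the NER and relation tools can consume.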
There is no processing resource (PR) in GATE that will pair the arguments and create relation instances for you. You must therefore create the instances that are relevant to your problem yourself.
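Conceptually, "creating instances" means pairing the argument mentions found by an entity tagger, for example every PERSON with every ORGANIZATION in the same sentence, so that a classifier can label each pair as related or not. The sketch below shows that idea in plain Python. It is not GATE code. The Entity class, the label strings, and the same-sentence restriction are assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    label: str     # entity type from an NER tagger, e.g. "PERSON" (assumed label names)
    sentence: int  # index of the sentence containing the mention


def candidate_pairs(
    entities: List[Entity],
    arg1_label: str = "PERSON",
    arg2_label: str = "ORGANIZATION",
) -> List[Tuple[Entity, Entity]]:
    """Pair each arg1-typed mention with each arg2-typed mention in the same sentence."""
    by_sentence: Dict[int, List[Entity]] = {}
    for e in entities:
        by_sentence.setdefault(e.sentence, []).append(e)

    pairs: List[Tuple[Entity, Entity]] = []
    for sent in sorted(by_sentence):
        mentions = by_sentence[sent]
        firsts = [e for e in mentions if e.label == arg1_label]
        seconds = [e for e in mentions if e.label == arg2_label]
        pairs.extend(product(firsts, seconds))
    return pairs
```

On the example from the question, a tagger would typically yield Rob (PERSON) and ABC organization (ORGANIZATION) in sentence 0 and XYZ University (ORGANIZATION) in sentence 1. The sketch then produces only the (Rob, ABC organization) pair. The graduation relation is missed because "He" is not tagged as a PERSON, which shows why coreference resolution or cross-sentence pairing may matter for this task.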
You can:
You can probably split your corpus into a training set and a test set.
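For a corpus of about 200 pages, one reasonable choice is to split at the document level, so that sentences from the same page do not end up in both sets. Here is a minimal sketch with a fixed seed for reproducibility. The function name and the 20% default are assumptions. scikit-learn's sklearn.model_selection.train_test_split offers similar functionality.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_documents(
    docs: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle whole documents with a fixed seed and return (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global random state
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

With 200 documents and the default fraction, this gives 160 training and 40 test documents.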
You can use the GATE training course on Relation Extraction, which contains everything you need: