nlp spacy named-entity-recognition

Tabular data using spaCy


I'm using spaCy and need some help training our model on custom entities that are given in tabular format in a Word/PDF document.

I'm able to train it on a custom entity, using ANIMAL as an example, and it works fine. In this case, we provide the start and end character indexes of the custom entity in a given text.

("Horses are too tall and they pretend to care about your feelings", {
    'entities': [(0, 6, 'ANIMAL')]
}),

My question is about the tabular format:
How can I provide indexes as in the ANIMAL example?
Can anyone please guide and assist?



Solution

  • After a lot of research and reading articles, I found a way to handle it.

    1. Convert the table to text.
    2. The conversion will add a lot of extra whitespace, etc.
    3. Replace it with single spaces.
    4. This turns your table into a paragraph.
    5. Now you can provide indexes just as you would for sentences, and train your model.

    Further, you can use the dependency parser to find the correct value linked to each head (in case a value belongs to multiple keys).
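
The numbered steps above can be sketched in plain Python. This is only a minimal illustration, not the exact code I used: the helper names `flatten_table_text` and `find_entities` and the sample table are hypothetical, and it assumes the table has already been extracted from the Word/PDF file as a raw string. It collapses the whitespace into single spaces and then computes `(start, end, label)` character offsets in the same format as the ANIMAL example.

```python
import re


def flatten_table_text(raw: str) -> str:
    """Collapse tabs, newlines, and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def find_entities(text: str, values_by_label: dict) -> list:
    """Return (start, end, label) character offsets for known cell values.

    Only the first occurrence of each value is annotated, and values that
    do not occur in the text are skipped.
    """
    entities = []
    for label, values in values_by_label.items():
        for value in values:
            start = text.find(value)
            if start != -1:
                # The end offset is exclusive, so text[start:end] == value.
                entities.append((start, start + len(value), label))
    return sorted(entities)


# Hypothetical text extracted from a two-column table (assumption for illustration).
raw_table = "Animal\t\tHeight\nHorse\t\t  1.6 m\nDog\t\t  0.5 m\n"

text = flatten_table_text(raw_table)
entities = find_entities(text, {"ANIMAL": ["Horse", "Dog"]})

# Same shape as the ANIMAL training example in the question.
train_example = (text, {"entities": entities})
print(train_example)
```

Note that `str.find` also matches inside longer words and only finds the first occurrence, so for real data you would probably want to check each slice `text[start:end]` and make sure the spans line up with token boundaries and do not overlap before training.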