Hi, I am trying to tag the words in a sentence in order. For example (my initial method):
Sentence: Work across a wide range of related areas
Label: Tag O O O O O Tag Tag
But now I need it to look like this, where a two-word keyword is tagged and labeled as a single unit:
Sentence: Work across a wide range of related areas
Label: Tag O O O O O Tag
I have a list of keywords of varying length and their tags. How can I tag the sentence this way while keeping the sentence order?
It looks like what you are looking for is the BIO tagging scheme (if I understood you correctly, and you are looking for a solution for manually tagged corpora).
BIO denotes the following: B is the beginning of a chunk, I is a token inside a chunk, and O is a token outside any chunk.
Step 1
Sentence: Work across a wide range of related areas
Tag: B O O O O O B I
Label: Label_1 O O O O O Label_2 Label_2
Step 2
Sentence: Work across a wide range of related areas
Label: B-Label_1 O O O O O B-Label_2 I-Label_2
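Since you already have a keyword list, Steps 1 and 2 can probably be automated. Here is a minimal sketch, assuming whitespace tokenization, case-insensitive matching, and a dict that maps each keyword phrase to its label. The function name `bio_tag` and the `keywords` dict are made up for illustration; longer keywords are tried first so that a multi-word keyword wins over a shorter one starting at the same token.

```python
def bio_tag(tokens, keyword_labels):
    """Return one BIO-prefixed label per token, in sentence order."""
    # Split each keyword into lowercase tokens; try longer phrases first.
    phrases = sorted(
        ((tuple(k.lower().split()), label) for k, label in keyword_labels.items()),
        key=lambda item: len(item[0]),
        reverse=True,
    )
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for phrase, label in phrases:
            n = len(phrase)
            if n and tuple(lowered[i:i + n]) == phrase:
                tags[i] = "B-" + label           # first token of the chunk
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label       # remaining tokens of the chunk
                i += n                           # skip past the matched chunk
                break
        else:
            i += 1                               # no keyword starts here
    return tags


# Hypothetical keyword list for the example sentence.
keywords = {"work": "Label_1", "related areas": "Label_2"}
tokens = "Work across a wide range of related areas".split()
tags = bio_tag(tokens, keywords)
print(tags)
# ['B-Label_1', 'O', 'O', 'O', 'O', 'O', 'B-Label_2', 'I-Label_2']
```

If your text has punctuation attached to words, you would need a proper tokenizer instead of `split()`.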
Once you have tagged your corpus, align the list of sentence tokens (list #1) with the list of tag + label combos (list #2), where the BIO tag is prefixed to the label, e.g., [..., related, areas] + [..., B-Label_2, I-Label_2]. Because a B tag followed by I tags with the same label marks a single chunk, you can merge [B-Label_2, I-Label_2] into one Label_2 that covers "related areas". At the very end you just strip the prefixes, although you should expect a number of intermediate steps and some post-processing along the way.
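The merging step could look roughly like this. It is only a sketch under the assumption that tokens and tags are two aligned lists of equal length; `merge_bio` is a hypothetical helper name. An I tag extends the previous chunk only when the labels match, and a stray I tag with nothing to continue is treated as the start of a new chunk.

```python
def merge_bio(tokens, tags):
    """Collapse B-/I- runs into (phrase, label) pairs, keeping sentence order."""
    merged = []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and merged and merged[-1][1] == label:
            # Continuation of the previous chunk: append the token to it.
            merged[-1][0] += " " + token
        elif prefix in ("B", "I"):
            # Start of a new chunk (or a stray I- tag with no chunk to extend).
            merged.append([token, label])
        else:
            # Token outside any chunk.
            merged.append([token, "O"])
    return [tuple(item) for item in merged]


tokens = "Work across a wide range of related areas".split()
tags = ["B-Label_1", "O", "O", "O", "O", "O", "B-Label_2", "I-Label_2"]
result = merge_bio(tokens, tags)
print(result)
# [('Work', 'Label_1'), ('across', 'O'), ('a', 'O'), ('wide', 'O'),
#  ('range', 'O'), ('of', 'O'), ('related areas', 'Label_2')]
```

This gives seven entries for the eight-word sentence, with "related areas" carrying a single label, which matches the output format asked for in the question.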