Tags: nlp, stanford-nlp, tweets

Getting full text out of acronyms in TweetNLP


TweetNLP provides a tokenizer and a part-of-speech tagger for tweets, which is really cool. I wonder whether I can take it a step further and expand acronyms. For example, given a tweet containing "ikr", I would like to look it up and get "I know, right?". I could write my own dictionary, but surely one already exists?
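
For reference, the hand-written dictionary fallback mentioned above takes only a few lines. This is a minimal Python sketch: `SLANG` and `expand_slang` are hypothetical names, and apart from "ikr" the dictionary entries are illustrative assumptions, not an existing resource.

```python
# Hypothetical hand-written slang dictionary; entries are illustrative only.
SLANG = {
    "ikr": "I know, right?",
    "smh": "shaking my head",
    "u": "you",
    "fb": "Facebook",
}

def expand_slang(tokens):
    """Replace each token found in SLANG (case-insensitively) with its
    expansion; every other token is returned unchanged."""
    return [SLANG.get(tok.lower(), tok) for tok in tokens]

tweet = "ikr smh he asked fir yo last name so he can add u on fb lololol"
print(" ".join(expand_slang(tweet.split())))
# prints: I know, right? shaking my head he asked fir yo last name so he can add you on Facebook lololol
```

Splitting on whitespace is only a stand-in here; in practice the tokens would come from a tweet-aware tokenizer such as TweetNLP's.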


Solution

  • What I ended up doing was using Stanford NLP with the GATE Twitter part-of-speech model (gate-EN-twitter.model).

    Sample tweet:

    ikr smh he asked fir yo last name so he can add u on fb lololol

    Results without gate-EN-twitter.model:

    word: ikr :: pos: NN :: ne:O
    word: smh :: pos: NN :: ne:O
    word: he :: pos: PRP :: ne:O
    word: asked :: pos: VBD :: ne:O
    word: fir :: pos: NNP :: ne:O
    word: yo :: pos: NNP :: ne:O
    word: last :: pos: JJ :: ne:O
    word: name :: pos: NN :: ne:O
    word: so :: pos: IN :: ne:O
    word: he :: pos: PRP :: ne:O
    word: can :: pos: MD :: ne:O
    word: add :: pos: VB :: ne:O
    word: u :: pos: NN :: ne:O
    word: on :: pos: IN :: ne:O
    word: fb :: pos: NN :: ne:O
    word: lololol :: pos: NN :: ne:O
    

    Results with gate-EN-twitter.model:

    word: ikr :: pos: UH :: ne:O
    word: smh :: pos: UH :: ne:O
    word: he :: pos: PRP :: ne:O
    word: asked :: pos: VBD :: ne:O
    word: fir :: pos: IN :: ne:O
    word: yo :: pos: PRP$ :: ne:O
    word: last :: pos: JJ :: ne:O
    word: name :: pos: NN :: ne:O
    word: so :: pos: IN :: ne:O
    word: he :: pos: PRP :: ne:O
    word: can :: pos: MD :: ne:O
    word: add :: pos: VB :: ne:O
    word: u :: pos: PRP :: ne:O
    word: on :: pos: IN :: ne:O
    word: fb :: pos: NNP :: ne:O
    word: lololol :: pos: UH :: ne:O
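
To see exactly what the GATE model changed, the two runs above can be compared token by token. This Python sketch hard-codes the tags printed above; `changed_tags` is a hypothetical helper name.

```python
# Tags copied from the two runs above (same tokens, same order).
TOKENS = "ikr smh he asked fir yo last name so he can add u on fb lololol".split()
WITHOUT_GATE = ["NN", "NN", "PRP", "VBD", "NNP", "NNP", "JJ", "NN",
                "IN", "PRP", "MD", "VB", "NN", "IN", "NN", "NN"]
WITH_GATE = ["UH", "UH", "PRP", "VBD", "IN", "PRP$", "JJ", "NN",
             "IN", "PRP", "MD", "VB", "PRP", "IN", "NNP", "UH"]

def changed_tags(tokens, before, after):
    """Return (token, old_tag, new_tag) for every position whose tag differs."""
    return [(tok, old, new)
            for tok, old, new in zip(tokens, before, after)
            if old != new]

for tok, old, new in changed_tags(TOKENS, WITHOUT_GATE, WITH_GATE):
    print(f"{tok}: {old} -> {new}")
```

Seven tokens change tag, but only ikr, smh and lololol become UH; u, fb, fir and yo receive other tags, so a filter on UH alone will not catch them.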
    

    Now I can identify slang by checking for the UH (interjection) tag and looking those words up in my custom dictionary.
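
The UH filter plus dictionary lookup can be sketched in Python as follows, assuming the tagger output has already been collected as (word, tag) pairs. `TAGGED` is copied from the GATE-model run above; `SLANG` and `lookup_interjections` are hypothetical names, and the dictionary entries are illustrative.

```python
# (word, POS) pairs as produced with gate-EN-twitter.model (copied from above).
TAGGED = [("ikr", "UH"), ("smh", "UH"), ("he", "PRP"), ("asked", "VBD"),
          ("fir", "IN"), ("yo", "PRP$"), ("last", "JJ"), ("name", "NN"),
          ("so", "IN"), ("he", "PRP"), ("can", "MD"), ("add", "VB"),
          ("u", "PRP"), ("on", "IN"), ("fb", "NNP"), ("lololol", "UH")]

# Hypothetical custom dictionary; entries are illustrative only.
SLANG = {"ikr": "I know, right?", "smh": "shaking my head"}

def lookup_interjections(tagged, slang):
    """For each token tagged UH, return (token, expansion), where the
    expansion is None if the token is not in the dictionary."""
    return [(word, slang.get(word.lower()))
            for word, tag in tagged
            if tag == "UH"]

for word, expansion in lookup_interjections(TAGGED, SLANG):
    print(word, "->", expansion if expansion else "(not in dictionary)")
```

Returning None for unknown UH tokens such as lololol makes it easy to collect candidates for new dictionary entries.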

    I'm still puzzled that no such dictionary seems to exist already, but this solves my problem for now.