Getting full text out of acronyms in TweetNLP

TweetNLP provides a tokenizer and a part-of-speech tagger for tweets, which is really cool. Now I wonder whether I can take it a step further and expand acronyms. For example, given a tweet containing "ikr", I would like to look it up and get "I know, right?". I suppose I could write my own dictionary, but surely one already exists?

Solution

What I ended up doing was using StanfordNLP with the GATE Twitter model (gate-EN-twitter.model).

Sample tweet:

ikr smh he asked fir yo last name so he can add u on fb lololol

Results without gate-EN-twitter.model:

word: ikr :: pos: NN :: ne:O
word: smh :: pos: NN :: ne:O
word: he :: pos: PRP :: ne:O
word: asked :: pos: VBD :: ne:O
word: fir :: pos: NNP :: ne:O
word: yo :: pos: NNP :: ne:O
word: last :: pos: JJ :: ne:O
word: name :: pos: NN :: ne:O
word: so :: pos: IN :: ne:O
word: he :: pos: PRP :: ne:O
word: can :: pos: MD :: ne:O
word: add :: pos: VB :: ne:O
word: u :: pos: NN :: ne:O
word: on :: pos: IN :: ne:O
word: fb :: pos: NN :: ne:O
word: lololol :: pos: NN :: ne:O

Results with gate-EN-twitter.model:

word: ikr :: pos: UH :: ne:O
word: smh :: pos: UH :: ne:O
word: he :: pos: PRP :: ne:O
word: asked :: pos: VBD :: ne:O
word: fir :: pos: IN :: ne:O
word: yo :: pos: PRP$ :: ne:O
word: last :: pos: JJ :: ne:O
word: name :: pos: NN :: ne:O
word: so :: pos: IN :: ne:O
word: he :: pos: PRP :: ne:O
word: can :: pos: MD :: ne:O
word: add :: pos: VB :: ne:O
word: u :: pos: PRP :: ne:O
word: on :: pos: IN :: ne:O
word: fb :: pos: NNP :: ne:O
word: lololol :: pos: UH :: ne:O

Now I can identify slang by looking for tokens tagged UH and then looking those tokens up in my custom dictionary.
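A minimal sketch of that lookup step, assuming the tagger output has already been collected as (word, tag) pairs. The function name `expand_slang` and the `SLANG` dictionary are hypothetical, and the entries are only illustrative; this is not part of TweetNLP, StanfordNLP, or GATE.

```python
# Hypothetical custom dictionary; entries here are illustrative only.
SLANG = {
    "ikr": "I know, right?",
    "smh": "shaking my head",
}


def expand_slang(tagged_tokens, dictionary=SLANG):
    """Replace UH-tagged tokens that appear in the dictionary.

    tagged_tokens is an iterable of (word, pos_tag) pairs. Tokens with
    any other tag, or with no dictionary entry, are returned unchanged.
    """
    expanded = []
    for word, pos in tagged_tokens:
        if pos == "UH":
            # Case-insensitive lookup; fall back to the original token.
            expanded.append(dictionary.get(word.lower(), word))
        else:
            expanded.append(word)
    return expanded


# A few tokens from the sample tweet, with the tags the Twitter model gave.
tagged = [("ikr", "UH"), ("smh", "UH"), ("he", "PRP"), ("asked", "VBD"), ("lololol", "UH")]
print(" ".join(expand_slang(tagged)))
```

Note that this only catches tokens tagged UH. In the output above, "u" is tagged PRP and "fb" is tagged NNP, so expanding those would need a separate check.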

I'm still puzzled that nothing like this seems to be available already, but it solves my problem for the moment.