python nlp spacy named-entity-recognition

How to re-train an existing spaCy NER model for currency


I am trying to update the existing spaCy model "en_core_web_sm" so that it recognizes currencies from different countries, such as "euro", "rupees", "eu", "Rs.", "INR", etc. How can I achieve that? The spaCy tutorial didn't quite help me, because training a fixed string such as "horses" as "ANIMAL" seems different from my requirement: currency values can appear in different formats, e.g. "1 million euros", "Rs. 10,000", "INR 1 thousand". My sample dataset contains around 1000 samples in the following format:

TRAIN_DATA = [
 ("You have activated International transaction limit for Debit Card ending XXXX1137 on 2017-07-05 12:48:20.0 via NetBanking. The new limit is Rs. 250,000.00", {'entities' : [(140, 154, 'MONEY')] }),...
]

Can anyone please help me with the data format, training size, or any other relevant information?


Solution

  • The example from the documentation should work for you. I altered it a little to match your variable name.

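    import random
    import spacy
    from spacy.gold import GoldParse  # spaCy v2.x API

    nlp = spacy.load('en_core_web_sm')
    # resume_training() keeps the pretrained weights (spaCy v2.1+);
    # begin_training() would re-initialize them
    optimizer = nlp.resume_training()

    for itn in range(100):
        random.shuffle(TRAIN_DATA)
        for raw_text, annotations in TRAIN_DATA:
            doc = nlp.make_doc(raw_text)
            # each annotation is a dict: {'entities': [(start, end, label), ...]}
            gold = GoldParse(doc, entities=annotations['entities'])
            nlp.update([doc], [gold], drop=0.5, sgd=optimizer)
    nlp.to_disk('/model')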
    

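On the data format: each entity is a (start, end, label) tuple of character offsets into the text, with an exclusive end, so text[start:end] should give back exactly the currency span. With around 1000 samples it may be easier to generate the offsets than to count them by hand. Below is a minimal sketch of that idea in plain Python. The regex and the names CURRENCY, NUMBER, MONEY_RE and annotate are my own assumptions for illustration, not part of spaCy, and the pattern is far from exhaustive, so the output should be reviewed before training.

```python
import re

# Hypothetical pattern, for illustration only: a currency marker before or
# after a number, with an optional magnitude word. Extend it for your data.
CURRENCY = r"\b(?:Rs\.?|INR|euros?|rupees?)"
NUMBER = r"\d[\d,]*(?:\.\d+)?(?:\s(?:thousand|million|billion))?"
MONEY_RE = re.compile(rf"{CURRENCY}\s?{NUMBER}|{NUMBER}\s{CURRENCY}", re.IGNORECASE)


def annotate(text, label="MONEY"):
    """Return (text, {'entities': [(start, end, label), ...]}) for one sample."""
    entities = [(m.start(), m.end(), label) for m in MONEY_RE.finditer(text)]
    return text, {"entities": entities}


samples = [
    "The new limit is Rs. 250,000.00",
    "He paid 1 million euros",
    "Fee: INR 1 thousand",
]
train_data = [annotate(s) for s in samples]

# Sanity check: a span must not start or end on whitespace, otherwise it
# will not line up with token boundaries.
for text, ann in train_data:
    for start, end, label in ann["entities"]:
        span = text[start:end]
        assert span == span.strip(), (text, start, end)
```

The same slicing check is worth running on hand-labelled data too: an offset that is off by one character, for example because of a stray leading space in the text, no longer lines up with the tokens.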