I am trying to update an existing spaCy model, "en_core_web_sm", so that it recognizes currencies of different countries, such as "euro", "rupees", "eu", "Rs.", "INR", etc. How can I achieve that? The spaCy tutorial didn't quite help me, as training a fixed string such as "horses" as "ANIMAL" seems different from my requirements: my currency values can appear in different formats, such as "1 million euros", "Rs. 10,000", "INR 1 thousand", etc. My sample dataset contains around 1000 samples in the following format:
TRAIN_DATA = [
    ("You have activated International transaction limit for Debit Card ending XXXX1137 on 2017-07-05 12:48:20.0 via NetBanking. The new limit is Rs. 250,000.00", {'entities': [(140, 154, 'MONEY')]}),
    ...
]
Can anyone please help me with the data format, the training set size, or any other relevant information?
The example from the documentation should work for you. I altered it a little to match your variable name.
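import random
import spacy
from spacy.gold import GoldParse

nlp = spacy.load('en_core_web_sm')
# Assumes spaCy v2; on 2.1+, nlp.resume_training() may be preferable to keep the pretrained weights.
optimizer = nlp.begin_training()
for itn in range(100):
    random.shuffle(TRAIN_DATA)
    for raw_text, annotations in TRAIN_DATA:
        doc = nlp.make_doc(raw_text)
        # Each annotation in TRAIN_DATA is a dict, so pass its 'entities' list.
        gold = GoldParse(doc, entities=annotations['entities'])
        nlp.update([doc], [gold], drop=0.5, sgd=optimizer)
nlp.to_disk('/model')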
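On the data format: each entity tuple is (start, end, label), where start and end are character offsets into the text (start inclusive, end exclusive), so text[start:end] should give back exactly the money expression. Below is a minimal sketch in plain Python of how such entries could be built and checked before training. Note that `make_example` and `check_example` are hypothetical helper names, not part of spaCy, and the sample sentences are invented for illustration.

```python
def make_example(text, phrase, label="MONEY"):
    """Build one (text, annotations) pair in the TRAIN_DATA format.

    Hypothetical helper: finds the first occurrence of `phrase` in `text`
    and records its character span (start inclusive, end exclusive).
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError("phrase %r not found in text" % phrase)
    end = start + len(phrase)
    return (text, {"entities": [(start, end, label)]})


def check_example(example):
    """Return the substring covered by each annotated span."""
    text, annotations = example
    return [text[start:end] for start, end, _label in annotations["entities"]]


# Invented sample sentences covering the formats mentioned in the question.
samples = [
    ("The new limit is Rs. 250,000.00", "Rs. 250,000.00"),
    ("The deal was worth 1 million euros.", "1 million euros"),
    ("Please transfer INR 1 thousand today.", "INR 1 thousand"),
]

TRAIN_DATA = [make_example(text, phrase) for text, phrase in samples]

# Print the text each span actually covers, next to its annotation.
for example in TRAIN_DATA:
    print(check_example(example), example[1])
```

Printing the covered substrings makes an off-by-one span (for example one that includes a leading space or drops the last digit) easy to spot before you start training.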