python-3.x nlp data-science spacy rasa-nlu

Substitute multiple word variants with a single entity in a chat text dataset


I have a chat dataset of 500k rows. I want to replace multiple variants of an entity (e.g. NEW YORK, New York, new york, Newyork) with a single canonical form, "New York", using Python.

I tried doing this with regex, but processing takes too long, and I have many such words to handle. Is there an alternative method in Python that takes less time?

Is there a good resource for learning more about the spaCy and Rasa APIs?


Solution

  • Can you provide a simple example of what you need to do, for instance using a training object? Do you need to change the entity name or the entity value?

    As for further reading on Rasa and spaCy, both have good documentation on their own sites and GitHub repositories.

    For Rasa, these are good starting points:

    1. https://rasa.com/docs/nlu/
    2. https://medium.com/rasa-blog
    3. https://forum.rasa.com/

    For spaCy:

    1. https://spacy.io/usage/
    2. https://explosion.ai/blog/

    You can also find more real-world examples in posts on Medium.
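
On the replacement itself: if the slowness comes from running one regex per variant over every row (an assumption, since the question does not show the code), a single precompiled alternation with a dictionary lookup does the whole substitution in one pass per row. The sketch below is only an illustration; the `VARIANTS` mapping, the second city, and the `normalize` helper are hypothetical names, not part of the original question.

```python
import re

# Hypothetical mapping from canonical entity to its lowercase variants.
VARIANTS = {
    "New York": ["new york", "newyork"],
    "San Francisco": ["san francisco", "sanfrancisco"],
}

# Invert to variant -> canonical so each match is a single dict lookup.
LOOKUP = {v: canon for canon, vs in VARIANTS.items() for v in vs}

# One compiled alternation; longer variants come first so they win
# when one variant is a prefix of another.
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(v) for v in sorted(LOOKUP, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Replace every known variant in text with its canonical form."""
    return PATTERN.sub(lambda m: LOOKUP[m.group(0).lower()], text)


print(normalize("Flights from NEW YORK to Newyork and new york"))
```

On a pandas DataFrame this could be applied with something like `df["text"].map(normalize)`, where `"text"` is an assumed column name. Variants with irregular spacing (e.g. two spaces between words) would need their own entries or a looser pattern.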