Tags: mongodb, rasa-nlu, rasa-core

NoSQL Injections with Rasa


Security Concern

I have recently been experimenting with Rasa, using MongoDB as the backing database.

I was wondering whether one should preprocess the inputs to Rasa in order to prevent NoSQL injections. I tried adding a custom component to the Rasa NLU pipeline, but the original text appears to be saved to Mongo as soon as the message reaches the first element of the pipeline.

NLU config file

language: "de"

pipeline:
- name: "nlu_components.length_limiter.LengthLimiter"
- name: "tokenizer_whitespace"
- name: "intent_entity_featurizer_regex"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
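As a side note, old-style rasa_nlu pipelines let you pass parameters to a component inline in the config, which the component receives via its `component_config`. Assuming the custom component reads `MAX_LENGTH` from its config, an override might look like this:

```yaml
pipeline:
- name: "nlu_components.length_limiter.LengthLimiter"
  MAX_LENGTH: 200
- name: "tokenizer_whitespace"
```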

length_limiter.py - notice the "process" method

from rasa_nlu.components import Component


MAX_LENGTH = 300


class LengthLimiter(Component):
    """Shortens the incoming message text to MAX_LENGTH characters
    in order to prevent overloading the bot.
    """

    # Name of the component to be used when integrating it in a
    # pipeline. E.g. ``[ComponentA, ComponentB]``
    # will be a proper pipeline definition where ``ComponentA``
    # is the name of the first component of the pipeline.
    name = "LengthLimiter"

    # Defines what attributes the pipeline component will
    # provide when called. The listed attributes
    # should be set by the component on the message object
    # during test and train, e.g.
    # ``message.set("entities", [...])``
    provides = []

    # Which attributes on a message are required by this
    # component. E.g. if requires contains "tokens", then a
    # previous component in the pipeline needs to have "tokens"
    # within the above-described `provides` property.
    requires = []

    # Defines the default configuration parameters of a component.
    # These values can be overwritten in the pipeline configuration
    # of the model. The component should choose sensible defaults
    # and should be able to create reasonable results with them.
    defaults = {
        "MAX_LENGTH": MAX_LENGTH
    }

    # Defines what language(s) this component can handle.
    # The default value of None means it can handle all languages.
    language_list = None

    def __init__(self, component_config=None):
        super(LengthLimiter, self).__init__(component_config)

    def train(self, training_data, cfg, **kwargs):
        """Train this component.

        This is the component's chance to train itself given the
        training data. The component can rely on any context attribute
        created by a call to :meth:`components.Component.pipeline_init`
        of ANY component, and on any context attributes created by a
        call to :meth:`components.Component.train` of components
        previous to this one."""
        pass

    def process(self, message, **kwargs):
        """Process an incoming message.

        This is the component's chance to process an incoming
        message. The component can rely on any context attribute
        created by a call to :meth:`components.Component.pipeline_init`
        of ANY component, and on any context attributes created by a
        call to :meth:`components.Component.process` of components
        previous to this one."""

        # Read the limit from component_config (which merges the
        # class-level defaults with any pipeline overrides) rather
        # than from the raw `defaults` dict, so a MAX_LENGTH set in
        # the pipeline configuration actually takes effect.
        max_length = self.component_config.get("MAX_LENGTH", MAX_LENGTH)
        message.text = message.text[:max_length]

    def persist(self, model_dir):
        """Persist this component to disk for future loading."""
        pass

    @classmethod
    def load(cls, model_dir=None, model_metadata=None, cached_component=None,
             **kwargs):
        """Load this component from file."""

        if cached_component:
            return cached_component
        component_config = model_metadata.for_component(cls.name)
        return cls(component_config)
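For context, rasa_nlu makes pipeline-supplied parameters available to a component by merging them over the class-level `defaults`. A rough pure-Python sketch of that merge semantics (a hypothetical helper, not rasa's actual code) looks like:

```python
def merge_config(defaults, component_config):
    """Merge pipeline-supplied config over class-level defaults.

    Roughly mimics how rasa_nlu exposes the merged result to a
    component as self.component_config: values from the pipeline
    configuration win, and missing keys fall back to the defaults.
    """
    merged = dict(defaults)
    merged.update(component_config or {})
    return merged
```

This is why the `process` method should read `MAX_LENGTH` from `component_config` rather than from `defaults` directly: the latter never reflects pipeline overrides.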

Solution

  • I experimented with the Mongo tracker store and could not inject anything. However, if you want to be absolutely sure, you would have to implement your own input channel rather than an NLU component. There you can sanitize messages before they are processed by Rasa Core.
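To illustrate, a sanitization helper like the one below could be applied inside a custom input channel before a message ever reaches Rasa Core. The function name and limit are hypothetical; the key ideas are that MongoDB operator injection typically relies on submitting a JSON object (e.g. `{"$gt": ""}`) where a string is expected, so rejecting non-string payloads closes that avenue, and truncation guards against oversized inputs:

```python
MAX_LENGTH = 300


def sanitize_message(payload):
    """Return safe message text from an incoming payload, or raise.

    Rejects anything that is not a plain string (operator injection
    sends objects with $-prefixed keys instead of strings) and
    truncates overly long text to MAX_LENGTH characters.
    """
    if not isinstance(payload, str):
        raise ValueError("message must be a plain string")
    return payload[:MAX_LENGTH]
```

In an input channel you would call this on the raw message before handing it to the agent, so both the tracker store and the NLU pipeline only ever see the sanitized text.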