Tags: python, avro, confluent-schema-registry, memgraphdb

Adding packages to a Memgraph transformation


I am writing a Memgraph transformation in Python.

When I import modules such as "requests" or "networkx", the transformation works as expected.

I have Avro data with a schema registry, so I need to deserialize it. I followed the Memgraph example here: https://memgraph.com/docs/memgraph/2.3.0/import-data/kafka/avro#deserialization
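For context, messages produced through the Schema Registry use Confluent's wire format: a magic `0x00` byte and a 4-byte big-endian schema ID, followed by the Avro-encoded body. The `confluent_kafka` deserializer handles this framing internally, but the layout itself can be sketched with the standard library alone (the function name here is my own, for illustration):

```python
import struct

def parse_confluent_header(message: bytes) -> tuple[int, bytes]:
    """Split a Confluent-framed message into (schema_id, avro_payload).

    Wire format: 1 magic byte (0x00), then a 4-byte big-endian
    schema ID, then the Avro-encoded body.
    """
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not a Confluent-framed message")
    schema_id = struct.unpack(">I", message[1:5])[0]
    return schema_id, message[5:]

# Example: schema ID 42 followed by a placeholder payload
framed = b"\x00" + struct.pack(">I", 42) + b"avro-bytes"
print(parse_confluent_header(framed))  # (42, b'avro-bytes')
```

The schema ID is what the deserializer uses to fetch the writer schema from the registry before decoding the payload with fastavro.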

When I save the transformation with those imports, I receive the error:

    [Error] Unable to load module "/memgraph/internal_modules/test_try_plz.py";
    Traceback (most recent call last):
      File "/memgraph/internal_modules/test_try_plz.py", line 4, in <module>
        from confluent_kafka.schema_registry import SchemaRegistryClient
    ModuleNotFoundError: No module named 'confluent_kafka'. For more details, visit https://memgr.ph/modules.

How can I update my transformation or my Memgraph instance to include the confluent_kafka module?

The link provided in the error message did not give me any leads, at least not that I could see.


Solution

  • You cannot add Python dependencies to Memgraph on Memgraph Cloud (at least not on the free trial).

    Instead, build your own Docker image on top of the official one and run that, e.g.:

    FROM memgraph/memgraph:2.5.2
    
    USER root
    
    # Install Python
    RUN apt-get update && apt-get install -y \
        python3-pip \
        python3-setuptools \
        python3-dev \
        && pip3 install -U pip
    
    # Install pip packages
    COPY requirements.txt ./
    RUN pip3 install -r requirements.txt
    
    # Copy the local query modules
    COPY transformations/ /usr/lib/memgraph/query_modules
    COPY procedures/ /usr/lib/memgraph/query_modules
    
    USER memgraph
    

    And my requirements.txt; all of these are required for a transformation that uses the Confluent schema-registry/Avro packages:

    confluent_kafka==2.0.2
    fastavro==1.7.1
    requests==2.28.2
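With the Dockerfile and requirements.txt in the same directory (and the transformation modules under transformations/), the image can be built and started roughly like this; the image name `memgraph-avro` is my own choice, and the port flags follow the defaults for Bolt (7687) and Memgraph Lab (3000):

```shell
# Build the custom image from the Dockerfile above
docker build -t memgraph-avro .

# Run it, exposing the Bolt port so clients can connect
docker run -it -p 7687:7687 -p 7444:7444 memgraph-avro
```

Once the container is up, the transformation module should load without the ModuleNotFoundError, since confluent_kafka and fastavro are baked into the image.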