postgresql python-2.7 google-cloud-dataflow psycopg2 apache-beam

ModuleNotFoundError: No module named 'psycopg2' - Dataflow


I've been searching Stack Overflow for a solution to this problem:

Module not found error

I've tried different approaches, but unfortunately none of them solved it.

Scenario:

I have a Python script that reads messages from Google Pub/Sub via Apache Beam. For every received message, I call a procedure that inserts it into a PostgreSQL table:

[Image: PostgreSQL table]

My method calls a PostgreSQL stored procedure using a psycopg2 connection:

[Image: the method that calls the stored procedure]
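The method was posted only as an image. As a rough sketch of what such a call might look like, assuming a hypothetical procedure named `insert_message` that takes one text argument (PostgreSQL 11+ `CALL` syntax; on older versions a function would be invoked with `SELECT`) and a connection obtained from `psycopg2.connect(...)`:

```python
def call_insert_procedure(conn, message):
    """Pass one Pub/Sub message to a stored procedure.

    Assumptions: `conn` is a DB-API 2.0 connection using %s placeholders,
    such as the object returned by psycopg2.connect(...), and
    `insert_message` is a hypothetical procedure taking a single text argument.
    """
    # psycopg2 cursors are context managers that close themselves on exit.
    with conn.cursor() as cur:
        # Let the driver bind the value instead of formatting it into the SQL.
        cur.execute("CALL insert_message(%s)", (message,))
    # psycopg2 opens a transaction implicitly, so commit to persist the insert.
    conn.commit()
```

In a Beam pipeline, a function like this would typically be called from a `beam.DoFn`'s `process` method, with the connection opened once in `setup()`. Either way, the code executes on the Dataflow workers, which is why psycopg2 has to be installed there and not only on the local machine.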

When I run my code on the DirectRunner, it works fine. When I run it on Dataflow, I get the message:

ModuleNotFoundError: No module named 'psycopg2'.

Can someone help me, please?

Thank you, Juliano


Solution

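  • The Dataflow runner executes your pipeline on temporary worker VMs in Google Cloud Platform, and those workers do not have the libraries installed on your local machine. You need to list psycopg2 in a requirements.txt file and pass --requirements_file requirements.txt on the command line when you launch the Dataflow job, so that the workers install it before running your code.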
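A minimal sketch of that fix, assuming hypothetical placeholder values for the project, region and Cloud Storage bucket (`my-project`, `us-central1`, `gs://my-bucket/tmp`): write a requirements.txt that lists psycopg2, then build the flags the job is launched with. The resulting list could be passed to `PipelineOptions(args)` from `apache_beam.options.pipeline_options`, or the same flags typed directly on the command line.

```python
from pathlib import Path


def write_requirements(path, packages):
    """Write one package specifier per line, the format `pip install -r` reads."""
    path = Path(path)
    path.write_text("\n".join(packages) + "\n")
    return path


def dataflow_args(project, region, temp_location, requirements_file):
    """Build launch flags for a Dataflow job (values are placeholders)."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Tells Beam to stage requirements.txt so each worker installs its packages.
        f"--requirements_file={requirements_file}",
        # Reading from Pub/Sub in the Python SDK requires a streaming pipeline.
        "--streaming",
    ]
```

For example, `write_requirements("requirements.txt", ["psycopg2"])` followed by `dataflow_args("my-project", "us-central1", "gs://my-bucket/tmp", "requirements.txt")` yields the flag list for the launch. Which psycopg2 distribution installs cleanly on the workers may depend on the Beam version and worker image, so treat the package name as something to verify.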