I've been surfing on stackoverflow looking for a solution to this problem:
I've tried differents approches, unfortunately I didn't solve it.
Scenario:
I have a python script that reads from Google Pubsub, via Apache Beam, messages. Every received message, I call a procedure that inserts into a PostgreSQL table:
My method calls a PostgreSQL Stored Procedure using a psycopg2 connection:
Running my code on DirectRunner, it works fine. When I run it on Dataflow I got the message :
ModuleNotFoundError: No module named 'psycopg2'.
Can someone help me, please?
Thank you, Juliano
Dataflow runner is a temporary cluster in Google Cloud Platform. Dataflow runner does not have your local libraries. You need to specify --requirements_file requirements.txt
on the command line when you run your dataflow.