python-3.x plpython

PL/Python: trying to understand global dictionary


I am using plpython3u (PL/Python) in Postgres. In that environment I have imported pandas and networkx. I have created a simple function that builds a graph from information in the tables, and it works perfectly. I now want to create other functions that will work with the graph I created.

The docs that seem to address what I want to do are here, but I don't understand exactly what they are saying. They name two global dictionaries: SD and GD. My interpretation is that for, say, the graph created by function_1 to be used or modified by another function, it must be in GD. It is not clear to me whether SD and GD are special to Postgres PL/Python, or how to use them. What are good references/links to read to understand what these dictionaries are and how to use them?

Note 1: I've found this and this. The former shows how to use GD but doesn't explain anything about it. The latter is talking about another database (Tanzu Greenplum) but looks like it might be relevant.

Note 2: @FrankYellin's comment confirms my initial impressions of how to access and use the GD. So this is my view at this point: within the Postgres environment there is a PL/Python environment, and each function in the latter has its own execution environment. The SD is contained in the function's execution environment, while the GD exists in the postgres (?) environment, where its contents are accessible by all functions. It doesn't look like I have to explicitly declare a variable as global if I insert it in the GD. When a session ends, all dictionaries disappear. It looks like this: [image: diagram]
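
To illustrate the SD/GD split described above, here is a minimal sketch that runs outside Postgres. It rests on several assumptions: the dictionaries `GD`, `SD_build` and `SD_count` are hand-made stand-ins (inside a plpython3u function body, `GD` and `SD` already exist and need no setup), the function names are hypothetical, and a plain dict of sets stands in for the networkx graph so the example has no dependencies.

```python
# Stand-ins so the sketch runs as ordinary Python. In plpython3u these
# dictionaries are supplied by the language handler, not created by hand.
GD = {}         # assumption: one dictionary shared by all functions in a session
SD_build = {}   # assumption: private dictionary of the first function
SD_count = {}   # assumption: private dictionary of the second function


def build_graph(edges, SD=SD_build):
    """Hypothetical body of function_1: build the graph and publish it in GD."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    GD["graph"] = graph                     # shared: other functions can read it
    SD["calls"] = SD.get("calls", 0) + 1    # private: only this function sees it
    return len(graph)


def count_neighbors(node, SD=SD_count):
    """Hypothetical body of function_2: reuse the graph stored by function_1."""
    graph = GD.get("graph")
    if graph is None:
        raise RuntimeError("graph not built in this session; call build_graph first")
    return len(graph.get(node, ()))


node_count = build_graph([("a", "b"), ("b", "c")])
neighbors_of_b = count_neighbors("b")
```

Nothing is declared with the `global` keyword here: storing the graph under a key in `GD` is enough for the second function to find it. The explicit `None` check matters because a new session starts with an empty `GD`.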


Solution

  • My diagram is correct, as verified in the Postgres community Slack. All the procedural language extensions have something similar: basically, a feature to hold global and function-specific variables.

    Additionally, plpython3u function calls are calls to PyEval_EvalCode in Python. The downside is that such functions take no advantage of Postgres' ability to use one or more parallel worker processes.

    Finally, the data a function can return (e.g. a 'class' object) is limited, as detailed here, but I might be able to either convert the class information to a string or use iterators/generators. This is still unclear at this point.
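
The two workarounds mentioned for returning data can be sketched in plain Python as follows. This is only an illustration under assumptions: `NodeInfo`, `as_text` and `node_rows` are hypothetical names, the graph is a plain dict of sets rather than a networkx object, and the mapping to a Postgres return type (for example a text column, or a set-returning function fed by the generator) is not shown.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class NodeInfo:
    """Hypothetical class holding information about one graph node."""
    name: str
    degree: int


def as_text(info):
    """Option 1: convert the class information to a string (JSON here)."""
    return json.dumps(asdict(info), sort_keys=True)


def node_rows(graph):
    """Option 2: yield one (name, degree) tuple per node from a generator."""
    for name in sorted(graph):
        yield (name, len(graph[name]))


graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
text = as_text(NodeInfo("b", 2))
rows = list(node_rows(graph))
```

The string route keeps the whole object in one value that the caller has to parse, while the generator route produces simple tuples one at a time, which may fit a row-per-node result better.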