Search code examples
cpostgresqlgraph-theoryapache-age

How to do a 'for' loop in tables for a PostgreSQL function in C


I was thinking of developing a function for AGE that returns a graph's adjacency matrix. An adjacency matrix stores the number of edges between two nodes. The complexity to check if an edge exists within an adjacency matrix is O(1), while the space complexity is O(v^2) with v = number of vertices. AGE stores the vertices of each graph in graph_name._ag_label_vertex, while the edges are stored in graph_name._ag_label_edge.

The vertex table contains two columns: id and properties. The id column is of integer type, while the properties column is a JSON like format. The edge table contains the id of the edge, the id of the vertex which it comes from, the id of the vertex that it goes to, and the edge's properties.

Example:

SELECT * FROM "graph_name"."Vertex";
       id         | properties
------------------+-------------
 844424930131971  | {}
 1407374883553281 | {}


SELECT * FROM "graph_name"."Edges";
        id        |    start_id     |      end_id      | properties
------------------+-----------------+------------------+------------
 1125899906842625 | 844424930131971 | 1407374883553281 | {}
(1 row)

I was thinking on traversing the vertex column first, so that it is possible to store the vertices ids initially:

[0, 1, 2, 3]
[1, 0, 0, 0]
[2, 0, 0, 0]
[3, 0, 0, 0]

After that, loop through the edges table and store 1 if the edge between two nodes exists (if there are multiple edges between a pair of nodes, add 1 to the already set value for each edge). But I don't know exactly how to do this in C for a PostgreSQL function. Which function should I call to traverse the tables and get the values from the id column?

I thought of using SPI twice, first for the vertices and the second time for the edges, but I don't know if this is the best approach.

In the age--1.3.0.sql file, I declared the function as

CREATE FUNCTION ag_catalog.get_adjacency_matrix(graph_name name)
RETURNS integer[][]
LANGUAGE c
AS 'MODULE_PATHNAME';

And its definition in src/backend/commands/graph_commands.c:

PG_FUNCTION_INFO_V1(get_adjacency_matrix);
/*
* Function for returning the graph's adjacency matrix. This adjacency matrix
* stores the number of edges between two nodes. The complexity to check if an
* edge exists within an adjacency matrix is O(1), while the space complexity
* is O(v^2) with v = number of vertices.
*/
Datum get_adjacency_matrix(PG_FUNCTION_ARGS)
{
    PG_RETURN_VOID();
}

Solution

  • Apparently, the best way to traverse a table utilizing a C function for PostgreSQL is with the Server Programming Interface (SPI). Here's a code example:

    SPI_connect();
    
    char *query = psprintf("SELECT id FROM \"GraphName\".\"_ag_label_vertex\" ORDER BY id");
    
    // Execute the query and check for errors.
    int ret = SPI_execute(query, true, 0);
    if (ret != SPI_OK_SELECT)
    {
        elog(ERROR, "Error when fetching the vertices");
        SPI_finish();
        PG_RETURN_NULL();
    }
    
    // And then here is how to traverse the resulting table.
    for (i = 0; i < SPI_processed; i++)
    {
        char *result_string = SPI_getvalue(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 1);
        int vertex_id = DatumGetInt64(DirectFunctionCall1(int8in, CStringGetDatum(result_string)));
    }
    

    From PostgreSQL's Documentation:

    When we call SPI_execute(const char * command, bool read_only, long count), it will execute the specified SQL command for count rows, and, if read_only is true, the command must be read only. If count is zero then the command is executed for all rows. This function returns a value depending on what query was executed. In this case, it returns SPI_OK_SELECT if the execution was successful.

    The actual number of rows for which the (last) command was executed is returned in the global variable SPI_processed. If the return value of the function is indeed SPI_OK_SELECT, then we can use the global pointer SPITupleTable *SPI_tuptable to access the result rows.

    The SPITupleTable struct is declared as:

    typedef struct SPITupleTable
    {
        /* Public members */
        TupleDesc   tupdesc;        /* tuple descriptor */
        HeapTuple  *vals;           /* array of tuples */
        uint64      numvals;        /* number of valid tuples */
    
        /* Private members, not intended for external callers */
        uint64      alloced;        /* allocated length of vals array */
        MemoryContext tuptabcxt;    /* memory context of result table */
        slist_node  next;           /* link for internal bookkeeping */
        SubTransactionId subid;     /* subxact in which tuptable was created */
    } SPITupleTable;
    

    We can access the corresponding row from the SPITupleTable by doing SPI_tuptable->vals[i] where i is the number of the row. To get the value of a certain column from the row, in this case, we need to call SPI_getvalue(HeapTuple row, TupleDesc rowdesc, int colnumber). This function will return a char* to the corresponding data.

    If we want to convert it to int64, we call DatumGetInt64(DirectFunctionCall1(int8in, CStringGetDatum(result_string))).