Search code examples
pythonpostgresqlinsertforeign-keysexecutemany

INSERT FOREIGN KEY in another table with executemany() in PostgreSQL


I tried to insert row values for code column in statements table as a foreign key from companies Table. i took the following steps:

Creating Tables

cur.execute("CREATE TABLE IF NOT EXISTS companies (code INT NOT NULL PRIMARY KEY, short_name VARCHAR(255) NOT NULL, long_name VARCHAR(255) NOT NULL)")

cur.execute("CREATE TABLE IF NOT EXISTS statements (statement_id SERIAL NOT NULL PRIMARY KEY, statement_name VARCHAR(255) NOT NULL, code INT NOT NULL, FOREIGN KEY (code) REFERENCES companies_list (code))")

What code column contains in companies table ( i.e. )

 code |
-----------
  113
  221
  344

The next step is inserting wanted data to statements table as below :

statement_name = ["balance_sheet", "income_statement", "cash_flow"]

code = "SELECT code FROM companies_list WHERE code IS NOT NULL"

statements = [tuple((t,)) for t in zip(statement_name, code)]

query = "INSERT INTO statements (statement_name, code) VALUES %s"
cur.executemany(query, statements)

i got the following error :

psycopg2.DataError: invalid input syntax for integer: "S"
LINE 1: ...ents (statement_name, code) VALUES ('balance_sheet', 'S')

The Final result i want to get is like below :

statement_id |   statement_name    |   code
---------------------------------------------
     1           balance_sheet         113
     2           income_statement      113
     3           cash_flow             113
     4           balance_sheet         221
     5          income_statement       221
     6           cash_flow             221

Solution

  • The error arises from this line:

    code = "SELECT code FROM companies_list WHERE code IS NOT NULL"
    

    This does not perform an actual query, it assigns the SQL select statement string to the code variable. The next line then zips the statement names with code which, because code is a string (an iterable), results in the first 3 characters of code being zipped with the items from statement_name, the result being:

    [(('balance_sheet', 'S'),), (('income_statement', 'E'),), (('cash_flow', 'L'),)]
    

    So that's where the 'S' is coming from - it's the first character of "SELECT" in the code string. 'S' is a string, not an integer as defined in the schema for the statements table, hence the error.

    You can see the queries generated with cursor.mogrify():

    >>> statement_name = ["balance_sheet", "income_statement", "cash_flow"]
    >>> code = "SELECT code FROM companies_list WHERE code IS NOT NULL"
    >>> statements = [tuple((t,)) for t in zip(statement_name, code)]
    >>> query = "INSERT INTO statements (statement_name, code) VALUES %s"
    >>> for args in statements:
    ...     print(cur.mogrify(query, args))
    ... 
    INSERT INTO statements (statement_name, code) VALUES ('balance_sheet', 'S')
    INSERT INTO statements (statement_name, code) VALUES ('income_statement', 'E')
    INSERT INTO statements (statement_name, code) VALUES ('cash_flow', 'L')
    

    One way of fixing this is to execute the query contained in code to get a list of company codes, then use that to construct the INSERT query:

    import itertools
    
    cur.execute("SELECT code FROM companies_list WHERE code IS NOT NULL")
    codes = [row[0] for row in cur.fetchall()]
    query = 'INSERT INTO statements (statement_name, code) VALUES (%s, %s)'
    args = itertools.product(statement_name, codes)
    cur.executemany(query, args)
    

    Here itertools.product() is used to form the Cartesian product of the statement names and the company codes. This is mimicking database join functionality, so if the statement types are available in your database, it might be better to do it in SQL rather than Python.