I tried to insert row values for code column in statements table as a foreign key from companies Table. i took the following steps:
Creating Tables
cur.execute("CREATE TABLE IF NOT EXISTS companies (code INT NOT NULL PRIMARY KEY, short_name VARCHAR(255) NOT NULL, long_name VARCHAR(255) NOT NULL)")
cur.execute("CREATE TABLE IF NOT EXISTS statements (statement_id SERIAL NOT NULL PRIMARY KEY, statement_name VARCHAR(255) NOT NULL, code INT NOT NULL, FOREIGN KEY (code) REFERENCES companies_list (code))")
What code column contains in companies table ( i.e. )
code |
-----------
113
221
344
The next step is inserting wanted data to statements table as below :
statement_name = ["balance_sheet", "income_statement", "cash_flow"]
code = "SELECT code FROM companies_list WHERE code IS NOT NULL"
statements = [tuple((t,)) for t in zip(statement_name, code)]
query = "INSERT INTO statements (statement_name, code) VALUES %s"
cur.executemany(query, statements)
i got the following error :
psycopg2.DataError: invalid input syntax for integer: "S"
LINE 1: ...ents (statement_name, code) VALUES ('balance_sheet', 'S')
The Final result i want to get is like below :
statement_id | statement_name | code
---------------------------------------------
1 balance_sheet 113
2 income_statement 113
3 cash_flow 113
4 balance_sheet 221
5 income_statement 221
6 cash_flow 221
The error arises from this line:
code = "SELECT code FROM companies_list WHERE code IS NOT NULL"
This does not perform an actual query, it assigns the SQL select statement string to the code
variable. The next line then zips the statement names with code
which, because code
is a string (an iterable), results in the first 3 characters of code
being zipped with the items from statement_name
, the result being:
[(('balance_sheet', 'S'),), (('income_statement', 'E'),), (('cash_flow', 'L'),)]
So that's where the 'S'
is coming from - it's the first character of "SELECT" in the code
string. 'S'
is a string, not an integer as defined in the schema for the statements
table, hence the error.
You can see the queries generated with cursor.mogrify()
:
>>> statement_name = ["balance_sheet", "income_statement", "cash_flow"]
>>> code = "SELECT code FROM companies_list WHERE code IS NOT NULL"
>>> statements = [tuple((t,)) for t in zip(statement_name, code)]
>>> query = "INSERT INTO statements (statement_name, code) VALUES %s"
>>> for args in statements:
... print(cur.mogrify(query, args))
...
INSERT INTO statements (statement_name, code) VALUES ('balance_sheet', 'S')
INSERT INTO statements (statement_name, code) VALUES ('income_statement', 'E')
INSERT INTO statements (statement_name, code) VALUES ('cash_flow', 'L')
One way of fixing this is to execute the query contained in code
to get a list of company codes, then use that to construct the INSERT
query:
import itertools
cur.execute("SELECT code FROM companies_list WHERE code IS NOT NULL")
codes = [row[0] for row in cur.fetchall()]
query = 'INSERT INTO statements (statement_name, code) VALUES (%s, %s)'
args = itertools.product(statement_name, codes)
cur.executemany(query, args)
Here itertools.product()
is used to form the Cartesian product of the statement names and the company codes. This is mimicking database join functionality, so if the statement types are available in your database, it might be better to do it in SQL rather than Python.