I need to write a lot of data from pandas DataFrames to MS SQL tables (thousands of rows or more at a time). I'm using pyodbc executemany with fast_executemany=True, otherwise it takes hours for each table. Now the IT team wants me to start working with stored procedures instead of direct access to the DB, for security reasons.
The problem is that, from what I've seen, SQL is not as flexible as Python: I have to declare every parameter passed to the SP, so I don't see how I can pass df.values.tolist() to the SP the way I do today with executemany and direct access to the DB.
For example, let's say I have a dataframe df:
1 Josh 10000
2 Michael 5000
3 Sara 8000
Today I would use:
cursor.fast_executemany = True
insert_str = "INSERT INTO [%s].[%s] VALUES (?, ?, ?)"
cursor.executemany(insert_str % (scheme, table), df.values.tolist())
And the whole DataFrame gets inserted into the table at once (and fast). But it doesn't seem possible to do the same when calling an SP.
Is there a way to use pyodbc executemany with stored procedures?
Yes, we can use executemany with stored procedures in SQL Server:
Table:
CREATE TABLE [dbo].[Table_1](
[id] [int] NOT NULL,
[name] [nvarchar](50) NULL,
[salary] [int] NULL,
CONSTRAINT [PK_Table_1] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Stored Procedure:
CREATE PROCEDURE [dbo].[Table_1_insert]
@id int = 0,
@name nvarchar(50),
@salary int
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO Table_1 (id, name, salary) VALUES (@id, @name, @salary);
END
Python code:
import pandas as pd
# assumes cnxn is an open pyodbc connection and crsr = cnxn.cursor()
df = pd.DataFrame(
    [(1, 'Josh', 10000), (2, 'Michael', 5000), (3, 'Sara', 8000)],
    columns=['id', 'name', 'salary'])
# each row of the list supplies the (?, ?, ?) parameters for one procedure call
crsr.executemany("{CALL dbo.Table_1_insert(?, ?, ?)}", df.values.tolist())
cnxn.commit()
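Since speed was the original concern, fast_executemany should also apply here: the {CALL ...} form is just another parameterized statement, so you can enable it on the cursor before calling executemany. A minimal sketch, assuming the crsr, cnxn and df from the example above:
# minimal sketch: enable fast_executemany before the stored-procedure call
crsr.fast_executemany = True
crsr.executemany("{CALL dbo.Table_1_insert(?, ?, ?)}", df.values.tolist())
cnxn.commit()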