I have web source stored in my SQL Server table as varchar
. The table has 1000 of these web sources. Manually opening each XML source saving it as XML data and store in SQL Server table as XML data type is tedious process. Please help with any script in SQL or Python that can iterate through the list in my table and save each link as XML data type in a SQL Server table. I am preparing those files for multiple XML shredding. Thanks for your help.
select * from [dbo].[FormIndex]
I have attached a snippet of my table.
The end goal is the table below. Here is how one of them looks like when stored in as XML manually. Doing thousand of them is very difficult manually so I decided to use python code that opens the string stored in SQL Server, open it, save the url as XML
data type and store it back on a table.
Please help - thanks
Here is the Python code I used to pull the table as a pandas data frame and iterate through each varchar
url, open it and save it back to the SQL Server table as XML
data. But I can't run it
First one works fine and returned the right result , returned a table of XMLLink (a column name I used to store the XML strings in SQL Server):
##From SQL Server database to DataFrame Pandas
import pandas as pd
import pyodbc
cnxn = pyodbc.connect
cnxn = pyodbc.connect(
r'Trusted_Connection=yes;'
r'DRIVER={ODBC Driver 13 for SQL Server};'
r'SERVER=*****;'
r'DATABASE=IRS-900;'
r'UID=*****;'
r'PWD=*****;'
)
query = "SELECT XMLLink FROM [IRS-900].[dbo].[FormIndex]"
df = pd.read_sql(query, cnxn)
print(df.head(7))
but when I build the for loop to iterate through the data frame, it throws an error. Here is the code
import os
import pyodbc
import datetime
import pyodbc
import urllib
import pandas as pd
##From SQL Server database to DataFrame Pandas
cnxn = pyodbc.connect
cnxn = pyodbc.connect(
r'Trusted_Connection=yes;'
r'DRIVER={ODBC Driver 13 for SQL Server};'
r'SERVER=*****;'
r'DATABASE=IRS-900;'
r'UID=*****;'
r'PWD=*****;'
)
query = "SELECT XMLLink FROM [IRS-900].[dbo].[FormIndex]"
df = pd.read_sql(query, cnxn)
##print(df.head(7))
for xmlpath in df:
xmlurl=urllib.request.urlopen(xmlpath)
xml_as_string = xmlurl.read()
cursor = cnxn.cursor()
cursor.execute("insert into Test_Table values ('Text', 1, '"+ xml_as_string +"')")
cnxn.commit()
dData = cursor.fetchall()
You can try the following T-SQL. It iterates through a table calling a web URL, and eventually saves XML response into a DB table for future consumption.
SQL
SET TEXTSIZE -1;
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, url NVARCHAR(1000));
INSERT INTO @tbl (url) VALUES
('https://s3.amazonaws.com/irs-form-990/200931393493000150_public.xml'),
('https://s3.amazonaws.com/irs-form-990/201013693492007030_public.xml');
DECLARE @tblTarget TABLE (ID INT IDENTITY, xmldata XML, LoadedDateTime DATETIMEOFFSET(3) DEFAULT (SYSDATETIMEOFFSET()));
DECLARE @responseText TABLE (responseText VARCHAR(MAX));
-- DDL and sample data population, end
DECLARE @status INT
, @res AS INT
, @url AS NVARCHAR(1000);
DECLARE @RowCount INT = (SELECT COUNT(*) FROM @tbl);
WHILE @RowCount > 0 BEGIN
SELECT @url = url
FROM @tbl
ORDER BY ID DESC OFFSET @RowCount - 1 ROWS FETCH NEXT 1 ROWS ONLY;
-- do whatever needed, apply any logic, call stored procedures, etc.
EXEC sp_OACreate 'MSXML2.ServerXMLHTTP', @res OUT;
EXEC sp_OAMethod @res, 'open', NULL, 'GET',@url,'false';
EXEC sp_OAMethod @res, 'send';
EXEC sp_OAGetProperty @res, 'status', @status OUT;
INSERT INTO @ResponseText (ResponseText)
EXEC sp_OAGetProperty @res, 'responseText';
EXEC sp_OADestroy @res;
INSERT INTO @tblTarget (xmldata)
SELECT responseText FROM @responseText;
DELETE FROM @responseText;
SET @RowCount -= 1;
END
SELECT * FROM @tblTarget;