Search code examples
pythonsql-serverscrapypymssql

Scrapy wont connect to MSSQL database


Fixed

My spider is fully working and I can export the data to JSON, CSV and to a MongoDB. However, since I will be dealing with large chunks of data, I would like to use MSSQL. I've browsed through google and stackoverflow in order to find a solution, but despite many attempts, scrapy wont connect to the database. My brother, a SQL developer, helped me setting up a local database which I could use to store my data on. So I am pretty sure that the database (which is very basic) is set-up correctly.

I am currently hosting the SQL server locally on my desktop with my desktop its username. I haven't set a password and my database is called "kaercher". I'd like to export the data to the table called "products_tb". I have given myself full sysadmin access, so that should be more than enough.

Any of you have any experience with MSSQL?

Made it working using pymssql

pipelines.py

import pymssql

class KrcPipeline(object):

    def __init__(self):
        self.conn = pymssql.connect(host='DESKTOP-P1TF28R', user='sa', password='123', database='kaercher')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):

        self.cursor.execute("INSERT INTO products_tb(productid, category, name, description, price, timestamp) VALUES (%s, %s, %s, %s, %s, %s)",
                            (item['productid'], item['category'], item['name'], item['description'], item['price'], item['timestamp']))
        self.conn.commit()

        return item

How I found my driver version

for driver in pyodbc.drivers():
    print(driver)

SQL Server
SQL Server Native Client 11.0
SQL Server Native Client RDA 11.0
ODBC Driver 13 for SQL Server
ODBC Driver 17 for SQL Server
MySQL ODBC 8.0 ANSI Driver
MySQL ODBC 8.0 Unicode Driver

items.py

import scrapy


class KrcItem(scrapy.Item):
    productid=scrapy.Field()
    name=scrapy.Field()
    description=scrapy.Field()
    price=scrapy.Field()
    producttype=scrapy.Field()
    timestamp=scrapy.Field()
    category=scrapy.Field()
    pass

settings.py

ITEM_PIPELINES = {'krc.pipelines.KrcPipeline': 300}

Solution

  • It appears to be connecting over TCP/IP, which may not be enabled. Using SQL Server Configuration Manager enable TCP/IP and ensure the instance is listening on port 1433.

    Then you'll need to switch to Windows Integrated Auth, not username/password, if you're connecting with your Windows user.

    Also pyodbc is the recommended library for connecting to SQL Server.

    The posted code looks fine. And this works for me:

    import pyodbc
    
    
    cnxn = pyodbc.connect(r'Driver={ODBC Driver 17 for SQL Server};Server=localhost;Database=tempdb;Trusted_Connection=yes;')
    
    cursor = cnxn.cursor()
    
    create_table_query = '''create table #products_tb (productid int, category varchar(20), name varchar(20), description varchar(20), price float, timestamp datetime)'''
    
    cursor.execute(create_table_query)
    
    insert_query = '''INSERT INTO #products_tb (productid, category, name, description, price, timestamp)
                        VALUES (?, ?, ?, ?, ?, ?);'''
    item = [1,2,3,4,5]
    
    for i in item:
    
        row = [i,"SomeCat","SomeName","A thing", 12.4, "20190101"]
        values = (row[0],row[1],row[2],row[3],row[4],row[5])
    
        cursor.execute(insert_query, values)
    
    cnxn.commit()  
    
    result = cursor.execute("select * from #products_tb")
    rows = cursor.fetchall()
    
    print(rows)