Search code examples
pythonms-accesssqlalchemypyodbcsqlalchemy-access

Read a Zip file from URL and convert a Ms Access file to data frame


I' writing this message to ask help in order to find a solution for this python code. I'm trying to generate a visualization of FAA PMA Parts in comparison with my company's parts database. I've already done it using Power Bi, SQL Query and Excel files, however it is not an online visualization. Every month i have to download FAA PMA Parts from URL (https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip) in order to update this visualization.

That being said, i've tried to do these python codes in order to read Ms Access file from the above mentioned Url and convert it to a Pandas DataFrame.

However i didn't succeed in this task for the following reasons:

First i've tried to use sqlalchemy, but the following errors have shown up.

    from zipfile import ZipFile
    from io import BytesIO
    import urllib
    from urllib.request import urlopen
    from sqlalchemy import create_engine
    import pandas as pd
    
    r = urlopen("https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip").read()
    file = ZipFile(BytesIO(r))
    pma_accdb = file.open("PMA.accdb")
    
    connection_string = (r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
                         r'DBQ=pma_accdb;'
                         r'ExtendedAnsiSQL=1;'
                                              )
    connection_url = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
    engine = create_engine(connection_url)
    sql_df = pd.read_sql("SELECT * FROM Parts", con=engine)
    
    print(sql_df.head())

Error:Traceback (most recent call last): File "C:/Users/thiago.ribeiro/.PyCharmCE2019.1/config/scratches/scratch.py", line 17, in engine = create_engine(connection_url) File "C:\Users\thiago.ribeiro\AppData\Local\Programs\Python\Python38\lib\site-packages\sqlalchemy\engine_init_.py", line 500, in create_engine return strategy.create(*args, **kwargs) File "C:\Users\thiago.ribeiro\AppData\Local\Programs\Python\Python38\lib\site-packages\sqlalchemy\engine\strategies.py", line 61, in create entrypoint = u._get_entrypoint() File "C:\Users\thiago.ribeiro\AppData\Local\Programs\Python\Python38\lib\site-packages\sqlalchemy\engine\url.py", line 172, in _get_entrypoint cls = registry.load(name) File "C:\Users\thiago.ribeiro\AppData\Local\Programs\Python\Python38\lib\site-packages\sqlalchemy\util\langhelpers.py", line 267, in load raise exc.NoSuchModuleError( sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:access.pyodbc

After that i've tried to use pyodbc, but the following errors have also shown up.

    from zipfile import ZipFile
    from io import BytesIO
    from urllib.request import urlopen
    import pyodbc
    import pandas as pd
    
    r = urlopen("https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip").read()
    file = ZipFile(BytesIO(r))
    pma_accdb = file.open("PMA.accdb")
    
    driver = '{Microsoft Access Driver (*.mdb, *.accdb)}'
    filepath = pma_accdb
    
    myDataSources = pyodbc.dataSources()
    access_driver = myDataSources['MS Access Database']
    
    cnxn = pyodbc.connect(driver=access_driver, dbq=filepath, autocommit=True)
    crsr = cnxn.cursor()
    crsr.execute("SELECT * FROM Parts")
    
    sql_df = pd.read_sql("SELECT * FROM Parts", con=engine)
    
    print(sql_df.head())

Error: Traceback (most recent call last): line 17, in cnxn = pyodbc.connect(driver=access_driver, dbq=filepath, autocommit=True) pyodbc.Error: ('HY000', '[HY000] [Microsoft][Driver ODBC Microsoft Access] general error unable to open registry key Temporary (volatile) Ace DSN for process 0x2efc Thread 0x4a20 DBC 0xd39ea788

The above being said, could you please help me understand where my code is wrong and how can i improve it in order to read MS Access from url and convert it to a pandas data frame?


Solution

  • I've got a help from a friend. this was de solution:

    
            from zipfile import ZipFile
            from io import BytesIO
            import urllib
            from urllib.request import urlopen
            from sqlalchemy import create_engine
            import pandas as pd
            
            resp = urlopen("https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip").read()
            zipfile = ZipFile(BytesIO(resp))
            
            connection_string = (r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
            r'DBQ=' + zipfile.extract('PMA.accdb') + ';'
            r'ExtendedAnsiSQL=1;'
            )
            connection_url = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
            engine = create_engine(connection_url)
            sql_df = pd.read_sql("SELECT * FROM Parts", con=engine)