
Read SHP file from SFTP using pysftp


I am trying to use pysftp's getfo() to read a shapefile (without downloading it). However, the output I get does not seem workable, and I'm not sure if it's possible to do this with a shapefile.

Ideally I would like to read in the file and convert it to a Geopandas GeoDataFrame.

import pysftp
import io

# host, user and password are placeholders; `pass` is a reserved word in Python
with pysftp.Connection(host=host, username=user, password=password) as sftp:
    print("Connection established ... ")

    flo = io.BytesIO()
    sites = sftp.getfo('sites/Sites.shp', flo)
    value=flo.getvalue()

From here I can't decode the value and am unsure of how to proceed.


Solution

  • Something like this should do:

    flo.seek(0)
    df = geopandas.read_file(flo)  # the file-like object is the first positional argument
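
The rewind matters because getfo() writes into the buffer and leaves the position at the end, so a reader that starts from the current position sees no data. A minimal sketch of that behaviour, using a plain write as a stand-in for getfo() (no SFTP connection involved):

```python
import io

# Stand-in for sftp.getfo(): it writes the downloaded bytes into the buffer.
flo = io.BytesIO()
flo.write(b"example bytes")

# After writing, the position is at the end, so reading yields nothing.
at_end = flo.read()

# Rewinding to the start makes the whole content readable again.
flo.seek(0)
after_rewind = flo.read()
```
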
    

    Using Connection.getfo, though, needlessly keeps the whole raw file in memory. This would be more efficient:

    with sftp.open('sites/Sites.shp', bufsize=32768) as f:
        df = geopandas.read_file(f)
    

    (For the purpose of bufsize=32768, see "Reading file opened with Python Paramiko SFTPClient.open method is slow".)


    If I understand it correctly, though, you need multiple files. Geopandas cannot magically access the other related files on a remote server when you provide the "shp" via a file-like object. It does not know where the "shp" comes from, or even what its physical name is. You need to provide file-like objects for all the individual files. See "Using pyshp to read a file-like object from a zipped archive". That answer does not use Geopandas, but the principle is the same.

    For Geopandas, it seems that the underlying fiona library handles that, and I didn't find any documentation of the relevant parameters.

    I guess something like this might do, but that's just a wild guess:

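    with sftp.open('sites/Sites.shp', bufsize=32768) as shp, \
         sftp.open('sites/Sites.shx', bufsize=32768) as shx, \
         sftp.open('sites/Sites.dbf', bufsize=32768) as dbf:
        # ... (open any further component files the same way)
        # shx=/dbf= keyword names are a guess, not a documented geopandas API
        df = geopandas.read_file(shp, shx=shx, dbf=dbf)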
    

    or switch to the shapefile/pyshp module:

    with sftp.open('sites/Sites.shp', bufsize=32768) as shp, \
         sftp.open('sites/Sites.shx', bufsize=32768) as shx, \
         sftp.open('sites/Sites.dbf', bufsize=32768) as dbf:
        # ... (open any further component files the same way)
        r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
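
If the remote handles need to be closed before parsing, the components can be copied into memory first. A minimal sketch of that idea, where fetch_components and its open_remote argument are hypothetical names introduced here, and where the component files are assumed to share the base name of the .shp with lowercase extensions:

```python
import io
import posixpath

# Components that pyshp's Reader accepts as keyword arguments.
COMPONENT_EXTENSIONS = ("shp", "shx", "dbf")


def fetch_components(open_remote, shp_path, extensions=COMPONENT_EXTENSIONS):
    """Read each component file fully into its own in-memory buffer.

    open_remote is any callable that takes a remote path and returns a
    readable binary file object usable as a context manager.
    """
    base, _ = posixpath.splitext(shp_path)
    buffers = {}
    for ext in extensions:
        with open_remote(f"{base}.{ext}") as remote:
            # BytesIO(data) starts at position 0, so no rewind is needed.
            buffers[ext] = io.BytesIO(remote.read())
    return buffers
```

With pysftp, open_remote could be lambda p: sftp.open(p, bufsize=32768), and the result could then be passed on as shapefile.Reader(**parts). The trade-off is that every component is held in memory in full.
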
    

    Another trick is to pack all the files into a zip archive:
    "Read shapefile from HDFS with geopandas"
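
A rough sketch of what such an archive looks like as a single file-like object. In practice the zip would be created on the server and fetched as one file; here pack_to_zip (a hypothetical helper) builds a small archive in memory as a stand-in. Whether geopandas.read_file accepts such a buffer directly depends on the installed geopandas version and its I/O engine, so treat that step as an assumption to verify.

```python
import io
import zipfile


def pack_to_zip(files):
    """Pack a {archive_name: bytes} mapping into an in-memory zip archive."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    # Rewind so the next reader starts at the beginning of the archive.
    archive.seek(0)
    return archive
```

The point is that all components travel together under their real names, so the reader can find the .shx and .dbf next to the .shp.
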


    Btw, note that the code downloads the file(s) anyway. You cannot parse the contents of a remote file without actually downloading them. The code just avoids storing the downloaded contents in a (temporary) local file.