I have a "simple" process that needs to grab data from another server and copy a directory (and all of its sub-directories) to my server.
The code is as follows:
import pysftp
dbfs_path = '/dbfs/mnt/aaa/bbb/output/{}/'.format(dbutils.widgets.get("run_name"))
remote_path = '/mst_bbb/{}/output/{}/'.format(bucket,dbutils.widgets.get("run_name"))
cnopts = pysftp.CnOpts()
cnopts.hostkeys = None
srv = pysftp.Connection(host=host_name, username="xxx",password="yyy",cnopts=cnopts)
srv.get_r(remote_path,dbfs_path)
It was working fine until I realized that I sometimes have to fetch the same directories more than once, which throws an error saying that the directory already exists.
No problem, I thought, and did the following:
import shutil
shutil.rmtree(dbfs_path)
Then I re-ran the code, but now I get a very different error:
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
<ipython-input-16-9f782d79e03f> in <module>()
12
13 srv = pysftp.Connection(host=host_name, username="xxx",password="yyy",cnopts=cnopts)
---> 14 srv.get_r(remote_path,dbfs_path)
/databricks/python/local/lib/python2.7/site-packages/pysftp/__init__.pyc in get_r(self, remotedir, localdir, preserve_mtime)
309 self.get(fname,
310 reparent(localdir, fname),
--> 311 preserve_mtime=preserve_mtime)
312
313 def getfo(self, remotepath, flo, callback=None):
/databricks/python/local/lib/python2.7/site-packages/pysftp/__init__.pyc in get(self, remotepath, localpath, callback, preserve_mtime)
247 sftpattrs = self._sftp.stat(remotepath)
248
--> 249 self._sftp.get(remotepath, localpath, callback=callback)
250 if preserve_mtime:
251 os.utime(localpath, (sftpattrs.st_atime, sftpattrs.st_mtime))
/databricks/python/local/lib/python2.7/site-packages/paramiko/sftp_client.pyc in get(self, remotepath, localpath, callback)
767 Added the ``callback`` param
768 """
--> 769 with open(localpath, 'wb') as fl:
770 size = self.getfo(remotepath, fl, callback)
771 s = os.stat(localpath)
IOError: [Errno 2] No such file or directory: u'/dbfs/aaa/bbb/output/run_job/./mst_bbb/pri1/output/run_job/date=2017-12-01/2017-12-01_output_0.csv.gz'
Any ideas what might be causing this problem? I can't figure it out.
Thanks
I believe that the target directory of get_r (the localdir argument) has to exist. pysftp won't create it for you.
Your call to shutil.rmtree, however, removes not only the directory contents but the directory itself too.
Recreate the directory afterwards:
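import os
shutil.rmtree(dbfs_path)  # removes the contents and the directory itself
os.mkdir(dbfs_path)  # recreate the now-missing (empty) target directory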
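If the directory may or may not exist on a given run, the remove-and-recreate step can be wrapped in a small helper. This is only a minimal sketch: reset_local_dir is a hypothetical name, not part of pysftp, and it assumes that wiping the local copy before each download is acceptable.

```python
import os
import shutil


def reset_local_dir(path):
    """Delete path if it exists, then recreate it as an empty directory.

    Hypothetical helper for illustration; it is not part of pysftp.
    """
    if os.path.isdir(path):
        # rmtree removes the contents and the directory itself
        shutil.rmtree(path)
    # makedirs also creates any missing parent directories
    os.makedirs(path)
    return path
```

Calling reset_local_dir(dbfs_path) just before srv.get_r(remote_path, dbfs_path) should then give get_r an existing, empty target on every run.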
Though actually, I do not understand your original problem. I do not see why you would be getting the "the directory already exists" error. Maybe you should ask about that problem rather than implementing an inefficient workaround.