
Python HDFS Snakebite: Methods work only with print


I am using the snakebite client from

https://github.com/spotify/snakebite

and I notice strange behavior when I try to make a directory or move files around in HDFS. Here is my code. All it does is move the contents of the source directory to the destination directory, then display the contents of the destination directory.

def purge_pending(self,source_dir,dest_dir):

        if(self.hdfs_serpent.test(path=self.root_dir+"/"+source_dir, exists=True, directory=True)):
            print "Source exists ",self.root_dir+source_dir
            for x in self.hdfs_serpent.ls([self.root_dir+source_dir]):
                print x['path']
        else:
            print "Source does not exist ",self.root_dir+"/"+source_dir
            return
        if(self.hdfs_serpent.test(path=self.root_dir+"/"+dest_dir, exists=True, directory=True)):
            print "Destination exists ",self.root_dir+dest_dir
        else:
            print "Destination does not exist ",self.root_dir+dest_dir
            print "Will be created"
            for y in self.hdfs_serpent.mkdir([self.root_dir+dest_dir],create_parent=True):
                print y

        for src in self.hdfs_serpent.ls([self.root_dir+source_dir]):
            print src['path'].split("/")[-1]
            for y in self.hdfs_serpent.rename([src['path']],self.root_dir+dest_dir+"/"+src['path'].split("/")[-1]):
                print y


        for x in self.hdfs_serpent.ls([self.root_dir+dest_dir]):
            print x['path']

Here is sample output from a run where the destination did not exist:

Source exists  /root/source
/root/source/208560.json
/root/source/208571.json
/root/source/208574.json
/root/source/208581.json
/root/source/208707.json
Destination does not exist /root/dest
Will be created
{'path':'/research/dest/'}
208560.json
{'path':'/research/dest/208560.json'}
208571.json
{'path':'/research/dest/208571.json'}
208574.json
{'path':'/research/dest/208574.json'}
208581.json
{'path':'/research/dest/208581.json'}
208707.json
{'path':'/research/dest/208707.json'}

The weird part is that I have to put those print statements in; otherwise nothing works. So

self.hdfs_serpent.mkdir([self.root_dir+dest_dir],create_parent=True)

does not work, but

for y in self.hdfs_serpent.mkdir([self.root_dir+dest_dir],create_parent=True):
                print y

does! The same goes for

self.hdfs_serpent.rename([src['path']],self.root_dir+dest_dir+"/"+src['path'].split("/")[-1])

The above does not work, but the following does:

for y in self.hdfs_serpent.rename([src['path']],self.root_dir+dest_dir+"/"+src['path'].split("/")[-1]):
                print y

Is this a bug? Am I doing something wrong?


Solution

  • This looks to be by design: the documentation states that most of the objects returned by the client's methods are generators. A generator's body does not run until its values are consumed with next(), which a for loop does implicitly. Calling mkdir() or rename() without iterating over the result therefore does no work.
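
The laziness described above can be reproduced without a Hadoop cluster. The following is only a minimal sketch: FakeClient is a hypothetical stand-in written for illustration, not snakebite's actual implementation, and it uses Python 3 syntax, whereas the question's code uses Python 2 print statements. It shows that calling a generator method has no effect until something iterates over the result.

```python
# Hypothetical stand-in for a snakebite-style client (an assumption for
# illustration only). mkdir is a generator function, so calling it merely
# builds a generator object; the body runs only while it is iterated.
class FakeClient:
    def __init__(self):
        self.created = []  # records directories that were actually "created"

    def mkdir(self, paths, create_parent=False):
        for path in paths:
            self.created.append(path)  # side effect happens during iteration
            yield {"path": path}


client = FakeClient()

# Called but never iterated: the body has not started, so nothing is created.
client.mkdir(["/root/dest"], create_parent=True)
print(client.created)

# list() calls next() until the generator is exhausted, which runs the body.
results = list(client.mkdir(["/root/dest"], create_parent=True))
print(client.created)
print(results)
```

The first print shows an empty list; only after list() drains the generator does the path appear. Applied to the question's code, wrapping each call in list(...), for example list(self.hdfs_serpent.mkdir([self.root_dir+dest_dir],create_parent=True)), should have the same effect as the for loop with print, without producing output.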