
Moving a tsv file from local file system to S3 in luigi


The following program does not produce any output, nor does it raise any errors. Am I missing something, such as a run() method in the to_S3() class?

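import luigi
from luigi.contrib.s3 import S3Target


class to_S3(luigi.Task):

    #Mysql_to_tsv (defined elsewhere) runs a query against a MySQL database and stores the result in a local TSV file.

    def requires(self):
        #luigi expects task instances here, not task classes
        return [Mysql_to_tsv()]

    def output(self):
        #S3Target takes an s3:// path, not an https:// URL
        return S3Target("s3://bucket-name/luigi_attempt.tsv")

    #Note: no run() method is defined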

The output() method of the Mysql_to_tsv() class is:

    def output(self):
        return luigi.LocalTarget('/Users/user/Desktop/Work/Luigi/test_data.tsv')

Please help with the correct class implementation of the task.


Solution

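  • What I originally wanted was to put some data into an S3 bucket.

    A task does not need an output() method in order to do its work (for example, dumping data to an S3 bucket).

    The work itself happens in the run() method; output() only tells luigi whether the task is already complete, for example by checking whether a flag file or the uploaded object exists. Without a run() method, luigi falls back to the default run(), which does nothing, so the original task silently uploaded nothing.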

    So, the correct implementation would be:

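    import luigi
    from boto.s3.connection import S3Connection
    from boto.s3.key import Key


    class to_S3(luigi.Task):
    
        def requires(self):
            return [Mysql_to_tsv()]
    
    
        def run(self):
    
            #Creating a connection (fill in your own credentials)
            access_key = ""
            access_secret = ""
            conn = S3Connection(access_key, access_secret)
    
            #Connecting to the bucket
            bucket_name = ""
            bucket = conn.get_bucket(bucket_name)
    
            #Setting up the key and uploading the TSV written by Mysql_to_tsv
            #(self.input() is a list because requires() returns a list)
            k = Key(bucket)
            k.key = "sample1"
            k.set_contents_from_filename(self.input()[0].path)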