python etl

Execute multiple inner functions


I'm currently working on an ETL process. All the separate functions work on their own, but I'm still having trouble getting them to work together inside a main function.

def etl(url):

    def extract(url):
        return url_json_file
    extract()

    def transform(url_json_file):
        return json_transformed
    transform()

    def load(json_transformed):
        load_json_to_db
    load()
    
pass

This is the ETL schema. My goal is to pass the outer function etl() a URL string, after which the process goes like this:

  • extract(): Takes the URL string, scrapes it, runs json.loads() on the response, and returns the result.
  • transform(): Takes the loaded JSON, transforms it, and returns a clean new dictionary built from that data.
  • load(): Takes the clean dictionary and loads it into database X.

PROBLEM: I'm probably handling the return values incorrectly. Even if I make those return values global, the process DOES NOT END. It usually fails with this error: [WinError 6] The handle is invalid

My question, therefore, is: how can I make an inner function take the value returned by the previous function? I would appreciate any hint or tip.


Solution
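
In the schema from the question, each inner function is called with no arguments and its return value is never stored, so nothing can flow from one step to the next. The following is a minimal sketch of what Python does in that situation; `etl_broken` and the placeholder work inside `extract` are hypothetical names used only for illustration, and the sketch does not reproduce the WinError 6 from the question.

```python
def etl_broken(url):
    def extract(url):
        return url.upper()  # hypothetical placeholder for the real scraping

    # Same call shape as in the question: no argument is passed,
    # and whatever extract() would return is thrown away.
    extract()


error = None
try:
    etl_broken("https://www.google.com")
except TypeError as exc:
    # Python refuses the call because the required parameter is missing.
    error = str(exc)
    print(error)
```

Each step therefore has to receive the previous step's return value as an argument, as in the code below.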

    def etl(url):
        def extract(url):
            url_json_file = url  # Placeholder: fetch and parse the URL here.
            print("first step")
            return url_json_file

        def transform(url_json_file):
            json_transformed = url_json_file  # Placeholder: clean the data here.
            print("second step")
            return json_transformed

        def load(json_transformed):
            # Placeholder: do the database insert operations here.
            print("third step")
            db_send = json_transformed
            return db_send

        # Feed each step's return value into the next step.
        return "ETL: " + load(transform(extract(url)))

    print(etl(url="https://www.google.com"))
    

    The result should be:

    first step
    second step
    third step
    ETL: https://www.google.com
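
The string concatenation above only works because the placeholder steps pass the URL string through unchanged. As a rough sketch of the same chaining pattern with real data, the version below parses JSON, cleans it, and writes it to a database. Several things here are assumptions and not part of the original question: the `fetch` and `conn` parameters, the `fake_fetch` stand-in (used so the example runs without a network request), the `items` table, the `name` and `price` fields, and the choice of SQLite as "database X".

```python
import json
import sqlite3


def etl(url, fetch, conn):
    """Run extract -> transform -> load, handing each result to the next step.

    Assumptions for this sketch: fetch(url) returns the response body as a
    JSON string, and conn is an open sqlite3 connection.
    """

    def extract(url):
        # Get the raw text and parse it into Python objects.
        return json.loads(fetch(url))

    def transform(records):
        # Keep only the fields of interest (hypothetical "name"/"price" keys).
        return [
            {"name": r["name"].strip(), "price": float(r["price"])}
            for r in records
        ]

    def load(rows):
        # Insert the cleaned rows and report how many were written.
        conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, price REAL)")
        conn.executemany("INSERT INTO items VALUES (:name, :price)", rows)
        conn.commit()
        return len(rows)

    raw = extract(url)       # result of step 1 ...
    clean = transform(raw)   # ... becomes the input of step 2 ...
    return load(clean)       # ... whose result becomes the input of step 3


def fake_fetch(url):
    # Stand-in for a real HTTP request so the sketch runs offline.
    return '[{"name": " apple ", "price": "1.50"}, {"name": "pear", "price": "2"}]'


conn = sqlite3.connect(":memory:")
inserted = etl("https://www.google.com", fake_fetch, conn)
rows = conn.execute("SELECT name, price FROM items ORDER BY name").fetchall()
print(inserted)
print(rows)
```

Storing each intermediate result in a named variable does the same thing as the nested call `load(transform(extract(url)))`, and it makes it easier to inspect the data between steps while debugging.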