I'm currently working on an ETL process where all the separate functions work on their own, but I'm still having trouble getting them to work together in a main function.
def etl(url):
    def extract(url):
        return url_json_file
    extract()
    def transform(url_json_file):
        return json_transformed
    transform()
    def load(json_transformed):
        load_json_to_db
    load()
    pass
This is the ETL schema. My objective is to pass the outer function etl() a URL string, and from that point the process goes like this:
extract(): Takes in that URL string, scrapes it, performs a json.loads() on the result and returns it.
transform(): Takes in the loaded JSON, transforms it and returns a clean new dictionary with the data from that JSON.
load(): Takes in the clean JSON dict and loads it into X database.

PROBLEM: I'm probably handling the return values wrong. Even if I set those return values to global, the process DOES NOT END. It usually returns this error: [WinError 6] The handle is invalid
My question, therefore, is: how can I make an inner function take the returned value (or operation) from a previous function? I would appreciate any hint or tip.
def etl(url):
    def extract(url):
        url_json_file = url  # do something here
        print("first step")
        return url_json_file

    def transform(url_json_file):
        json_transformed = url_json_file  # do something here
        print("second step")
        return json_transformed

    def load(json_transformed):
        # do some db insert operations
        print("third step")
        db_send = json_transformed
        return db_send

    # Each function receives the value returned by the previous one.
    return "ETL: " + load(transform(extract(url)))

print(etl(url="https://www.google.com"))
The result should be:
first step
second step
third step
ETL: https://www.google.com
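
The same chaining can also be written with intermediate variables, which is sometimes easier to debug. The following is only a minimal sketch under some assumptions: the download step is replaced by a raw JSON string so that it runs offline, the "name" and "price" fields and the "items" table are hypothetical, and an in-memory SQLite database stands in for "X database".

```python
import json
import sqlite3


def etl(raw_json, conn):
    """Run extract -> transform -> load, handing each result to the next step."""

    def extract(text):
        # Assumption: the raw JSON text has already been downloaded.
        return json.loads(text)

    def transform(records):
        # Hypothetical cleanup: trim names and convert prices to float.
        return [
            {"name": r["name"].strip(), "price": float(r["price"])}
            for r in records
        ]

    def load(rows):
        # Hypothetical target table; swap in your own database logic.
        conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, price REAL)")
        conn.executemany("INSERT INTO items VALUES (:name, :price)", rows)
        conn.commit()
        return len(rows)

    # Store each return value in a local variable and pass it on.
    extracted = extract(raw_json)
    transformed = transform(extracted)
    return load(transformed)


conn = sqlite3.connect(":memory:")
sample = '[{"name": " apple ", "price": "1.5"}, {"name": "pear", "price": "2"}]'
inserted = etl(sample, conn)
print(inserted)
print(conn.execute("SELECT name, price FROM items ORDER BY rowid").fetchall())
```

The point is that nothing needs to be global: a return value only becomes available to the next function if the caller captures it (in a variable or as a nested call argument) and passes it along.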