I'm struggling with converting local JSON files into Parquet files. Each file should be converted with pandas to its own Parquet file and saved, so I end up with the same number of files, just as Parquet.
I looped through my directory, got a list of all my existing JSON files, and put them into a pandas DataFrame:
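```python
import os
import pandas as pd
path = 'trackingdata/'
df = list()
for root, dirs, files in os.walk(path, topdown=False):
    for name in files:
        # collect the full path of every file below trackingdata/
        df.append(os.path.join(root, name))
df = pd.DataFrame(df)
```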
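The `os.walk` loop above collects every file below `trackingdata/`, not only JSON files. If just the JSON paths are wanted, a minimal sketch with `pathlib` could look like this; the helper name `find_json_files` is hypothetical, and it assumes the files end in `.json`.

```python
from pathlib import Path


def find_json_files(root):
    """Return every *.json file below root (recursively), sorted for a stable order."""
    # rglob walks all subfolders, like os.walk, but only yields matching names
    return sorted(p for p in Path(root).rglob("*.json") if p.is_file())


# Usage with the folder from the question:
# json_files = find_json_files("trackingdata/")
```

The returned `Path` objects can be passed directly to `pd.read_json`, which accepts path objects as well as strings.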
Is it better to loop through the DataFrame now and transform each file with
or would it be better to put the transformation into the code above, while looping through the directory? And how can I convert each file to Parquet without joining them all together?
How about defining a json_to_parquet converter:
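```python
import os
import pandas as pd
def json_to_parquet(filepath):
    df = pd.read_json(filepath, typ='series').to_frame("name")  # one-column DataFrame
    parquet_file = os.path.splitext(filepath)[0] + ".parquet"  # swap only the extension
    df.to_parquet(parquet_file)  # needs pyarrow or fastparquet installed
```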
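A note on building the output name: `filepath.split(".")[0]` cuts the path at the first dot anywhere in it, not at the file extension, so relative paths such as `./trackingdata/...` or folders with dots in their names give the wrong target. The comparison below is a small sketch with made-up example paths; both helper names are hypothetical.

```python
import os


def naive_parquet_name(filepath):
    # Cuts at the first "." anywhere in the path
    return filepath.split(".")[0] + ".parquet"


def parquet_name(filepath):
    # Replaces only the last extension of the file name
    return os.path.splitext(filepath)[0] + ".parquet"


for p in ["trackingdata/run1/a.json", "./trackingdata/a.json", "trackingdata/v1.2/a.json"]:
    print(p, "->", naive_parquet_name(p), "vs", parquet_name(p))
```

For `./trackingdata/a.json` the naive version returns just `.parquet`, and for `trackingdata/v1.2/a.json` it returns `trackingdata/v1.parquet`; `os.path.splitext` gives `./trackingdata/a.parquet` and `trackingdata/v1.2/a.parquet`.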
Depending on how your JSON is formatted, you may need to change the read_json line and/or use the tips here.
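As an illustration of what "depending on how your JSON is formatted" can mean, the sketch below assumes three common layouts (a JSON Lines file, a list of record objects, or a single object) and picks a reader for each. `read_json_file` is a hypothetical helper; `pd.read_json(..., lines=True)` and `pd.json_normalize` are standard pandas functions, and `json_normalize` flattens nested keys into dotted column names such as `pos.x`.

```python
import json
from pathlib import Path

import pandas as pd


def read_json_file(path):
    """Load one JSON file into a DataFrame, guessing the layout (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".jsonl":
        # JSON Lines: one JSON object per line
        return pd.read_json(path, lines=True)
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # A single object becomes one row; a list of objects becomes one row each.
    # Nested objects are flattened into columns like "pos.x".
    if isinstance(data, dict) or (
        isinstance(data, list) and all(isinstance(r, dict) for r in data)
    ):
        return pd.json_normalize(data)
    raise ValueError(f"unsupported JSON layout in {path}")
```

Flat, typed columns like these also tend to write to Parquet more cleanly than a single column of mixed values.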
Then just process each file one at a time:
```python
import os

path = 'trackingdata/'
for root, dirs, files in os.walk(path, topdown=False):
    for name in files:
        if name.endswith('.json'):  # skip .parquet outputs and any other files
            json_to_parquet(os.path.join(root, name))
```
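If the Parquet files should not land next to the JSON files, a variant can mirror the folder layout into a separate output directory, still writing one Parquet file per JSON file without joining anything. This is a sketch only: `convert_tree`, `dst_root` and the `dry_run` flag are hypothetical, and it reuses the answer's `read_json(typ='series')` reading, which may need adapting to your JSON layout.

```python
from pathlib import Path

import pandas as pd


def convert_tree(src_root, dst_root, dry_run=False):
    """Convert every *.json below src_root into its own .parquet below dst_root.

    The subfolder layout is mirrored and nothing is concatenated.
    Returns the (source, target) pairs; with dry_run=True nothing is written.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    pairs = []
    for src in sorted(src_root.rglob("*.json")):
        # e.g. src_root/day1/b.json -> dst_root/day1/b.parquet
        dst = (dst_root / src.relative_to(src_root)).with_suffix(".parquet")
        pairs.append((src, dst))
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            df = pd.read_json(src, typ="series").to_frame("name")
            df.to_parquet(dst)  # needs pyarrow or fastparquet installed
    return pairs
```

For example, `convert_tree('trackingdata/', 'trackingdata_parquet/', dry_run=True)` (the output folder name is made up) lists what would be written without touching the disk; leaving out `dry_run` performs the conversion.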