python json csv type-conversion jsonlines

Python: removing the .jsonl extension when converting JSONL to a CSV file


I have a script that converts the .jsonl files in a selected directory to CSV files in another specified location. However, each converted file keeps the .jsonl extension in front of .csv (for example, file.jsonl.csv). How can I remove the .jsonl extension before appending .csv? I would like to drop it because the double extension may be confusing in the future. Thank you!

Sample CSV file created: 20210531_CCXT_FTX_DOGEPERP.jsonl.csv

My script:

import glob
import json
import csv
import time


start = time.time()
#import pandas as pd
from flatten_json import flatten

#Path of jsonl file
File_path = (r'C:\Users\Natthanon\Documents\Coding 101\Python\JSONL')
#reading all jsonl files
files = glob.glob(File_path + r"\**\*.jsonl", recursive=True)
i = 0

for f in files:
    with open(f, 'r') as F:
        #creating csv files  
        with open(r'C:\Users\Natthanon\Documents\Coding 101\Python\CSV\\' + f.split("\\")[-1] + ".csv", 'w' , newline='') as csv_file:
            thewriter = csv.writer(csv_file)
            thewriter.writerow(["symbol", "timestamp", "datetime","high","low","bid","bidVolume","ask","askVolume","vwap","open","close","last","previousClose","change","percentage","average","baseVolume","quoteVolume"])

            for line in F:
                #flatten json files 
                data = json.loads(line)
                data_1 = flatten(data)
                #headers should be the Key values from json files that make Column header                    
                thewriter.writerow([data_1['symbol'],data_1['timestamp'],data_1['datetime'],data_1['high'],data_1['low'],data_1['bid'],data_1['bidVolume'],data_1['ask'],data_1['askVolume'],data_1['vwap'],data_1['open'],data_1['close'],data_1['last'],data_1['previousClose'],data_1['change'],data_1['percentage'],data_1['average'],data_1['baseVolume'],data_1['quoteVolume']])

Solution

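  • The output name keeps the extension because f.split("\\")[-1] returns the full file name, .jsonl included, and ".csv" is simply appended to it. Strip the extension before building the output path; replacing the line that opens the CSV file with something like this should fix it: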

        file_name = f.rsplit("\\", 1)[-1].replace('.jsonl', '')
        with open(r'C:\Users\Natthanon\Documents\Coding 101\Python\CSV\\' + file_name + ".csv", 'w' , newline='') as csv_file:
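
As an alternative sketch, the extension can be stripped with pathlib instead of str.replace, which removes every occurrence of '.jsonl' in the name rather than only the trailing one. The helper name csv_name_for and the example paths below are hypothetical; PureWindowsPath is used only so that backslash-separated paths parse the same way on any operating system.

```python
from pathlib import PureWindowsPath


def csv_name_for(jsonl_path: str) -> str:
    """Return the file name of jsonl_path with its last extension swapped for .csv."""
    # .stem is the final path component without its last suffix,
    # so only the trailing ".jsonl" is dropped.
    return PureWindowsPath(jsonl_path).stem + ".csv"


# Hypothetical input paths, used for illustration only.
print(csv_name_for(r"C:\data\JSONL\20210531_CCXT_FTX_DOGEPERP.jsonl"))
# prints: 20210531_CCXT_FTX_DOGEPERP.csv
print(csv_name_for(r"C:\data\JSONL\my.jsonl_backup.jsonl"))
# prints: my.jsonl_backup.csv
```

In the script above, this would presumably amount to building the output name with csv_name_for(f) in place of f.split("\\")[-1] + ".csv"; on Windows, pathlib.Path(f).stem gives the same result.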