I have a very large CSV file (35 GB, 1.09 billion rows) with three columns. Two of the columns are strings wrapped in quotes, and the third is a double with no quotes around it. The file is too large to open in Notepad for editing. Is there a way to process it from the command line, with Python or some other tool, to either add quotes around the third column or remove the quotes from the first two columns?
E.g.
"zip1","zip2",miles
"00601","10394",2593.34
I would like to either remove the quotes from the first two columns or add quotes to the third column. Once the file is imported through FastLoad, I will add a float column and populate it with an update from the third column, which will be forced to a character type during the load.
Try a convtools-based solution to add quotes to the third column. The following should process the file as a stream:
import csv

from convtools import conversion as c
from convtools.contrib.tables import Table

# Read rows lazily, keep "miles" as a string, and write every
# non-numeric field (now all three columns) wrapped in quotes.
Table.from_csv("input.csv", header=True).update(
    miles=c.col("miles").as_type(str)
).into_csv(
    "output.csv", dialect=Table.csv_dialect(quoting=csv.QUOTE_NONNUMERIC)
)
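If installing a third-party package is not an option, the same streaming idea can be sketched with only the standard library csv module. This is a minimal sketch, not a tuned implementation: the function name requote_csv, the file names, and the UTF-8 encoding are assumptions for illustration. Rows are read and written one at a time, so memory use should not grow with file size.

```python
import csv


def requote_csv(src_path, dst_path, quoting=csv.QUOTE_ALL):
    """Copy a CSV file row by row, rewriting how fields are quoted.

    requote_csv is a hypothetical helper name; UTF-8 is an assumed encoding.
    quoting=csv.QUOTE_ALL     -> quotes every field, including the third column
    quoting=csv.QUOTE_MINIMAL -> drops quotes wherever they are not required
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)  # yields one parsed row at a time
        writer = csv.writer(dst, quoting=quoting, lineterminator="\n")
        writer.writerows(reader)  # consumes the reader lazily, row by row


# Example calls (file names are placeholders):
# requote_csv("input.csv", "output.csv")                              # add quotes
# requote_csv("input.csv", "output.csv", quoting=csv.QUOTE_MINIMAL)   # remove quotes
```

Note that csv.QUOTE_ALL also quotes the header row, and csv.QUOTE_MINIMAL keeps quotes only on fields that contain a delimiter, quote character, or line break.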