I have a comma-separated (,), tab-delimited (\t) file.
68,"phrase"\t
485,"another phrase"\t
43, "phrase 3"\t
Is there a simple approach to throw it into a Python Counter?
I couldn't let this go and stumbled on what I think is the winner.
In testing, it was clear that looping through the rows of the csv.DictReader was the slowest part, taking about 30 of the 40 seconds.
I switched it to a plain csv.reader to see what I would get. This produced rows as lists. I wrapped the reader in dict() to see if it converted directly. It did!
Then I could loop through a native dictionary instead of a csv.DictReader.
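To make the conversion step concrete, here is a small sketch that uses in-memory data in place of the real file. The sample rows are made up for illustration and assume a count, a tab, then a phrase on each line. It works because dict() accepts any iterable of two-item sequences, which is exactly what csv.reader yields for a two-column file.

```python
import csv
import io

# Hypothetical sample standing in for the real file: count<TAB>phrase per line.
sample = io.StringIO("68\tphrase\n485\tanother phrase\n43\tphrase 3\n")

rows = csv.reader(sample, delimiter="\t")

# Each two-item row becomes one key/value pair: first column -> second column.
d = dict(rows)
print(d)
```

Note that the values are still strings at this point; the numeric conversion happens later.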
The result... done with 4 million rows in 3 seconds! 🎉
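    import csv
    from collections import Counter

    def convert_counter_like_csv_to_counter(file_to_convert):
        # file_to_convert is presumably a pathlib.Path, since it is opened via .open()
        with file_to_convert.open(encoding="utf-8") as f:
            csv_reader = csv.reader(f, delimiter="\t")
            # dict() maps count -> phrase; rows sharing a count overwrite each other
            d = dict(csv_reader)
        the_counter = Counter({phrase: int(float(count)) for count, phrase in d.items()})
        return the_counter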
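One thing to watch with the dict step: it keys on the count column, so two phrases that happen to have the same count collapse into a single entry. If that matters for your data, a variant that skips the intermediate dict and feeds the rows to the Counter directly might look like the sketch below. The name counter_from_rows and the decision to sum the counts of repeated phrases are my assumptions, not part of the code above, and I have not timed this against the 4 million row file.

```python
import csv
import io
from collections import Counter


def counter_from_rows(lines):
    """Build a Counter from an iterable of 'count<TAB>phrase' lines.

    Hypothetical helper: repeated phrases have their counts summed.
    """
    the_counter = Counter()
    for count, phrase in csv.reader(lines, delimiter="\t"):
        # int(float(...)) mirrors the original and tolerates values like "68.0"
        the_counter[phrase] += int(float(count))
    return the_counter


# In-memory stand-in for an open file; "foo" and "bar" share the count 5.
sample = io.StringIO("5\tfoo\n5\tbar\n3\tfoo\n")
result = counter_from_rows(sample)
print(result)
```

With a real file, pass the open file object instead of the StringIO, for example inside a with file_to_convert.open(encoding="utf-8") as f: block.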