Convert 2-column counter-like csv file to Python collections.Counter?


I have a comma-separated (,), tab-delimited (\t) file:

68,"phrase"\t
485,"another phrase"\t
43, "phrase 3"\t

Is there a simple approach to throw it into a Python Counter?


Solution

  • I couldn't let this go and stumbled on what I think is the winner.

    In testing, it was clear that looping through the rows of the csv.DictReader was the slowest part, taking about 30 of the 40 seconds.

    I switched to a plain csv.reader to see what I would get. It yields each row as a list. Since dict() accepts any iterable of two-item sequences, I wrapped the reader in a dict to see if it converted directly. It did!

    Then I could loop through a native dictionary instead of a csv.DictReader.

    The result... done with 4 million rows in 3 seconds! 🎉

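    import csv
    from collections import Counter

    def convert_counter_like_csv_to_counter(file_to_convert):
        # file_to_convert is expected to be a pathlib.Path (it is opened via .open()).
        with file_to_convert.open(encoding="utf-8", newline="") as f:
            csv_reader = csv.reader(f, delimiter="\t")
            # Each row must have exactly two fields: count, phrase.
            # dict() keys on the first field (the count), so rows sharing a count overwrite each other.
            d = dict(csv_reader)
            the_counter = Counter({phrase: int(float(count)) for count, phrase in d.items()})

        return the_counter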
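
One caveat with the approach above: dict(csv_reader) uses the first column (the count) as the key, so two phrases with the same count collapse into a single entry. The following is a minimal sketch of a variant that iterates the rows directly and keys on the phrase instead. It assumes each row is count<TAB>phrase, possibly with a trailing tab. The helper name rows_to_counter and the sample data are hypothetical, and it has not been timed against the version above, so it may well be slower on millions of rows.

```python
import csv
import io
from collections import Counter


def rows_to_counter(lines):
    """Build a Counter from tab-delimited "count<TAB>phrase" rows.

    `lines` is any iterable of text lines (an open file works).
    Hypothetical helper, not part of the original answer.
    """
    counter = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        # Only the first two fields are used, so a trailing tab is harmless.
        count, phrase = row[0], row[1]
        # Keying on the phrase keeps rows that share a count;
        # += also merges counts if the same phrase appears twice.
        counter[phrase] += int(float(count))
    return counter


# Made-up sample: two different phrases share the count 68.
sample = io.StringIO("68\tphrase\n485\tanother phrase\n68\tphrase 3\n")
print(rows_to_counter(sample))
```

With a real file, the same function can be called on the open handle, for example `with path.open(encoding="utf-8", newline="") as f: rows_to_counter(f)`, where `path` is a pathlib.Path.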