python python-3.x pandas csv export-to-csv

How can I fix incorrect per-value counts when counting the rows of a CSV file with pandas?

I have a CSV file in which I need to count how often each value occurs and then write out the results.

The CSV file has millions of rows. The following is a screenshot of my CSV file.

The following is my code:

import pandas as pd

# Show every row and column when printing a DataFrame
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)

df = pd.read_csv("Ax_Seg_output_no_comma.csv")
# Count how many times each distinct 'Content' value occurs
# (the dict form agg({'cnt': 'count'}) raises an error in pandas 1.0 and later)
cnted = df.groupby("Content").size().reset_index(name="cnt")
cnted.to_csv("01.csv", index=False)

I used pandas to do the counting, but I ran into a problem.

The counts are not correct.

I need a result such as A,5 B,2 C,1......

However, I get a wrong result such as A,5 B C,1

Some elements are not counted.

Some of the lines are missing their count.
If I count only the first 25000 rows, the output is correct.

The following is the wrong result:

The correct result should look like this:

I wonder whether the file exceeds some pandas limit. I don't see any other errors in the code.

Can anyone help me? Thanks

(Here is the original CSV file: https://drive.google.com/file/d/18_Y3Wu8OFFpAzgRXRsNh8C_nyh8wPPEu/view?usp=sharing)

Solution

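Your code is fine, but the results are confusing because some of the items (the values of 'Content') span multiple lines. Each such value is counted once as a whole, and when it is written to the output file it occupies several lines, with the count appearing only after the last one. That's why you're seeing things such as: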

a

b:2
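To check whether this is what is happening, one option is to look for values that contain a newline character after loading. The snippet below is only a sketch: `SAMPLE` is made-up data standing in for the real file, and only the column name `Content` is taken from the question.

```python
import io

import pandas as pd

# Hypothetical sample: a pair of quotation marks encloses three physical lines.
SAMPLE = 'Content\nA\n"B\nA\nC"\nA\n'

# Default parsing treats everything between the quotation marks as one field.
df = pd.read_csv(io.StringIO(SAMPLE))

# Keep only the rows whose value contains an embedded newline.
multiline = df[df["Content"].astype(str).str.contains("\n", regex=False)]

# repr() makes the embedded "\n" visible when printing.
print(multiline["Content"].map(repr).tolist())
```

If the list printed for the real file is not empty, those are the values that show up as "uncounted" lines in the output.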

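Some items contain newline characters because your CSV file has quotation marks in it. By default, pandas treats everything between an opening and a closing quotation mark as a single field, even when it spans several lines. To have the quotation marks treated as ordinary characters, read the CSV as follows: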

import csv
import pandas as pd

# QUOTE_NONE disables quote handling, so each physical line becomes one row
df = pd.read_csv("Ax_Seg_output_no_comma.csv", quoting=csv.QUOTE_NONE)
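The difference between the two parsing modes can be seen on a small example. This is a minimal sketch with made-up data (`SAMPLE`) and a hypothetical helper (`count_content`). It is not the original file, and it counts with `size()` rather than the exact call from the question.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the quotation marks swallow one of the three "A" lines.
SAMPLE = 'Content\nA\n"B\nA\nC"\nA\n'


def count_content(text, **read_kwargs):
    """Count how often each 'Content' value occurs in the given CSV text."""
    df = pd.read_csv(io.StringIO(text), **read_kwargs)
    return df.groupby("Content").size().reset_index(name="cnt")


# Default quoting: 3 rows, one of which is the multi-line value 'B\nA\nC'.
default_counts = count_content(SAMPLE)

# QUOTE_NONE: 5 rows, one per physical line; the quotation marks are kept
# as literal characters inside the values '"B' and 'C"'.
raw_counts = count_content(SAMPLE, quoting=csv.QUOTE_NONE)

print(default_counts)
print(raw_counts)
```

Note that with `csv.QUOTE_NONE` the quotation marks stay in the data, so a value such as `"B` is counted separately from `B`. If that matters, the quotation marks may need to be stripped from the column before grouping.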