I am getting an error on one workstation when running a Python script. The script runs fine on VMs and my workstation.
pip list
Shows packages are the sameIt might be a memory issue, but the workstation has 2x4Gb RAM. I tried to chunk it out, but that did not work either. The file is barely 1Mb.
As troubleshooting, I cut the file to just 500 rows, and it ran fine. When I tried 1000 rows out of the 2500 rows in the file, it gave the same error. Interestingly the workstation cannot run the script with even just one row now.
Including error_bad_lines=False
, iterator=True
, chunksize=
, low_memory=False
have all not worked.
What is causing this error? Why did it run just fine using a few rows, but now not even with one row?
Here is the Traceback:
Traceback (most recent call last):
File "c:\Users\script.py", line 5, in <module>
data = pd.read_csv("C:/Path/file.csv", encoding='latin-1' )
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 581, in _read
return parser.read(nrows)
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1250, in read
index, columns, col_dict = self._engine.read(nrows)
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 225, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 4
Here is the script:
# Import raw data
data = pd.read_csv("C:/Users/Script.csv", encoding='latin-1' )
# Create array to track failed cases.
data['Test Case Failed']= ''
data = data.replace(np.nan,'')
data.insert(0, 'ID', range(0, len(data)))
# Testcase 1
data_1 = data[(data['FirstName'] == data['SRFirstName'])]
ids = data_1.index.tolist()
for i in ids:
data.at[i,'Test Case Failed']+=', 1'
# There are 15 more test cases that preform similar tasks
# Total cases
failed = data[(data['Test Case Failed'] != '')]
passed = data[(data['Test Case Failed'] == '')]
failed['Test Case Failed'] =failed['Test Case Failed'].str[1:]
failed = failed[(failed['Test Case Failed'] != '')]
# Clean up
del failed["ID"]
del passed["ID"]
# Print results
failed['Test Case Failed'].value_counts()
print("There was a total of",data.shape[0], "rows.", "There was" ,data.shape[0] - failed.shape[0], "rows passed and" ,failed.shape[0], "rows failed at least one test case")
# Drop unwanted columns
redata = passed.drop(columns=['ConsCodeImpID', 'ImportID', 'Suff1', 'SRSuff2', 'Inactive',
'AddrRegion','AddrImpID', 'AddrImpID', 'AddrImpID.2', 'AddrImpID.1', 'PhoneAddrImpID',
'PhoneAddrImpID.1', 'PhoneImpID', 'PhoneAddrImpID', 'PhoneImpID', 'PhoneType.1', 'DateTo',
'SecondID', 'Test Case Failed', 'PhoneImpID.1'])
# Clean address
redata['AddrLines'] = redata['AddrLines'].str.replace('Apartment ','Apt ',regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('Apt\\.','Apt ',regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('APT','Apt ',regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('nApt','Apt ',regex=True)
#There's about 100 more rows of address clean up
# Output edited dropped columns
redata.to_csv("C:/Users/cleandata.csv", index = False)
# Output failed rows
failed.to_csv("C:/Users/Failed.csv", index = False)
# Output passed rows
passed.to_csv("C:/Users/Passed.csv", index = False)
The workstation was corrupting the file, despite never opening it before running the script. I repaired the file and it worked. After reinstalling Excel, I no longer had to repair the file and could run the script as normal.