ParserError: Expected 2 fields in line 32, saw 4

I'm having trouble parsing a txt file (see here: File) Here's my code

import pandas as pd
objectname = r"path"
df = pd.read_csv(objectname, engine = 'python',  sep='\t', header=None)

Unfortunately it does not work. Since this question has been asked several times, I tried lots of proposed solutions (most of them can be found here: Possible solutions)

However, nothing did the trick for me. For instance, when I use

sep='delimiter'

The dataframe is created but everything ends up in a single column.

When I use

error_bad_lines=False

The rows I'm interested in are simply skipped.

The only way it works is when I first open the txt file, copy the content, paste it into google sheets, save the file as CSV and then open the dataframe.

I guess another workaround would be to use

df = pd.read_csv(objectname, engine = 'python',  sep = 'delimiter', header=None)

in combination with the split function Split function

Is there any suggestion how to make this work without the need to convert the file or to use the split function? I'm using Python 3 and Windows 10. Any help is appreciated.

Solution

Your file has tab separators but is not a TSV. The file is a mixture of metadata, followed by a "standard" TSV, followed by more metadata. Therefore, I found tackling the metadata as a separate task from loading the data to be useful.

Here's what I did to extract the metadata lines:

with open('example.txt','r') as file_handle:
    file_content = file_handle.read().split('\n')

for index, line in enumerate(file_content):
    if index<21 or index>37:
        print(index, line.split('\t'))

Note that the lines denoting the start and stop of metadata (21 and 37 in my example) are specific to the file. I've provided the trimmed data I used below (based on your linked file).

Separately, I loaded the TSV into Pandas using

import pandas as pd
df = pd.read_csv('example.txt', engine = 'python',  
                 sep='\t', error_bad_lines=False, header=None,
                 skiprows=list(range(21))+list(range(37,89)))

Again, I skipped the metadata at the start of the file and at the end of the file.

Here's the file I experimented with. I've trimmed the extra data to reduce line count.

TITLE   Test123
DATA TYPE   
ORIGIN  JASCO
OWNER   
DATE    19/03/28
TIME    16:39:44
SPECTROMETER/DATA SYSTEM    
LOCALE  1031
RESOLUTION  
DELTAX  -0,5
XUNITS  NANOMETERS
YUNITS  CD [mdeg]
Y2UNITS HT [V]
Y3UNITS ABSORBANCE
FIRSTX    300,0000
LASTX     190,0000
NPOINTS      221
FIRSTY      -0,78961
MAXY        37,26262
MINY       -53,38971
XYDATA
300,0000    -0,789606   182,198 -0,0205245
299,5000    -0,691644   182,461 -0,0181217
299,0000    -0,700976   182,801 -0,0136756
298,5000    -0,614708   182,799 -0,0131957
298,0000    -0,422611   182,783 -0,0130073
195,0000    26,6231 997,498 4,7258
194,5000    -17,3049    997,574 4,6864
194,0000    16,0387 997,765 4,63967
193,5000    -14,4049    997,967 4,58593
193,0000    -0,277261   998,025 4,52411
192,5000    -29,6098    998,047 4,45244
192,0000    -11,5786    998,097 4,36608
191,5000    34,0505 998,282 4,27376
191,0000    28,2325 998,314 4,1701
190,5000    -13,232 998,336 4,05036
190,0000    -47,023 998,419 3,91883

##### Extended Information
[Comments]
Sample name X
Comment
User
Division
Company RWTH Aachen

[Detailed Information]
Creation date   28.03.2019 16:39

Data array type Linear data array * 3
Horizontal axis Wavelength [nm]
Vertical axis(1)    CD [mdeg]
Vertical axis(2)    HT [V]
Vertical axis(3)    Abs
Start   300 nm
End 190 nm
Data interval   0,5 nm
Data points 221     

[Measurement Information]
Instrument name CD-Photometer
Model name  J-1100
Serial No.  A001361635

Detector    Standard PMT
  Lock-in amp.  X mode
  HT volt   Auto

Accessory   PTC-514
Accessory S/N   A000161648
  Temperature   18.63 C
  Control sonsor    Holder
  Monitor sensor    Holder

Measurement date    28.03.2019 16:39

Overload detect 203
Photometric mode    CD, HT, Abs
Measure range   300 - 190 nm
Data pitch  0.5 nm
CD scale    2000 mdeg/1.0 dOD
FL scale    200 mdeg/1.0 dOD
D.I.T.  0.5 sec
Bandwidth   1.00 nm
Start mode  Immediately
Scanning speed  200 nm/min
Baseline correction Baseline
Shutter control Auto
Accumulations   3