My code reads the header of a CSV file and converts it to a lookup table of column_name => column_index:
import csv

class CSVOutput:
    def __init__(self, csv_file, required_columns):
        csv_reader = csv.reader(csv_file)
        # Construct lookup table for header
        self.header = {}
        for idx, column in enumerate(next(csv_reader)):
            print(f"{column.lower().strip()} == key: {column.lower().strip() == 'key'}")
            print(f"{column.lower().strip()} is key: {column.lower().strip() is 'key'}")
            self.header[column.lower().strip()] = idx
        print(self.header)
        # Load the row data into memory/index it against key
        key_idx = self.header['key']

with open("test.csv") as csv_file:
    data = CSVOutput(csv_file, {})
When I run this, I get the following output and error:
{'key': 0, 'col1': 1, 'col2': 2}
key == key: False
key is key: False
col1 == key: False
col1 is key: False
col2 == key: False
col2 is key: False
Traceback (most recent call last):
File "D:\compare.py", line 74, in <module>
actual_data = CSVOutput(act_csv, required_columns)
File "D:\compare.py", line 40, in __init__
key_idx = self.header['key']
KeyError: 'key'
Basically, there seems to be a mismatch between the literal 'key' and the 'key' that's loaded from the file. I've tried looking at the source file in Notepad++ with "Show All Symbols" on, but I'm not seeing any difference. I've also had a look at the CSV file in a hex editor, and I can see the start looks like this: Key, with the leading  being the bytes EF BB BF. I'm not sure if that's the source of my problem, but if it is, why isn't strip() getting rid of it, and how do I handle that?
Any ideas?
EF BB BF
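This is the UTF-8 BOM (byte order mark). strip() does not remove it because, once decoded, it becomes the character U+FEFF, which Python does not treat as whitespace. To deal with such files, use the utf-8-sig encoding by passing it as the encoding argument of the open function, in the following way:
with open("test.csv", encoding="utf-8-sig") as csv_file:
    data = CSVOutput(csv_file, {})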
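To see the difference without touching your real file, here is a minimal sketch that decodes the same bytes both ways. The sample bytes in raw and the helper read_header are made up for illustration; they only mimic the header from your question.

```python
import csv
import io

# Hypothetical file contents: UTF-8 BOM (EF BB BF) followed by the CSV text.
raw = b"\xef\xbb\xbfKey,Col1,Col2\r\n1,a,b\r\n"

def read_header(data: bytes, encoding: str) -> dict:
    """Decode data and build a column_name => column_index lookup."""
    text = io.StringIO(data.decode(encoding), newline="")
    reader = csv.reader(text)
    return {column.lower().strip(): idx for idx, column in enumerate(next(reader))}

plain = read_header(raw, "utf-8")      # BOM is kept as part of the first column name
fixed = read_header(raw, "utf-8-sig")  # BOM is dropped while decoding

first_key = next(iter(plain))
print(repr(first_key))                 # shows the hidden \ufeff in front of "key"
print("\ufeff".isspace())              # False, which is why strip() leaves it alone
print("key" in plain, "key" in fixed)  # False True
```

If you cannot change how the file is opened, stripping the character explicitly with column.lstrip("\ufeff") on the first column should also work, but utf-8-sig is the simpler option because it also reads files without a BOM correctly.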