So I am trying to teach myself Python and pymarc for a school project I am working on. I have a sample MARC file and I am trying to read it with this simple code:
from pymarc import *
reader = MARCReader(open('dump.mrc', 'rb'), to_unicode=True)
for record in reader:
    print(record)
The for loop just prints each record so I can check that I am getting the correct data. However, I am getting this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
I've looked online but could not find an answer to my problem. What does this error mean and how can I go about fixing it? Thanks in advance.
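You can change Python's default encoding to UTF-8 and print each record as a dictionary. Note that this workaround only works in Python 2: reload(sys) and sys.setdefaultencoding() do not exist in Python 3.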
Try:
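#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 only: site.py deletes sys.setdefaultencoding at startup,
# so reload(sys) is needed to get it back.
import sys
reload(sys)
# Make implicit str <-> unicode conversions use UTF-8 instead of ASCII.
sys.setdefaultencoding('utf-8')
from pymarc import *
# force_utf8=True decodes every record as UTF-8, whatever its leader says.
reader = MARCReader(open('dump.mrc', 'rb'), to_unicode=True, force_utf8=True)
for record in reader:
    print record.as_dict()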
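As for what the error itself means: a MARC file is read as raw bytes, and the ASCII codec only defines byte values 0 to 127. Byte 0xC2 is outside that range; in UTF-8 it starts a two-byte character (0xC2 0xA9 is ©, for example). Python 2 uses the ascii codec whenever it implicitly converts a byte string to unicode, and that default is what the setdefaultencoding call above changes. The sketch below is a Python 3 illustration on a hypothetical byte string, not pymarc code; it decodes the same bytes with both codecs (try_decode is a made-up helper name).

```python
# Hypothetical sample: the UTF-8 bytes for the text "© 2010".
data = b"\xc2\xa9 2010"


def try_decode(raw, codec):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        return "error: " + str(exc)


print(try_decode(data, "ascii"))  # error: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
print(try_decode(data, "utf-8"))  # © 2010
```

The failing case reproduces the exact message from the question, which shows the problem is the codec choice, not corrupt data.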
Note: if you still get the Unicode exception, set to_unicode=False and omit force_utf8=True.
Also check whether your dump.mrc file is actually encoded in UTF-8, for example with chardet (pip installs of chardet provide the command as chardetect):
$ chardet dump.mrc
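chardet guesses from byte statistics. MARC 21 records also declare their own character coding in leader position 09: 'a' means UCS/Unicode (UTF-8) and a blank means MARC-8. As I understand it, pymarc consults this flag when to_unicode=True, and force_utf8=True overrides it. Below is a minimal standard-library sketch, assuming dump.mrc holds well-formed ISO 2709 / MARC 21 records whose first 5 bytes give the record length; leader_encodings is a hypothetical helper name. It only reports what each record claims, which is not proof that the bytes really are UTF-8.

```python
def leader_encodings(raw):
    """Yield leader/09 (the character coding scheme) for each record.

    Assumes raw holds complete MARC 21 records in ISO 2709 layout, where
    the first 5 bytes of each 24-byte leader give the record length.
    Raises ValueError if a length field is not numeric.
    """
    pos = 0
    # Stop when fewer than 24 bytes remain (e.g. a trailing newline).
    while len(raw) - pos >= 24:
        leader = raw[pos:pos + 24]
        length = int(leader[0:5].decode("ascii"))  # logical record length
        if length <= 0:
            break  # malformed length; stop rather than loop forever
        yield chr(leader[9])  # 'a' = UCS/Unicode, ' ' = MARC-8
        pos += length
```

To use it, read the whole file in binary mode (open('dump.mrc', 'rb').read()) and pass the bytes in. If a record yields a blank even though chardet says the file is UTF-8, the leader is probably wrong, which is the situation force_utf8=True is meant for.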