
Trouble reading MARC data using MARCReader() and pymarc


I am trying to teach myself Python and pymarc for a school project. I have a sample MARC file that I am trying to read with this simple code:

from pymarc import *

reader = MARCReader(open('dump.mrc', 'rb'), to_unicode=True)

for record in reader:
    print(record)

The for loop just prints each record so I can check that I am getting the correct data. However, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

I've looked online but could not find an answer to my problem. What does this error mean and how can I go about fixing it? Thanks in advance.


Solution

  • In Python 2, you can set the interpreter's default encoding to UTF-8 and get each record as a dictionary.

    Try:

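    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Python 2 only: reload() and sys.setdefaultencoding() are not available in Python 3.

    import sys

    # Switch the interpreter's default codec from ASCII to UTF-8.
    reload(sys)
    sys.setdefaultencoding('utf-8')

    from pymarc import MARCReader

    # Decode field data to unicode and treat every record as UTF-8.
    reader = MARCReader(open('dump.mrc', 'rb'), to_unicode=True, force_utf8=True)
    for record in reader:
        print record.as_dict()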
    

    Note:

    1. If you still get the Unicode exception, set to_unicode=False and drop force_utf8=True, so that pymarc does not try to decode the data.

    2. Also check whether your dump.mrc file is actually encoded as UTF-8. Try: $ chardet dump.mrc (when chardet is installed with pip, the command is named chardetect).
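
If chardet is not installed, the encoding check in note 2 can be approximated with the standard library alone. The following is a minimal sketch for Python 3, not part of pymarc: the helper names looks_like_utf8 and leader_coding_scheme are made up for illustration. It relies on two things: bytes that are not valid UTF-8 raise UnicodeDecodeError when decoded, and a MARC 21 record declares its character coding scheme in leader position 9 ('a' for UCS/Unicode, blank for MARC-8).

```python
def looks_like_utf8(data):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def leader_coding_scheme(data):
    """Report the coding scheme declared by the first record in the data.

    MARC 21 leader position 9 holds 'a' for UCS/Unicode and a blank for
    MARC-8. Anything else (or too little data) is reported as 'unknown'.
    """
    flag = data[9:10]  # slicing keeps this a bytes object, even when empty
    if flag == b'a':
        return 'unicode'
    if flag == b' ':
        return 'marc-8'
    return 'unknown'
```

To use it, read the file in binary mode (for example, raw = open('dump.mrc', 'rb').read()) and pass raw to both functions. Treat the result as a hint rather than proof: plain ASCII data also passes the UTF-8 test, and the leader flag only tells you what the record claims, which may not match the bytes it actually contains.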