unicode encoding character shapefile census

Character encoding for US Census Cartographic Boundary Files

I'm trying to import the US Census cartographic boundary files (available here: http://www.census.gov/geo/www/cob/bdy_files.html ) into a GeoDjango application. However, Python raises UnicodeDecodeError on some records (for example, on the non-ASCII characters in Puerto Rico place names).

The shapefile attribute file (*.dbf) doesn't say which character encoding it uses, and the shapefile spec doesn't define one. What is the correct character encoding to use?

Solution

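The US Census cartographic boundary files use the IBM850 character encoding (also known as code page 850, or "cp850" in Python). The attribute values come back as byte strings, so they need to be decoded, not encoded, to get Unicode text. Python code to decode these strings would be as follows: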

featurestring.decode("IBM850")
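
For a slightly fuller picture, here is a minimal sketch of how that decode step might be wrapped in a helper. The function name `decode_field` and the sample byte strings are made up for illustration and are not taken from the Census files; the only thing carried over from the answer above is the IBM850 (cp850) encoding.

```python
def decode_field(raw, encoding="cp850"):
    """Decode a raw attribute value to text.

    "cp850" is Python's name for the IBM850 code page ("IBM850" is an
    accepted alias). Values that are not byte strings (numbers, text that
    is already decoded) are returned unchanged.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Hypothetical sample values: in code page 850, byte 0x81 is "ü"
# and byte 0xA4 is "ñ".
name_one = decode_field(b"Mayag\x81ez")   # "Mayagüez"
name_two = decode_field(b"Pe\xa4uelas")   # "Peñuelas"
```

If the import goes through GeoDjango's LayerMapping, it may be simpler to pass the encoding there through its `encoding` keyword argument and skip decoding by hand; I haven't checked that against these particular files, so treat it as something to try.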