unicode encoding character shapefile census

Character encoding for US Census Cartographic Boundary Files


I'm trying to import the US Census cartographic boundary files (available here: http://www.census.gov/geo/www/cob/bdy_files.html ) into a GeoDjango application. However, Python raises UnicodeDecodeError on some records (for example, on the non-ASCII characters in Puerto Rico place names).

The shapefile attribute file (*.dbf) doesn't specify what character encoding it uses, and the shapefile spec doesn't define one. What is the correct character encoding to use?


Solution

  • The US Census cartographic boundary files use the IBM850 character encoding (code page 850). Python code to decode these byte strings to Unicode would be as follows:

    featurestring.decode("IBM850")
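For anyone who wants to check this without the Census files at hand, here is a minimal sketch of the same decode step. The helper name `decode_census_field` and the sample byte strings are made up for illustration and are not taken from the actual data; the only assumption carried over from the answer is that the attribute values are IBM850 bytes.

```python
def decode_census_field(raw, encoding="IBM850"):
    """Decode one raw .dbf attribute value (hypothetical helper).

    Values that are already text are returned unchanged.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Illustrative byte strings: in code page 850, 0xA4 is "ñ" and 0x81 is "ü".
samples = [b"A\xa4asco", b"Mayag\x81ez"]
names = [decode_census_field(s) for s in samples]

# The same bytes are not valid UTF-8, which is the kind of input that
# produces the UnicodeDecodeError described in the question.
try:
    samples[0].decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

If the import goes through GeoDjango's LayerMapping, its `encoding` argument should let you pass the same codec name instead of decoding each string by hand, though I haven't verified that against these particular files.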