
Pulling French characters from a CSV file and updating a feature class with them (ArcGIS 10.4 & Python 2.7.10)


My code is posted below. I've been writing an automatic update script that creates a civic address feature class in a file geodatabase. The script works as intended except for the final step: I'm trying to populate a concatenated field that combines the street name and the street type (Street, Road, Rue, etc.), ordered according to a "before or after" flag in another field (1 means the type goes before the street name, 2 means after). That step fails with a Unicode error. I'm relatively new to Python, so I'm not well versed in its Unicode handling. I've tried including:

# -*- coding: utf-8 -*-

as the very first line of the code but to no avail. The error I receive is the following:

Traceback (most recent call last):
  File "P:\AFT\Sept2018\CADB_update_working\CADB_update\CADB_updateScript_test_complete_a_test_a.py", line 252, in <module>
    for row in cursor:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 3: unexpected end of data

I fully expect this to be some simple typo or syntax error that I'm not catching, or maybe some flaw in my generated csv, which is generated from a txt file. The problematic section of code is below:

# uses txt file to write a csv file
txtFile = inFolder + "\\street_types.txt"
csvFile = inFolder + "\\street_types.csv"

with open(txtFile, 'rb') as inFile, open(csvFile, 'wb') as outFile:
    in_txt = csv.reader(inFile, delimiter = '\t')
    out_csv = csv.writer(outFile)
    out_csv.writerows(in_txt)

print "CSV created" 

# writes two columns of the csv into 2 lists and then combines them into a dictionary              
with open(csvFile,'r') as csvFile:
    reader = csv.reader(csvFile, delimiter=',')
    next(reader, None)
    listA = [] # CD
    listB = [] # Display Before Flag
    listC = [] # NAME
    for row in reader:
        listA.append(row[0])
        listB.append(row[3])
        listC.append(row[1])


    # print listA
    # print listB
    keys = map(int, listA)
    values = map(int, listB)
    dictionary = dict(zip(keys,values))
    print dictionary

    keysB = map(int, listA)
    valuesB = listC
    dictionaryB = dict(zip(keysB,valuesB))
    print dictionaryB

# uses that dictionary to update the field just added to the feature class with the corresponding boolean value
print "Dictionaries made successfully"
update_fields = ["ST_TYPE_CD","ST_NAME_AFTER_TYPE"]
with arcpy.da.UpdateCursor(fc, update_fields) as cursor:
    for row in cursor:
        if row[0] in dictionary:
            row[1] = dictionary[row[0]]
            cursor.updateRow(row)

# Adding more fields to hold the concatenated ST_TYPE_CD and STREET_NAME based on ST_NAME_AFTER_TYPE
field_name = "ST_NAME_COMPLETE"
if arcpy.ListFields(fc, field_name):
    print "Field to be added already exists"
else:
    arcpy.AddField_management(fc, "ST_NAME_COMPLETE", "TEXT")
    print "Field added"

field_name = "ST_TYPE"
if arcpy.ListFields(fc, field_name):
    print "Field to be added already exists"
else:
    arcpy.AddField_management(fc, "ST_TYPE", "TEXT")
    print "Field added"

# Populating those added fields
fields = ["ST_TYPE_CD","ST_TYPE","STREET_NAME"]
where = "STREET_NAME IS NOT NULL"
with arcpy.da.UpdateCursor(fc, fields, where) as cursor:
    for row in cursor:
        if row[0] in dictionaryB:
            row[1] = dictionaryB[row[0]]
        cursor.updateRow(row)
print "One of two field transcriptions complete"

fields = ["ST_TYPE","STREET_NAME","ST_NAME_COMPLETE","ST_NAME_AFTER_TYPE"]
with arcpy.da.UpdateCursor(fc, fields, where) as cursor:
    for row in cursor:
        if row[3] == 1:
            row[2] = row[0] + " " + row[1]
        elif row[3] == 2:
            row[2] = row[1] + " " + row[0]
        cursor.updateRow(row)
print "Two of two field transcriptions complete"

If the csv file is suspected to be the problem, I can upload it or post a snippet of the data it contains.

I've been stuck on this for a while so any help or advice would be appreciated.


Solution

  • The fix, as suggested in the comments, was to change

    listC.append(row[1]) 
    

    to

    listC.append(row[1].decode('cp1252'))
    

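    This converts the values in listC from Python 2 byte strings to Unicode strings (such as u'string'), so the later steps receive text rather than raw cp1252 bytes. The byte 0xe9 in the traceback is "é" in cp1252 and is not valid UTF-8 on its own, which is why the implicit UTF-8 decode failed. The # -*- coding: utf-8 -*- line had no effect because it only declares the encoding of the script's source file, not of data read at run time.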
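
The effect of the decode call can be checked without arcpy. The following is a minimal sketch that runs on Python 2.7 and 3, assuming the csv is cp1252-encoded as the fix implies. The sample row and the helper name decode_cells are made up for illustration and are not part of the original script.

```python
# -*- coding: utf-8 -*-
# Sketch only: the sample values are hypothetical, not taken from the asker's file.
raw_name = b"All\xe9e"  # "Allée" as cp1252 bytes; 0xe9 is "é" in that code page

# Decoding the same bytes as UTF-8 fails: 0xe9 would start a three-byte
# UTF-8 sequence, and that sequence is never completed here.
try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the assumed code page yields a unicode string.
name = raw_name.decode("cp1252")


def decode_cells(row, encoding="cp1252"):
    """Return a copy of a csv row with every byte-string cell decoded."""
    return [cell.decode(encoding) if isinstance(cell, bytes) else cell
            for cell in row]


# A made-up row in the shape the Python 2 csv reader returns (byte strings).
decoded = decode_cells([b"12", b"All\xe9e", b"ALL", b"1"])
```

Decoding the whole row at once, rather than only row[1], is a design choice for the sketch; it may be useful if other columns of the csv can also contain accented characters.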