Tags: python, pandas, dataframe, dictionary, arcpy

Why are there missing records when I convert from pandas df to dictionary?


I am trying to convert a DBF of 3233 records, created from a shapefile of US counties, to a dataframe. Then I want to take two of the columns from that dataframe and convert them to a dictionary where column1 is the key and column2 is the value. However, the resulting dictionary doesn't have the same number of records as my dataframe.

  • I use arcpy to load the shapefile of all US counties. arcpy.GetCount_management(county_shapefile) returns a feature count of 3233 records.
  • To convert to a dataframe, I first export to a DBF with arcpy.TableToTable_conversion(), which produces a DBF with 3233 records.
  • After converting to a df using Dbf5 from simpledbf, I get a df with 3233 records.
  • I then convert the first two columns to a dictionary, which has only 56 entries. Can anyone tell me what's going on here? (I recently switched from Python 2 to Python 3; could that be part of the issue?)

Code:

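import arcpy
from simpledbf import Dbf5

county_shapefile = "U:/Shapefiles/tl_2018_us_county/tl_2018_us_county.shp"
# Export the shapefile's attribute table to a standalone DBF (3233 rows)
dbf = arcpy.TableToTable_conversion(county_shapefile, "U:/", "county_data.dbf")

dbfile = Dbf5(str(dbf))
df = dbfile.to_dataframe()

# row[0] is the first column (STATEFP), row[1] the second (COUNTYFP)
df_dict = {row[0]: row[1] for row in df.values}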
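To see why the count drops, here is a minimal sketch in plain Python. The rows are made up, but they follow the column order assumed above (STATEFP first, COUNTYFP second), mimicking what iterating over df.values yields:

```python
# Made-up rows standing in for df.values: (STATEFP, COUNTYFP)
rows = [
    ("01", "001"),
    ("01", "003"),
    ("02", "013"),
    ("02", "016"),
    ("06", "001"),  # same COUNTYFP as the first row, different state
]

# Same comprehension as in the question: the first column becomes the key,
# so each repeated state code overwrites the previous entry.
by_state = {row[0]: row[1] for row in rows}
print(len(rows), len(by_state))  # 5 3
print(by_state)  # {'01': '003', '02': '016', '06': '001'}

# Swapping key and value still loses a row, because "001" repeats.
by_county = {row[1]: row[0] for row in rows}
print(len(by_county))  # 4
```

One entry survives per distinct key, and the value kept is the last one seen.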

I have also tried the .to_dict() method, but I'm not getting the desired dictionary structure {column1: column2, column1: column2, ...}.

from simpledbf import Dbf5

dbfile = Dbf5(str(dbf))
df = dbfile.to_dataframe()
subset = df[["STATEFP", "COUNTYFP"]]
subset = subset.set_index("COUNTYFP")
# DataFrame.to_dict() returns {column_name: {index: value}}, i.e. nested
county_dict = subset.to_dict()
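The nesting comes from calling to_dict() on a DataFrame, whose default orient="dict" keys the outer dictionary by column name. A minimal sketch with a small made-up table shows the difference between that and calling to_dict() on a single column (a Series), which produces a flat {index: value} mapping:

```python
import pandas as pd

# Made-up stand-in for the DBF data; column names follow the question.
df = pd.DataFrame({
    "STATEFP": ["01", "01", "06"],
    "COUNTYFP": ["001", "003", "005"],
})

subset = df[["STATEFP", "COUNTYFP"]].set_index("COUNTYFP")

# DataFrame.to_dict(): {column: {index: value}}
nested = subset.to_dict()
print(nested)  # {'STATEFP': {'001': '01', '003': '01', '005': '06'}}

# Series.to_dict(): {index: value}, the flat shape asked for
flat = subset["STATEFP"].to_dict()
print(flat)  # {'001': '01', '003': '01', '005': '06'}
```

The flat version is still a plain dict, so if COUNTYFP repeated, later rows would overwrite earlier ones exactly as in the comprehension.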

In the end, I'm hoping to create a dictionary where the key is the county FIPS code (COUNTYFP) and the value is the state FIPS code (STATEFP). I don't want any nested dictionaries, just a simple dictionary of the form:

dict={
   COUNTYFP1:STATEFP1,
   COUNTYFP2:STATEFP2,
   COUNTYFP3:STATEFP3,
   ....
}
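One way to build that flat mapping is dict(zip(keys, values)). The sketch below wraps it in a hypothetical helper, county_to_state, that first checks whether the key column is unique, so rows are never dropped silently:

```python
import pandas as pd


def county_to_state(df: pd.DataFrame) -> dict:
    """Hypothetical helper: build {COUNTYFP: STATEFP} or raise.

    A dict holds one value per key, so repeated COUNTYFP values would
    silently overwrite each other; raise instead of losing rows.
    """
    keys = df["COUNTYFP"]
    if not keys.is_unique:
        dupes = sorted(set(keys[keys.duplicated()]))
        raise ValueError(f"{len(dupes)} COUNTYFP value(s) repeat, e.g. {dupes[:3]}")
    return dict(zip(df["COUNTYFP"], df["STATEFP"]))


ok = pd.DataFrame({"STATEFP": ["01", "01", "06"], "COUNTYFP": ["001", "003", "005"]})
print(county_to_state(ok))  # {'001': '01', '003': '01', '005': '06'}

clash = pd.DataFrame({"STATEFP": ["01", "06"], "COUNTYFP": ["001", "001"]})
try:
    county_to_state(clash)
except ValueError as err:
    print(err)  # 1 COUNTYFP value(s) repeat, e.g. ['001']
```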

Solution

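  • Are you sure column1 has no duplicate values? Python dictionary keys are unique: when a key appears a second time, its new value overwrites the old one, so every repeated key collapses into a single entry. In your comprehension row[0] is the first column, STATEFP, so you get one entry per distinct state code, which would explain the 56. Switching the key to COUNTYFP will not fully fix this either, because county FIPS codes are only unique within a state (for example, 001 appears in many states). If you want to keep every row, you need a key that is unique, such as the state and county codes combined, or a structure that maps each key to a list of values.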
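Both workarounds can be sketched with pandas on a small made-up table. The combined state+county string is a 5-digit code; the TIGER/Line county files appear to carry it as a GEOID column, but check your copy before relying on that:

```python
import pandas as pd

# Made-up data: COUNTYFP "001" occurs in two states.
df = pd.DataFrame({
    "STATEFP": ["01", "01", "06", "06"],
    "COUNTYFP": ["001", "003", "001", "005"],
})

# Find keys that would collide (keep=False flags every copy, not just repeats).
collisions = df[df["COUNTYFP"].duplicated(keep=False)]
dup_codes = sorted(set(collisions["COUNTYFP"]))
print(dup_codes)  # ['001']

# Workaround 1: a unique key built from state + county codes.
combined = dict(zip(df["STATEFP"] + df["COUNTYFP"], df["STATEFP"]))
print(len(combined) == len(df))  # True

# Workaround 2: keep COUNTYFP as the key, but collect every state in a list.
by_county_lists = df.groupby("COUNTYFP")["STATEFP"].apply(list).to_dict()
print(by_county_lists)  # {'001': ['01', '06'], '003': ['01'], '005': ['06']}
```

The first option keeps a flat {key: value} shape; the second keeps COUNTYFP as the key at the cost of list values.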