Search code examples
pythongeospatialpysal

Saving full QUEEN neighbor array to CSV in Pysal (Census Block Groups)


Let me start by saying, other than a stata .do file and some MCMC in R, I haven't coded since an AOL warez group application in VB; so I apologize for being here messin with y'all to begin with.

I am writing an environmental justice paper working with demographic and exposure data at the census block group level. Because block groups can be relatively small in size, a pollution source and the people living in one block group can easily impact AT LEAST first order neighbors. I was going to be a dullard and just aggregate back up through the FIPS code, but that is just bad math.

Got the shape file for ACS year I want, tried arcGIS first but I was not getting anywhere. I then read about Pysal and installed that

imported shape file ran the (12 hours) queen neighbor analysis on all 216,000 block groups

In [52]: w.histogram Out[52]:

[(0, 87), (1, 709), (2, 3634), (3, 16627), (4, 48736), (5, 56952), (6, 42848), (7, 24878), (8, 12646), (9, 6294), (10, 3040), (11, 1515), (12, 759), (13, 432), (14, 233), (15, 128), (16, 85), (17, 44), (18, 34), (19, 20), (20, 21), (21, 13), (22, 8), (23, 7), (24, 6), (25, 1), (26, 3), (27, 1), (28, 2), (29, 1), (30, 2), (31, 1), (32, 0), (33, 2), (34, 0), (35, 1), (36, 1), (37, 1), (38, 0), (39, 0), (40, 0), (41, 0), (42, 0), (43, 0), (44, 0), (45, 0), (46, 1), (47, 0), (48, 0), (49, 0), (50, 0), (51, 0), (52, 0), (53, 0), (54, 0), (55, 0), (56, 0), (57, 0), (58, 0), (59, 0), (60, 0), (61, 1)]

What I need is a .csv (or honestly anything will do if I copy/paste it somewhere) that enumerates each block group by FIPS (which should be what the ACS shapefile uses for ID) and it's list of neighbors.

If I can get the list, I can move it over to an environment where I am more comfortable. I sat there and played with it for hours last night and could get a couple cracks at numpy.savetext to work, but it was only a single column and the numbers were stored in scientific notation because FIPS codes are 12 digits. One time it told me the tuple was out of range, and I think that was the closest I got

I searched for just the data itself rather extensively before hand, or I promise I would not be here wasting your time.

Thank you, Dave


Solution

  • You can write the W to a txt file with pysal. There are a number of different formats, but a "GAL" file is the simplest.

    It's a txt file, the first line is the number of shapes. Each record is then 2 lines,

    id n
    id0, id1, ... 
    
    where:
      id is the id of the polygon,
      n is the number of neighbors
      id0 is the id of the first neighbor
      ... and so on
    

    For example:

     3
     0 1
     1
     1 2
     0 2
     2 1
     1
    

    ...describes the graph 0-1-2, 0 has 1 neighbor (1), 1 has 2 neighbors (0, 2), and so on.

    To write your W to a gal file...

    >>> W = pysal.queen_from_shapefile("/path/to/shapefile.shp")
    >>> out = pysal.open("output.gal", 'w')
    >>> out.write(W)
    >>> out.close()
    

    Note: The ids are offsets. 0 is the first polygon, 1 is the second, and so on.

    If you want to link the offsets to FIPS code you'd need to do this on your own. But, you can use pysal to extract the FIPS codes in the correct order..

    >>> dbf = pysal.open("/path/to/shapefile.dbf", "r")
    >>> print dbf.header
    [column names, ... ]
    >>> FIPS = dbf.by_col("name_of_fips_code_column")
    >>> FIPS = map(str, FIPS) #make sure we're writing strings
    >>> out = open('fips.txt','w')
    >>> out.write('\n'.join(FIPS))
    >>> out.close()