Transpose Census Flat Files in Python

I'm trying to transpose this US Census flat file: http://www2.census.gov/govs/retire/2013indiv_unit_reported_data.txt in Python.

In the first column, the first 14 characters represent a row, and the last three represent a column. The second column is the value of that column and row. Can't seem to figure out a good way to make this into a table using Python.

Side Note: My end goal is to create a script that automatically imports these sort of files into ArcGIS, that's why I'm trying to do this in Python.

Solution

Although you could do this in pure Python too, using pandas would make this a very simple problem, because it's a pivot operation:

df = pd.read_csv("2013indiv_unit_reported_data.txt", delim_whitespace=True, 
                 names=["rowcol", "data"])
df["row"] = df["rowcol"].str[:14]
df["col"] = df["rowcol"].str[14:]
df_new = df.pivot(index="row", columns="col", values="data")
df_new = df_new.fillna("")
df_new.to_csv("table.dat", index=False)

which produces a DataFrame whose top left corner looks like

>>> df_new.iloc[:5,:5]
col                 V87           X01          X02          X04           X05
row                                                                          
01000000003401  0131312  139748131312  82075131312               213456131312
01000000003402  01313NR  474241131312      01313NR               627892131312
01000000003403  01313NR       01313NR   3677131312                    0131312
01000000003701  01313NR     578131312      01313NR                 3309131312
01103703710000            122741313NR               119541313NR    27761313NR

and an output data file looking like

>>> !head table.dat
V87,X01,X02,X04,X05,X06,X08,X11,X12,X21,X30,X33,X35,X42,X44,X46,X47,Z01,Z02,Z03,Z04,Z05,Z13,Z14,Z15,Z16,Z62,Z63,Z68,Z70,Z71,Z72,Z73,Z75,Z76,Z77,Z78,Z81,Z82,Z83,Z84,Z87,Z88,Z89,Z91,Z93,Z96,Z98,Z99
0131312,139748131312,82075131312,,213456131312,125363131312,1294714131312,895475131312,44837131312,393606131312,0131312,0131312,0131312,0131312,1309366131312,955067131312,3333131312,84169131312,10554131312,35773131312,3826131312,3498131312,780456131312,87838131312,27181131312,0131312,0131312,2266097131312,389145131312,1309366131312,172000131312,138000131312,0131312,53844131312,30325131312,2266097131312,5056820131312,9984289131312,958400131312,0131312,0131312,0131312,4461131312,0131312,01313NR,9767131312,984714131312,0131312,125363131312
01313NR,474241131312,01313NR,,627892131312,0131312,27384181313NR,1893321131312,55891131312,404296131312,932401131312,219743131312,01313NR,01313NR,29514461313NR,1963274131312,01313NR,133791131312,18568131312,69259131312,4990131312,4121131312,1720307131312,119270131312,53744131312,0131312,61902131312,3830519131312,378156131312,2951446131312,334155131312,304611131312,9006131312,01313NR,1337911313NR,38305191313NR,10514970131312,20596906131312,1963274131312,01313NR,01313NR,01313NR,26140131312,650756131312,01313NR,34803131312,2090646131312,01313NR,01313NR

If you really wanted to do it manually, something like this should work:

with open("2013indiv_unit_reported_data.txt") as fp:
    all_data = {}
    for line in fp:
        rowcol, data = line.split()
        row, col = rowcol[:14], rowcol[14:]
        all_data[row, col] = data

import csv
rows, cols = [sorted({key[i] for key in all_data}) for i in range(2)]
with open("table2.dat", "wb") as fp: # python 2
    writer = csv.writer(fp)
    writer.writerow(cols)
    for row in rows:
        line = [all_data.get((row, col), '') for col in cols]
        writer.writerow(line)