I'm trying to transpose this US Census flat file: http://www2.census.gov/govs/retire/2013indiv_unit_reported_data.txt in Python.
In the first column, the first 14 characters represent a row, and the last three represent a column. The second column is the value of that column and row. Can't seem to figure out a good way to make this into a table using Python.
Side Note: My end goal is to create a script that automatically imports these sort of files into ArcGIS, that's why I'm trying to do this in Python.
Although you could do this in pure Python too, using pandas
would make this a very simple problem, because it's a pivot operation:
df = pd.read_csv("2013indiv_unit_reported_data.txt", delim_whitespace=True,
names=["rowcol", "data"])
df["row"] = df["rowcol"].str[:14]
df["col"] = df["rowcol"].str[14:]
df_new = df.pivot(index="row", columns="col", values="data")
df_new = df_new.fillna("")
df_new.to_csv("table.dat", index=False)
which produces a DataFrame whose top left corner looks like
>>> df_new.iloc[:5,:5]
col V87 X01 X02 X04 X05
row
01000000003401 0131312 139748131312 82075131312 213456131312
01000000003402 01313NR 474241131312 01313NR 627892131312
01000000003403 01313NR 01313NR 3677131312 0131312
01000000003701 01313NR 578131312 01313NR 3309131312
01103703710000 122741313NR 119541313NR 27761313NR
and an output data file looking like
>>> !head table.dat
V87,X01,X02,X04,X05,X06,X08,X11,X12,X21,X30,X33,X35,X42,X44,X46,X47,Z01,Z02,Z03,Z04,Z05,Z13,Z14,Z15,Z16,Z62,Z63,Z68,Z70,Z71,Z72,Z73,Z75,Z76,Z77,Z78,Z81,Z82,Z83,Z84,Z87,Z88,Z89,Z91,Z93,Z96,Z98,Z99
0131312,139748131312,82075131312,,213456131312,125363131312,1294714131312,895475131312,44837131312,393606131312,0131312,0131312,0131312,0131312,1309366131312,955067131312,3333131312,84169131312,10554131312,35773131312,3826131312,3498131312,780456131312,87838131312,27181131312,0131312,0131312,2266097131312,389145131312,1309366131312,172000131312,138000131312,0131312,53844131312,30325131312,2266097131312,5056820131312,9984289131312,958400131312,0131312,0131312,0131312,4461131312,0131312,01313NR,9767131312,984714131312,0131312,125363131312
01313NR,474241131312,01313NR,,627892131312,0131312,27384181313NR,1893321131312,55891131312,404296131312,932401131312,219743131312,01313NR,01313NR,29514461313NR,1963274131312,01313NR,133791131312,18568131312,69259131312,4990131312,4121131312,1720307131312,119270131312,53744131312,0131312,61902131312,3830519131312,378156131312,2951446131312,334155131312,304611131312,9006131312,01313NR,1337911313NR,38305191313NR,10514970131312,20596906131312,1963274131312,01313NR,01313NR,01313NR,26140131312,650756131312,01313NR,34803131312,2090646131312,01313NR,01313NR
If you really wanted to do it manually, something like this should work:
with open("2013indiv_unit_reported_data.txt") as fp:
all_data = {}
for line in fp:
rowcol, data = line.split()
row, col = rowcol[:14], rowcol[14:]
all_data[row, col] = data
import csv
rows, cols = [sorted({key[i] for key in all_data}) for i in range(2)]
with open("table2.dat", "wb") as fp: # python 2
writer = csv.writer(fp)
writer.writerow(cols)
for row in rows:
line = [all_data.get((row, col), '') for col in cols]
writer.writerow(line)