Search code examples
pythonpython-2.7transposeflat-file

Transpose Census Flat Files in Python


I'm trying to transpose this US Census flat file: http://www2.census.gov/govs/retire/2013indiv_unit_reported_data.txt in Python.

In the first column, the first 14 characters represent a row, and the last three represent a column. The second column is the value of that column and row. Can't seem to figure out a good way to make this into a table using Python.

Side Note: My end goal is to create a script that automatically imports these sort of files into ArcGIS, that's why I'm trying to do this in Python.


Solution

  • Although you could do this in pure Python too, using pandas would make this a very simple problem, because it's a pivot operation:

    df = pd.read_csv("2013indiv_unit_reported_data.txt", delim_whitespace=True, 
                     names=["rowcol", "data"])
    df["row"] = df["rowcol"].str[:14]
    df["col"] = df["rowcol"].str[14:]
    df_new = df.pivot(index="row", columns="col", values="data")
    df_new = df_new.fillna("")
    df_new.to_csv("table.dat", index=False)
    

    which produces a DataFrame whose top left corner looks like

    >>> df_new.iloc[:5,:5]
    col                 V87           X01          X02          X04           X05
    row                                                                          
    01000000003401  0131312  139748131312  82075131312               213456131312
    01000000003402  01313NR  474241131312      01313NR               627892131312
    01000000003403  01313NR       01313NR   3677131312                    0131312
    01000000003701  01313NR     578131312      01313NR                 3309131312
    01103703710000            122741313NR               119541313NR    27761313NR
    

    and an output data file looking like

    >>> !head table.dat
    V87,X01,X02,X04,X05,X06,X08,X11,X12,X21,X30,X33,X35,X42,X44,X46,X47,Z01,Z02,Z03,Z04,Z05,Z13,Z14,Z15,Z16,Z62,Z63,Z68,Z70,Z71,Z72,Z73,Z75,Z76,Z77,Z78,Z81,Z82,Z83,Z84,Z87,Z88,Z89,Z91,Z93,Z96,Z98,Z99
    0131312,139748131312,82075131312,,213456131312,125363131312,1294714131312,895475131312,44837131312,393606131312,0131312,0131312,0131312,0131312,1309366131312,955067131312,3333131312,84169131312,10554131312,35773131312,3826131312,3498131312,780456131312,87838131312,27181131312,0131312,0131312,2266097131312,389145131312,1309366131312,172000131312,138000131312,0131312,53844131312,30325131312,2266097131312,5056820131312,9984289131312,958400131312,0131312,0131312,0131312,4461131312,0131312,01313NR,9767131312,984714131312,0131312,125363131312
    01313NR,474241131312,01313NR,,627892131312,0131312,27384181313NR,1893321131312,55891131312,404296131312,932401131312,219743131312,01313NR,01313NR,29514461313NR,1963274131312,01313NR,133791131312,18568131312,69259131312,4990131312,4121131312,1720307131312,119270131312,53744131312,0131312,61902131312,3830519131312,378156131312,2951446131312,334155131312,304611131312,9006131312,01313NR,1337911313NR,38305191313NR,10514970131312,20596906131312,1963274131312,01313NR,01313NR,01313NR,26140131312,650756131312,01313NR,34803131312,2090646131312,01313NR,01313NR
    

    If you really wanted to do it manually, something like this should work:

    with open("2013indiv_unit_reported_data.txt") as fp:
        all_data = {}
        for line in fp:
            rowcol, data = line.split()
            row, col = rowcol[:14], rowcol[14:]
            all_data[row, col] = data
    
    import csv
    rows, cols = [sorted({key[i] for key in all_data}) for i in range(2)]
    with open("table2.dat", "wb") as fp: # python 2
        writer = csv.writer(fp)
        writer.writerow(cols)
        for row in rows:
            line = [all_data.get((row, col), '') for col in cols]
            writer.writerow(line)