Extracting only interesting columns from ASCII table


I'm by no means a programmer, but I stumbled over a really nasty fixed-width ASCII table which might require me to become one :) (with some help from you guys, I hope)

I already asked Mr. Google for some advice and he pointed me in the direction of Python. So here I am - pretty lost :(

The offending table looks like this:

column1 column2 column3 column4 column5 column6 column7 ... columnN
   data    crap    crap    data    crap    crap   data
   data    crap    crap    data    crap    crap   data
   data    crap    crap    data    crap    crap   
   data            crap            crap    crap   
   data    crap    crap    data    crap    crap   data
   data    crap    crap    data    crap    crap   data
   data    crap    crap            crap    crap   data
   data    crap    crap    data    crap           data
   data    crap    crap    data    crap    crap   data
   data    crap    crap    data    crap    crap   data

As you can see, the number of columns can vary, some portions of the table have no data, and some columns contain data I'm not interested in.

My goal is to have a table at the end which looks like this:

column1 column4 column7 ... columnN
   data    data    data
   data    data    data
   data    data
   data
   data    data    data
   data    data    data
   data            data
   data    data    data
   data    data    data
   data    data    data

So, now all the columns I don't want are gone. That's basically my goal - a table which has only the columns I'm interested in. Do you think something like that can be done in Python?


Solution

  • It sounds like you're trying to read table information from a text file, and then re-format it. Some basic processing might look like:

    # First read content into an array
    # Each item in the array will be a line of the file
    with open('filename.txt') as f:
        content = f.readlines()
    
    # Next, parse each line (pick ONE of the two ways of splitting below)
    data = []
    for line in content:
        # You might need to split by spaces
        # This takes care of multiple whitespaces, so "data1   data2 data3    data4"
        # becomes ['data1','data2','data3','data4']
        # Caveat: an empty cell vanishes, so the fields after it shift left
        row = line.split()
        # Or, if the file is tab-delimited, split the row up by tabs instead
        # [] is a list comprehension, strip() will remove extra whitespace
        # row = [item.strip() for item in line.split('\t')]
        # Finally, append the row to your data list
        data.append(row)
    
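    # Now, write the data back to a file how you'd like
    # (the with block closes the file for you)
    with open('output.txt', 'w') as fout:
        for row in data:
            # For specific columns (indexes start at 0, so this keeps
            # columns 1, 4 and 7; a row with fewer fields raises IndexError)
            fout.write('{0} {1} {2}\n'.format(row[0], row[3], row[6]))
            # Or, if you just need to remove a couple of columns, do this
            # instead (pop from the highest index down so positions don't shift):
            # row.pop(6)
            # row.pop(5)
            # row.pop(4)
            # fout.write(' '.join(row) + '\n')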
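
Because your table is fixed width and has empty cells, splitting on whitespace will move later fields into the wrong column. Slicing each line by character position avoids that. Below is a minimal sketch, assuming the cells are right-aligned under the header names as in your sample, so each column runs from the end of the previous header name to the end of its own. The helper names `column_spans` and `extract_columns` and the three-row `sample` are made up for illustration; they are not part of any library.

```python
import re


def column_spans(header):
    """Return (name, start, end) character spans, one per header name."""
    matches = list(re.finditer(r"\S+", header))
    spans = []
    for i, m in enumerate(matches):
        # Assumption: cells are right-aligned, so a column starts where the
        # previous header name ends and stops where its own name ends.
        start = matches[i - 1].end() if i else 0
        spans.append((m.group(), start, m.end()))
    return spans


def extract_columns(lines, wanted):
    """Keep only the columns whose header name is in `wanted`."""
    header, *rows = [line.rstrip("\n") for line in lines if line.strip()]
    spans = [s for s in column_spans(header) if s[0] in wanted]
    table = [[name for name, _, _ in spans]]
    for row in rows:
        # Slicing past the end of a short line just gives an empty string,
        # so missing cells become '' instead of shifting the other fields.
        table.append([row[start:end].strip() for _, start, end in spans])
    return table


# Hypothetical sample in the same layout as the table in the question
sample = [
    "column1 column2 column3 column4 column5 column6 column7",
    "   data    crap    crap    data    crap    crap    data",
    "   data            crap            crap    crap",
    "   data    crap    crap            crap    crap    data",
]

table = extract_columns(sample, {"column1", "column4", "column7"})

# Print the result right-aligned in 7-character columns
for row in table:
    print(" ".join(cell.rjust(7) for cell in row).rstrip())
```

For a real file you would pass the open file object (or the list from `readlines()`) as `lines`. If your columns turn out to be left-aligned, or the widths are known up front, the spans could be hard-coded instead of being derived from the header.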