Reading a CSV-file with uncomplete headers

I have not used the csv module in python before, but it seems like a great tool to use for my needs.

The problem I am facing is that my csv-file looks a bit strange. (it was hard to notice at first)

See the example below for what I mean:

A|B|C|D|x|F|G|x|H
1||2||3|4||5|6
1||2||3|4||5|6
1||2||3|4||5|6
1||2||3|4||5|6

There are some indexes (such as column-x) above which should be read as Dx and Gx. The D-column and G-column are then empty columns.

So what I need is that when csv is processing the header it should create the headers like Dx and Gx and neglect the D and G columns and take the value from the x-column instead.

(I tried to read the file as is with the following code first and post process it, but it is not possible since the column name x is always the same, and csv.DictReader seems to discard repetitive column-names)

with open('myCSV.log') as csvFile:
    reader = csv.DictReader(csvFile, dialect='pipes')
    for row in reader:
        print row

Can this be achieved with a dialect, or some other fancy way of configuring the csv-module or do I have to parse this myself?

Solution

Use pandas:

import pandas as pd
print pd.read_csv('test.csv',sep='|')

out:

   A   B  C   D  x  F   G  x.1  H
0  1 NaN  2 NaN  3  4 NaN    5  6
1  1 NaN  2 NaN  3  4 NaN    5  6
2  1 NaN  2 NaN  3  4 NaN    5  6
3  1 NaN  2 NaN  3  4 NaN    5  6

and if you need you can convert it to a dict using to_dict() method:

{'A': {0: 1, 1: 1, 2: 1, 3: 1},
 'B': {0: nan, 1: nan, 2: nan, 3: nan},
 'C': {0: 2, 1: 2, 2: 2, 3: 2},
 'D': {0: nan, 1: nan, 2: nan, 3: nan},
 'F': {0: 4, 1: 4, 2: 4, 3: 4},
 'G': {0: nan, 1: nan, 2: nan, 3: nan},
 'H': {0: 6, 1: 6, 2: 6, 3: 6},
 'x': {0: 3, 1: 3, 2: 3, 3: 3},
 'x.1': {0: 5, 1: 5, 2: 5, 3: 5}}

EDIT: If you need certain names for columns you can do this:

import pandas as pd
df = pd.read_csv('test.csv',sep='|')
df.columns = [df.columns[index-1]+'x' if 'x' in name 
              else name for index,name in enumerate(df.columns)]
print df

   A   B  C   D  Dx  F   G  Gx  H
0  1 NaN  2 NaN   3  4 NaN   5  6
1  1 NaN  2 NaN   3  4 NaN   5  6
2  1 NaN  2 NaN   3  4 NaN   5  6
3  1 NaN  2 NaN   3  4 NaN   5  6

If you want to lose the empty cols:

print df.dropna(axis=1,how='all')

   A  C  Dx  F  Gx  H
0  1  2   3  4   5  6
1  1  2   3  4   5  6
2  1  2   3  4   5  6
3  1  2   3  4   5  6