Search code examples
pythonpandascsvexport-to-csv

converting a text file into separate CSV files according to its specifications


I have a text file as follows.

Wednesday,01,September,2021
Wednesday,01,September,2021,00:15
cod,10,1=0,2=2
cod,18,1=27,2=18,3=20
Wednesday,01,September,2021
Wednesday,01,September,2021,00:30
cod,10,1=5,2=6
cod,18,1=20,2=19,3=15

I want to make separate CSV files as output in each of which the row is cod10 and cod18 respectively and the column labels are the keys (1=0 in cod,10 means the row name is cod10, the column name is 1 and the value is 0). First CSV file:

id      1   2
cod10   0   2
cod10   5   6

Second CSV file:

id      1   2   3
cod18   27  18  20
cod18   20  19  15

I would be thankful for any idea.


Solution

  • You could open and read the text file and save only the rows that start with "cod". For those rows, you split on "=" and save the result in a nested dictionary:

    with open('file.txt') as f:
        lines = f.read().split('\n')
        out = {}
        for line in lines:
            first, id_num, *splits = line.split(',')
            if first == 'cod':
                for s in splits:
                    k, v = s.split('=')
                    out.setdefault(first + id_num, {}).setdefault(int(k), []).append(v)
            
    

    Then this produces a dictionary that looks like:

    {'cod10': {1: ['0', '5'], 2: ['2', '6']}, 
     'cod18': {1: ['27', '20'], 2: ['18', '19'], 3: ['20', '15']}}
    

    Now since you tagged this as pandas, you could use pandas to create a DataFrame object for each key-value pair of out (produced above) and then save the values at each id as a csv as below. The main job here is to explode the columns (since each cell houses a list):

    for k, v in out.items():
        df = pd.DataFrame.from_dict({k:v}, orient='index').rename_axis(index=['id'])
        df = df.explode(df.columns.tolist()).reset_index()
        df.to_csv(f'file_{k}.csv')
        print(df, end='\n\n\n')
    

    Output:

          id  1  2
    0  cod10  0  2
    1  cod10  5  6
    
    
          id   1   2   3
    0  cod18  27  18  20
    1  cod18  20  19  15