How to properly rearrange 2-column ASCII file data into multi-column data in Python?


I have a text file with a structure like:

0   1.23
1   2.76
2   2.46
3   6.23

0   1.33
1   2.57
2   2.87
3   5.34

.
.
.

I would like to rearrange it into a new file like:

0   1.23   1.33  ...
1   2.76   2.57  ...
2   2.46   2.87  ...
3   6.23   5.34  ...

I can do it in a very primitive way with:

# Number of data group
numberofdatagroup = 5
# Number of data in each group
data = 4


arr = [[0 for col in range(2*numberofdatagroup)] for row in range(data)]
f = open(file, 'r')
lines = f.readlines()
f.close()
a=0
for i in range(0, numberofdatagroup, 1):
   b = 0
   for a in range (0, data, 1):
      fields = lines[a].split()
      arr[b][2*i] = fields[0]
      arr[b][2*i+1] = fields[1]
      b = b + 1
   a = a + 2

# writing to output file
f = open(output, 'w+')
stringline = ""

for i in range(0, data, 1):
  stringline = stringline + arr[i][0] + " " + arr[i][1] + " "
  for j in range(1, numberofdatagroup, 1):
     stringline = stringline + arr[i][2*j+1] + " "
  f.write(stringline + "\n")
  stringline = ""

f.close()

However, it does not always work: it is very sensitive to empty lines. Is there a smarter way to do this?


Solution

  • Here is an example of how you could read the file into a pandas DataFrame:

    import pandas as pd
    
    current, all_groups = [], []
    with open('data.txt', 'r') as f_in:
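        for line in map(str.strip, f_in):
            if line:
                # Data line: split into [index, value]
                current.append(line.split(maxsplit=1))
            elif current:
                # A blank line closes the current group; repeated blank lines are skipped
                all_groups.append(pd.DataFrame(current)[1])
                current = []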
    
    if current:
        all_groups.append(pd.DataFrame(current)[1])
    
    final_df = pd.concat(all_groups, axis=1)
    final_df.columns = range(len(final_df.columns))
    
    print(final_df)
    

    Prints:

          0     1
    0  1.23  1.33
    1  2.76  2.57
    2  2.46  2.87
    3  6.23  5.34
    

    EDIT: Without the pandas library:

    current, all_groups = [], []
    with open("data.txt", "r") as f_in:
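        for line in map(str.strip, f_in):
            if line:
                # Data line: split into [index, value]
                current.append(line.split(maxsplit=1))
            elif current:
                # A blank line closes the current group; repeated blank lines are skipped
                all_groups.append(current)
                current = []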
    
    if current:
        all_groups.append(current)
    
    for g in zip(*all_groups):
        print('{} {} {}'.format(g[0][0], g[0][1], ' '.join(v for _, v in g[1:])))
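
Since the question asks for a new file rather than printed output, the pandas-free approach can also be wrapped into two small functions. This is only a sketch: the names `regroup` and `write_rows` and the three-space separator are assumptions made for illustration, not part of the original answer. It assumes every non-blank line has exactly two whitespace-separated fields and that groups are separated by one or more blank lines. If the groups have different lengths, `zip` silently truncates to the shortest one.

```python
import io


def regroup(lines):
    """Turn blank-line-separated 2-column groups into multi-column rows."""
    groups, current = [], []
    for line in map(str.strip, lines):
        if line:
            # Data line: split into [index, value]
            current.append(line.split(maxsplit=1))
        elif current:
            # Blank line closes the group; extra blank lines are ignored
            groups.append(current)
            current = []
    if current:
        groups.append(current)

    rows = []
    for parts in zip(*groups):
        # parts holds one [index, value] pair per group; keep the index
        # from the first group and the value from every group
        rows.append([parts[0][0]] + [value for _, value in parts])
    return rows


def write_rows(rows, out, sep="   "):
    """Write each row as one line to an open text file object."""
    for row in rows:
        out.write(sep.join(row) + "\n")


# Small in-memory demo; note the two consecutive blank lines
sample = "0   1.23\n1   2.76\n\n\n0   1.33\n1   2.57\n"
demo_out = io.StringIO()
write_rows(regroup(io.StringIO(sample)), demo_out)
print(demo_out.getvalue(), end="")
```

With real files, the same functions could be called as `write_rows(regroup(f_in), f_out)` inside a `with open("data.txt") as f_in, open("out.txt", "w") as f_out:` block, where both file names are placeholders.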