python python-3.x pandas dataframe text-files

Opening textfile in python

The structure of the data file looks like this:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000

and so on

I want to create a data frame that should look like this:

9.15400    5.40189    0.77828    0.66432    0.44219    0.00000
9.15400    0.00000
9.15400    7.38451    3.99120    2.23459    1.49781    0.77828    0.000
9.15400    2.09559    0.77828    0.00000
9.15400    2.09559    0.77828    0.65828    0.58990    0.00000

Can someone please help me how I can get started with this?

Solution

Why doesn't the 0th row of your desired dataframe include 0.66432?

It's unclear how the table is structured. If it is as unstructured as is shown in your question, try this:

Input:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000


filename = r"C:\Users\Bobson Dugnutt\Desktop\table2.txt"

# This returns a dataframe with a single column
df = pd.read_table(filename, header=None)

# Split the bad-boy at every double space
df = df[0].str.split("  ", expand=True)

new_rows = []
values = []

for row in df.itertuples():
    # row[2] is the column which is either 2.0 or blank ("")
    if row[2]:
        if values:
            new_rows.append(values)
            values = []
        
        # row[7] is the column with a value like 9.15400
        values.append(row[7])
    else:
        # Add all non-blank values starting from the the 4th column.
        # The 4th column is the first column meaningful values are 
        # found for these rows
        values.extend(value for value in row[4:] if value)
    
new_rows.append(values)

# fillna("") so make the NaN values blank
new_df = pd.DataFrame(new_rows).fillna("")
print(new_df)

Output:

        0       1       2       3       4       5       6
0 9.15400 5.40189 0.77828 0.66432 0.44219 0.00000        
1 9.15400 0.00000                                        
2 9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
3 9.15400 2.09559 0.77828 0.00000                        
4 9.15400 2.09559 0.77828 0.65828 0.58990 0.00000

Or if each column is fixed width, and is just not displayed correctly in your question, try this:

Input:

  2.0  0    3    9.15400
                 5.40189    0.77828    0.66432
                 0.44219    0.00000
  2.0  0    1    9.15400
                 0.00000
  2.0  0    6    9.15400
                 7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
                 2.09559    0.77828    0.00000
  2.0  0    3    9.15400
                 2.09559    0.77828    0.65828
                 0.58990    0.00000

import pandas as pd

# Make the floats the same width of your desired output
pd.options.display.float_format = '{:.5f}'.format

filename = r"C:\Users\Bobson Dugnutt\Desktop\table.txt"

df = pd.read_fwf(filename, header=None).fillna("")

new_rows = []
values = []

for row in df.itertuples():
    # row[1] is the column which is either 2.0 or blank ("")
    if row[1] != "":
        if values:
            new_rows.append(values)
            values = []
        
        # row[4] is the column with a value like 9.15400
        values.append(row[4])
    else:
        # Add all non-blank values starting from the same column as above
        values.extend(value for value in row[4:] if value != "")

new_rows.append(values)

# fillna("") so make the NaN values blank
new_df = pd.DataFrame(new_rows).astype(float).fillna("")

print(new_df)

Output:

        0       1       2       3       4       5       6
0 9.15400 5.40189 0.77828 0.66432 0.44219 0.00000        
1 9.15400 0.00000                                        
2 9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
3 9.15400 2.09559 0.77828 0.00000                        
4 9.15400 2.09559 0.77828 0.65828 0.58990 0.00000

In both cases all you have to do is iterate over each row, check if one of the first columns is not blank (i.e. is 2.0), and if it is, then get all the other meaningful values in the next rows until you come across another similar row. The specific index differs depending on how the table was originally parsed.