
Need to extract tabular data from a text file in Python 3


I have output from a quantum chemistry program from which I wish to extract tabular data for input into a Python port of a FORTRAN program I wrote about 25 years ago.

Some of the output files are rather long, as many as 6000 lines, which precludes the use of a spreadsheet for processing.

A typical table is of the form:

                             CARTESIAN COORDINATES

   1    C        0.011987266    -0.003842185     0.006578784
   2    H        1.097152909    -0.003956163     0.013339310
   3    H       -0.349612312     1.019316731     0.001903075
   4    H       -0.344276148    -0.517463019    -0.880495291
   5    H       -0.355315644    -0.513266496     0.891567896

I'm not asking for someone to write the Python code for me, but rather to give me some guidance through the labyrinth of available code.


Solution

  • I suggest you look into np.genfromtxt, which reads whitespace-separated columns into a structured NumPy array. The following code snippet reads the example table from your question, assuming it is stored on its own in a file called data.txt.

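    import numpy as np

    # skip_header=2 skips the first two lines: the table title and the blank line after it.
    # 'S1' stores the label as a one-byte string (hence b'C' below); two-letter element symbols would need a wider type such as 'U2'.
    data = np.genfromtxt('data.txt', skip_header=2, dtype=[('id', 'i8'),('label','S1'),('x','f8'),('y','f8'),('z','f8')])
    print(data)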
    

    Output

     [(1, b'C',  0.01198727, -0.00384219,  0.00657878)
     (2, b'H',  1.09715291, -0.00395616,  0.01333931)
     (3, b'H', -0.34961231,  1.01931673,  0.00190307)
     (4, b'H', -0.34427615, -0.51746302, -0.88049529)
     (5, b'H', -0.35531564, -0.5132665 ,  0.8915679 )]
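
If the table sits somewhere inside a long output file rather than in a file of its own, one possible approach is to scan for the title line first and collect rows until the table ends. The following is only a rough sketch: the function name read_table_after_header is made up for illustration, and it assumes that the title text appears exactly once before the table of interest and that the table ends at the first blank line after its rows. Real program output may need different rules.

```python
import io
import numpy as np

def read_table_after_header(lines, header="CARTESIAN COORDINATES"):
    """Return the whitespace-split rows of the first table that follows `header`.

    Assumption: the table ends at the first blank line after its rows
    (or at the end of the input).
    """
    rows = []
    in_table = False
    for line in lines:
        if not in_table:
            # Keep scanning until the title line is found.
            if header in line:
                in_table = True
            continue
        fields = line.split()
        if not fields:
            if rows:
                break      # blank line after data rows: the table is finished
            continue       # blank line between the title and the first row
        rows.append(fields)
    return rows

# A small in-memory stand-in for a long output file.
sample = """\
 some unrelated program output
                             CARTESIAN COORDINATES

   1    C        0.011987266    -0.003842185     0.006578784
   2    H        1.097152909    -0.003956163     0.013339310

 more unrelated output
"""

rows = read_table_after_header(io.StringIO(sample))
symbols = [r[1] for r in rows]                                   # element labels
coords = np.array([[float(v) for v in r[2:5]] for r in rows])    # x, y, z columns
print(symbols)
print(coords.shape)
```

For a real file, the same function can be given an open file object, for example `with open("output.log") as fh: rows = read_table_after_header(fh)`, where output.log is a placeholder name. Because the file is read line by line, its length (6000 lines or more) should not be a problem.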