I have a file that has about 3000 blocks repeated as follows (first two shown):
21
Profile. 1 HEAT OF FORMATION = -79.392 KCAL = -332.175 KJ
H -2.22728 -1.35263 1.32579
H 1.21425 -1.35263 1.32579
C 1.43878 0.44129 1.32579
O 2.25748 -0.52202 1.23773
C 0.12570 -0.10907 1.38542
H -0.47394 0.10034 2.26424
C -2.02530 -1.28825 -2.05204
C -0.80697 -0.63466 -2.22403
H -0.41632 -0.42983 -3.21532
H 0.84731 0.28355 -1.21782
C -0.09866 -0.24043 -1.09182
C -1.83256 -1.15779 0.32994
C -0.59706 -0.50055 0.19091
H -3.51151 -2.06378 -0.69513
C -2.55421 -1.55647 -0.78456
O -2.78665 -1.71220 -3.09841
H -2.37922 -1.48635 -3.96745
H 2.21062 3.22762 2.75985
C 1.91952 1.85374 1.37731
O 2.22890 2.54529 0.44919
O 1.92486 2.27899 2.65936
21
Profile. 2 HEAT OF FORMATION = -79.390 KCAL = -332.168 KJ
H -2.22728 -1.35263 1.32579
H 1.21674 -1.35282 1.32529
C 1.43862 0.44132 1.32582
O 2.25745 -0.52214 1.23772
C 0.12565 -0.10889 1.38540
H -0.47402 0.10051 2.26417
C -2.02530 -1.28825 -2.05204
C -0.80697 -0.63465 -2.22403
H -0.41632 -0.42983 -3.21531
H 0.84730 0.28355 -1.21782
C -0.09865 -0.24043 -1.09182
C -1.83256 -1.15780 0.32995
C -0.59702 -0.50058 0.19094
H -3.51151 -2.06378 -0.69513
C -2.55421 -1.55647 -0.78456
O -2.78666 -1.71220 -3.09841
H -2.37922 -1.48635 -3.96745
H 2.21061 3.22763 2.75985
C 1.91953 1.85373 1.37732
O 2.22890 2.54528 0.44919
O 1.92486 2.27898 2.65936
where each set of coordinates is separated by the number of atoms (21 in this case) and a heat of formation.
I'm wondering how to go about reading in and writing each set of coordinates to a separate array so that I can ultimately manipulate certain elements of these arrays.
As I suggested in the comment:
Read all the lines:
In [783]: with open('stack40730696.txt','rb') as f:
...: lines = f.readlines()
I could read line by line, block by block,etc. but it's easiest to play with a list.
Now read the 1st block. Looks like '21' is the number of data lines:
In [784]: i=0
In [785]: n=int(lines[i])
In [786]: n
Out[786]: 21
In [787]: i+=1
In [788]: block=lines[i:i+1+n] # grab the lines of a block, with header
In [789]: block[0]
Out[789]: b'Profile. 1 HEAT OF FORMATION = -79.392 KCAL = -332.175 KJ\n'
In [790]: block[-1]
Out[790]: b' O 1.92486 2.27899 2.65936\n'
Now load into an array with genfromtxt
; and check the result. I printed the whole thing, but here I'll just print a few details.
In [791]: data1=np.genfromtxt(block, skip_header=1,dtype=None)
In [792]: data1.shape
Out[792]: (21,)
In [793]: data1.dtype
Out[793]: dtype([('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
Advance to the next block and repeat. Of course for the whole file I'd put this in a loop, and collect the data
arrays in a list.
In [794]: i=i+1+n
In [795]: n=int(lines[i])
In [796]: i+=1
In [797]: block=lines[i:i+1+n]
In [798]: data2=np.genfromtxt(block, skip_header=1,dtype=None)
In [799]: data2.shape
Out[799]: (21,)
In [800]: data2.dtype
Out[800]: dtype([('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
a few records
In [802]: data2[:3]
Out[802]:
array([(b'H', -2.22728, -1.35263, 1.32579),
(b'H', 1.21674, -1.35282, 1.32529),
(b'C', 1.43862, 0.44132, 1.32582)],
dtype=[('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
This is a structured array with one string field, and 3 float ones. This could be split into a string array and a (21,3) float array
In [803]: dataf=np.genfromtxt(block, skip_header=1,usecols=[1,2,3])
In [804]: dataf.shape
Out[804]: (21, 3)
In [805]: dataf.dtype
Out[805]: dtype('float64')