I have a .dat file of coordinates (x, y and z), split into blocks by marker lines (each marker is a single integer). Here's a snippet of it:
500
0.14166 0.09077 0
0.11918 0.08461 0
0.09838 0.07771 0
0.07937 0.07022 0
0.06223 0.06222 0
0.04705 0.05386 0
0.03388 0.04528 0
0.02281 0.03663 0
0.01391 0.02808 0
42
0.00733 0.01969 0
0.00297 0.01152 0
0.01809 -0.01422 0
0.03068 -0.01687 0
0.14166 0.09077 0
0.11918 0.08461 0
0.09838 0.07771 0
0.07937 0.07022 0
42
0.14166 0.09077 0
0.11918 0.08461 0
0.09838 0.07771 0
0.07937 0.07022 0
What's the best way to split it into chunks (preferably one array per interval between markers)?
This is just a fraction of the data; in reality there are a few thousand points.
I would suggest using the pandas and numpy libraries.
We start by loading the input file into a dataframe, skipping the first row (skiprows=1) and explicitly fixing the number of columns via column names (names=['x','y','z']). As a result, each marker line is read as a row with a single value, padded with NaN (like 42.00000 NaN NaN):
import pandas as pd
import numpy as np
# sep=r'\s+' splits on any run of whitespace (delim_whitespace is deprecated since pandas 2.2)
coords = pd.read_table('test.dat', sep=r'\s+', header=None,
                       engine='python', skiprows=1, names=['x','y','z'])
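To see how the marker lines end up in the dataframe, here is a small sketch that reads a few lines from an in-memory string instead of test.dat. The sample text and the names sample and demo are made up for illustration, and it uses pd.read_csv with the default parser rather than the exact call above:

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for test.dat (first line is skipped)
sample = """500
0.14166 0.09077 0
0.11918 0.08461 0
42
0.00733 0.01969 0
"""

demo = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None,
                   skiprows=1, names=["x", "y", "z"])

# The marker line has a single field, so its y and z values are NaN
print(demo)
print(demo["y"].isna().tolist())
```

The row holding 42 is the only one with NaN in the y column, which is what the next step relies on.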
Then we find the positions of the marker lines, at which the coords dataframe will be split into chunks:
na_markers = coords.loc[coords['y'].isna()].index
Finally, we split the data and get the needed numpy arrays:
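# Split the underlying array at the marker rows, then drop the NaN (marker) rows.
# Recent pandas versions warn when np.split is applied to a DataFrame directly.
coords = [chunk[~np.isnan(chunk).any(axis=1)] for chunk in np.split(coords.to_numpy(), na_markers)]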
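For reference, np.split with a list of indices cuts the array just before each index, which is why every chunk after the first begins with its marker row and why that row has to be dropped afterwards. A minimal sketch on a made-up array (arr and parts are illustrative names):

```python
import numpy as np

arr = np.arange(10)

# Cut before positions 3 and 7: arr[:3], arr[3:7], arr[7:]
parts = np.split(arr, [3, 7])
print(parts)  # [array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])]
```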
That's it; coords now contains a list of the needed coordinate "chunks":
[array([[0.14166, 0.09077, 0. ],
[0.11918, 0.08461, 0. ],
[0.09838, 0.07771, 0. ],
[0.07937, 0.07022, 0. ],
[0.06223, 0.06222, 0. ],
[0.04705, 0.05386, 0. ],
[0.03388, 0.04528, 0. ],
[0.02281, 0.03663, 0. ],
[0.01391, 0.02808, 0. ]]), array([[ 0.00733, 0.01969, 0. ],
[ 0.00297, 0.01152, 0. ],
[ 0.01809, -0.01422, 0. ],
[ 0.03068, -0.01687, 0. ],
[ 0.14166, 0.09077, 0. ],
[ 0.11918, 0.08461, 0. ],
[ 0.09838, 0.07771, 0. ],
[ 0.07937, 0.07022, 0. ]]), array([[0.14166, 0.09077, 0. ],
[0.11918, 0.08461, 0. ],
[0.09838, 0.07771, 0. ],
[0.07937, 0.07022, 0. ]])]
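The three steps can also be wrapped into one function and checked on a small in-memory sample. This is only a sketch under a few assumptions: the function name split_on_markers and the sample text are hypothetical, the first line of the input is always skipped, and the split is done on the underlying NumPy array rather than on the dataframe itself:

```python
import io
import numpy as np
import pandas as pd

def split_on_markers(source):
    """Return one (n, 3) array per block of coordinates between marker lines."""
    coords = pd.read_csv(source, sep=r"\s+", header=None,
                         skiprows=1, names=["x", "y", "z"])
    # Marker lines have only one field, so their y value is NaN
    na_markers = coords.index[coords["y"].isna()]
    data = coords.to_numpy()
    # Split the raw array at the marker rows, then drop those rows
    return [part[~np.isnan(part).any(axis=1)]
            for part in np.split(data, na_markers)]

# Hypothetical sample in the same layout as the question's file
sample = """500
0.14166 0.09077 0
0.11918 0.08461 0
42
0.00733 0.01969 0
0.00297 0.01152 0
0.01809 -0.01422 0
42
0.03068 -0.01687 0
"""

chunks = split_on_markers(io.StringIO(sample))
print([chunk.shape for chunk in chunks])
```

Note that two consecutive marker lines, or a marker on the last line, would produce an empty array in the result; filter those out if they are not wanted.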