I have text files from an external source that are formatted like so:
0 0 -0.105961 0.00000
1 0 -1.06965 0.00000
1 1 -0.0187213 -0.240237
2 0 -0.124695 0.00000
2 1 -0.178982 0.0633255
2 2 0.760988 -0.213796
3 0 -1.96695 0.00000
3 1 0.0721285 0.0491248
3 2 -0.560517 0.267733
3 3 -0.188732 -0.112053
4 0 -0.0205364 0.00000
⋮ ⋮ ⋮ ⋮
40 30 0.226833 -0.733674
40 31 0.0444837 -0.249677
40 32 -0.171559 -0.970601
40 33 -0.141848 -0.137257
40 34 -0.247042 -0.902128
40 35 -0.495114 0.322912
40 36 0.132215 0.0543294
40 37 0.125682 0.817945
40 38 0.181098 0.223309
40 39 0.702915 0.103991
40 40 1.11882 -0.488252
where the first two columns are the indices of a 2d array (say i and j), and the 3rd and 4th columns are the values for two 2d arrays (say p[:,:] and q[:,:]). What would be the idiomatic way in Python/Julia to read this into two 2d arrays?
There are some assumptions we can make: the arrays are lower triangular (i.e., the values only exist (or are nonzero) for j <= i), and the indices are increasing, that is, the last line can be expected to have the largest i (and probably j).
The current implementation assumes that the maximum i and j are both 40, and proceeds like so (Julia minimal working example):
using OffsetArrays
using DelimitedFiles: readdlm
n = 40
p = zeros(0:n, 0:n)
q = zeros(0:n, 0:n)
open(filename) do infile
    for i = 0:n
        for j = 0:i
            line = readline(infile)
            arr = readdlm(IOBuffer(line))
            p[i,j] = arr[3]
            q[i,j] = arr[4]
        end
    end
end
Note that this example also assumes that the index j changes the fastest, which is usually true, but in doing that it effectively ignores the first two columns of the data.
However, I'm looking for a solution that makes fewer assumptions about the data file (maybe only the lower-triangular one and the index increasing one). What would be a "natural" way of doing this in Julia or Python?
The code in the question is almost there. Here is a Julian way to do this:
using DelimitedFiles, OffsetArrays
rawdata = readdlm("in.txt");
irange = (:)(Int.(extrema(rawdata[:,1]))...)
jrange = (:)(Int.(extrema(rawdata[:,2]))...)
p = OffsetArray(zeros(length.((irange,jrange))),irange,jrange)
q = copy(p)
foreach(eachrow(rawdata)) do (i,j,pd,qd)
    p[Int(i),Int(j)] = pd
    q[Int(i),Int(j)] = qd
end
This isn't the simplest or prettiest code, but data parsing rarely is.
After the foreach loop, the matrices p and q should hold the correct values (and can be converted to a triangular matrix type if desired).
The initial bit with irange and jrange can be shortened even further:
rngs = Tuple(splat(:).(extrema(Int.(rawdata[:,1:2]);dims=1)))
p = OffsetArray(zeros(length.(rngs)),rngs...)
q = copy(p)
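Since the question also asks about Python, here is a rough NumPy counterpart of the same idea: read the whole file, take the index ranges from the first two columns, and scatter the values into preallocated arrays. This is only a sketch under a few assumptions: the function name read_triangular is made up for illustration, the file is whitespace-delimited with exactly four numeric columns, and, because NumPy has no offset arrays, the smallest index in each column is subtracted so that it maps to position 0 (a no-op for the sample data, where indices start at 0).

```python
import io
import numpy as np

def read_triangular(source):
    """Read rows of `i j p q` into two dense 2D arrays addressed by (i, j).

    `source` may be a filename or an open file-like object.
    Entries that never appear in the file stay zero.
    """
    raw = np.loadtxt(source, ndmin=2)        # shape (nrows, 4), all floats
    idx = raw[:, :2].astype(int)             # the two index columns as integers
    lo = idx.min(axis=0)                     # smallest i and j found in the file
    shape = tuple(idx.max(axis=0) - lo + 1)  # extent along i and along j
    p = np.zeros(shape)
    q = np.zeros(shape)
    i, j = (idx - lo).T                      # shift so the smallest index lands at 0
    p[i, j] = raw[:, 2]
    q[i, j] = raw[:, 3]
    return p, q, lo

# The first six rows of the sample in the question, used as stand-in input.
sample = io.StringIO(
    "0 0 -0.105961 0.00000\n"
    "1 0 -1.06965 0.00000\n"
    "1 1 -0.0187213 -0.240237\n"
    "2 0 -0.124695 0.00000\n"
    "2 1 -0.178982 0.0633255\n"
    "2 2 0.760988 -0.213796\n"
)
p, q, lo = read_triangular(sample)
```

Like the Julia version, this relies only on the index columns, so the row order in the file does not matter. If the lower-triangular assumption should be verified rather than trusted, np.allclose(p, np.tril(p)) is one way to check it after loading.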