I'm using Julia to ingest a large two dimensional data array (data
) of size 1000 x 32768; I need to break up the array into smaller square arrays along both dimensions. For instance, I would like to break data
into a grid of smaller, square arrays similar to the following image:
Note that no pixels get left out -- when another square cannot be fit in along either axis, the last possible array of square pixels is returned as another array (hence the shifted pink squares on the right hand side).
Currently, I'm doing this through a function I built to decimate the raw dataset:
function decimate_square(data,fraction=4)
# Read size of input data / calculate length of square side
sy,sx = size(data)
square_side = Int(round(sy/fraction))
# Number of achievable full squares
itersx,itersy = [Int(floor(s/square_side)) for s in [sx,sy]]
# Find left/right X values
for ix in 1:itersx
if ix!=itersx
# Full sliding square can be calculated
left = square_side*(ix-1) + 1
right = square_side*(ix)
else
# Capture last square of data
left = sx-square_side + 1
right = sx
end
# Find top/bottom Y values for each X
for iy in 1:itersy
if iy!=itersy
# Full sliding square can be calculated
top = square_side*(iy-1) + 1
bottom = square_side*(iy)
else
# Capture last square of data
top = sy-square_side + 1
bottom = sy
end
# Record data in 3d stack
cursquare = data[top:bottom,left:right]
if (ix==1)&&(iy==1); global dstack=cursquare
else; dstack=cat(dstack,cursquare,dims=3)
end
end
end
return dstack
end
Which currently takes ~20 seconds to run:
rand_arr = rand(1000,32768)
t1 = Dates.now()
dec_arr = decimate_square(rand_arr)
t2 = Dates.now()
@info(t2-t1)
[ Info: 19666 milliseconds
This is the biggest bottleneck of my analysis. Is there a pre-built function that I can use, or is there a more efficient way to decimate my array?
You can take view
s as Przemyslaw Szufel suggests, and the CartesianIndex
type comes in handy for selecting blocks of the matrix.
julia> function squareviews(data, fraction = 4)
squareside = floor(Int, size(data, 1) / fraction)
[@view(M[CartesianIndex(ix-squareside+1, iy-squareside+1):CartesianIndex(ix, iy)])
for ix in squareside:squareside:size(data, 1),
iy in squareside:squareside:size(data, 2)]
end
squareviews (generic function with 2 methods)
julia> result = squareviews(M)
4×40 Matrix{SubArray{Int64, 2, Matrix{Int64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}}:
[346 392 … 746 429; 380 193 … 476 757; … ; 424 329 … 285 427; 591 792 … 710 891] … [758 916 … 7 185; 26 846 … 631 808; … ; 945 713 … 875 137; 793 655 … 400 322]
[55 919 … 402 728; 292 238 … 266 636; … ; 62 490 … 913 126; 293 475 … 492 20] [53 8 … 146 365; 216 673 … 157 909; … ; 955 635 … 332 945; 354 913 … 922 272]
[278 966 … 128 334; 700 560 … 226 701; … ; 529 398 … 17 674; 237 830 … 4 788] [239 274 … 983 911; 591 669 … 762 675; … ; 213 949 … 917 903; 336 890 … 633 578]
[723 483 … 135 283; 729 579 … 1000 942; … ; 987 383 … 764 544; 682 942 … 376 179] [370 859 … 444 566; 34 106 … 320 161; … ; 310 41 … 868 349; 719 341 … 718 800]
This divides the data matrix into blocks such that result[2, 3]
gives the square that is 2nd from the top and 3rd from the left. (My matrix M
was 100x1000 in size, so there are 100/25 = 4 blocks vertically and 1000/25 = 40 blocks horizontally.)
If you want the results linearly like in your original function, you can instead have the second line of the function be:
julia> function squareviews(data, fraction = 4)
squareside = floor(Int, size(data, 1) / fraction)
[@view(M[CartesianIndex(ix-squareside+1, iy-squareside+1):CartesianIndex(ix, iy)])
for iy in squareside:squareside:size(data, 2)
for ix in squareside:squareside:size(data, 1)]
end
squareviews (generic function with 2 methods)
julia> squareviews(M)
160-element Vector{SubArray{Int64, 2, Matrix{Int64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}}:
(Note the subtle changes in the for
syntax - the iy
comes before ix
here, there's no comma, and there's an extra for
.)
This returns a vector of square matrices (views).
Your original function returned a three-dimensional matrix, in which you'd access values as originalresult[i, j, k]
. Here, the equivalent would be result[k][i, j]
.