Search code examples
arraysjuliatiming

Splitting a large matrix


I'm using Julia to ingest a large two dimensional data array (data) of size 1000 x 32768; I need to break up the array into smaller square arrays along both dimensions. For instance, I would like to break data into a grid of smaller, square arrays similar to the following image:

enter image description here

Note that no pixels get left out -- when another square cannot be fit in along either axis, the last possible array of square pixels is returned as another array (hence the shifted pink squares on the right hand side).

Currently, I'm doing this through a function I built to decimate the raw dataset:

function decimate_square(data,fraction=4)

    # Read size of input data / calculate length of square side
    sy,sx = size(data)
    square_side = Int(round(sy/fraction))

    # Number of achievable full squares
    itersx,itersy = [Int(floor(s/square_side)) for s in [sx,sy]]

    # Find left/right X values
    for ix in 1:itersx
        if ix!=itersx
            # Full sliding square can be calculated
            left = square_side*(ix-1) + 1
            right = square_side*(ix)
        else
            # Capture last square of data
            left = sx-square_side + 1
            right = sx
        end

        # Find top/bottom Y values for each X
        for iy in 1:itersy
            if iy!=itersy
                # Full sliding square can be calculated
                top = square_side*(iy-1) + 1
                bottom = square_side*(iy)
            else
                # Capture last square of data
                top = sy-square_side + 1
                bottom = sy
            end

            # Record data in 3d stack
            cursquare = data[top:bottom,left:right]
            if (ix==1)&&(iy==1); global dstack=cursquare
            else; dstack=cat(dstack,cursquare,dims=3)
            end
        end
    end
    return dstack
end

Which currently takes ~20 seconds to run:

rand_arr = rand(1000,32768)
t1 = Dates.now()
dec_arr = decimate_square(rand_arr)
t2 = Dates.now()
@info(t2-t1)

[ Info: 19666 milliseconds

This is the biggest bottleneck of my analysis. Is there a pre-built function that I can use, or is there a more efficient way to decimate my array?


Solution

  • You can take views as Przemyslaw Szufel suggests, and the CartesianIndex type comes in handy for selecting blocks of the matrix.

    julia> function squareviews(data, fraction = 4)
              squareside = floor(Int, size(data, 1) / fraction)
              
              [@view(M[CartesianIndex(ix-squareside+1, iy-squareside+1):CartesianIndex(ix, iy)])
                 for ix in squareside:squareside:size(data, 1),
                     iy in squareside:squareside:size(data, 2)]
           end
    squareviews (generic function with 2 methods)
    
    julia> result = squareviews(M)
    4×40 Matrix{SubArray{Int64, 2, Matrix{Int64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}}:
     [346 392 … 746 429; 380 193 … 476 757; … ; 424 329 … 285 427; 591 792 … 710 891]   …  [758 916 … 7 185; 26 846 … 631 808; … ; 945 713 … 875 137; 793 655 … 400 322]
     [55 919 … 402 728; 292 238 … 266 636; … ; 62 490 … 913 126; 293 475 … 492 20]         [53 8 … 146 365; 216 673 … 157 909; … ; 955 635 … 332 945; 354 913 … 922 272]
     [278 966 … 128 334; 700 560 … 226 701; … ; 529 398 … 17 674; 237 830 … 4 788]         [239 274 … 983 911; 591 669 … 762 675; … ; 213 949 … 917 903; 336 890 … 633 578]
     [723 483 … 135 283; 729 579 … 1000 942; … ; 987 383 … 764 544; 682 942 … 376 179]     [370 859 … 444 566; 34 106 … 320 161; … ; 310 41 … 868 349; 719 341 … 718 800]
    
    

    This divides the data matrix into blocks such that result[2, 3] gives the square that is 2nd from the top and 3rd from the left. (My matrix M was 100x1000 in size, so there are 100/25 = 4 blocks vertically and 1000/25 = 40 blocks horizontally.)

    If you want the results linearly like in your original function, you can instead have the second line of the function be:

    julia> function squareviews(data, fraction = 4)
              squareside = floor(Int, size(data, 1) / fraction)
              
              [@view(M[CartesianIndex(ix-squareside+1, iy-squareside+1):CartesianIndex(ix, iy)])
                 for iy in squareside:squareside:size(data, 2)
                 for ix in squareside:squareside:size(data, 1)]
           end
    squareviews (generic function with 2 methods)
    
    julia> squareviews(M)
    160-element Vector{SubArray{Int64, 2, Matrix{Int64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}}:
    

    (Note the subtle changes in the for syntax - the iy comes before ix here, there's no comma, and there's an extra for.)

    This returns a vector of square matrices (views).

    Your original function returned a three-dimensional matrix, in which you'd access values as originalresult[i, j, k]. Here, the equivalent would be result[k][i, j].