I have Julia code (version 1.2) that performs a lot of operations on a 10000 x 10000 Array. Because I get an OutOfMemory() error when I run it, I'm exploring other ways to run it, such as memory-mapping. I'm a bit confused about how to use the Array that I map to disk with Mmap.mmap, because https://docs.julialang.org/en/v1/stdlib/Mmap/index.html gives little explanation. Here is the beginning of my code:
using Distances
using LinearAlgebra
using Distributions
using Mmap
data=Float32.(rand(10000,15))
Eucldist=pairwise(Euclidean(),data,dims=1)
D=maximum(Eucldist.^2)
sigma2hat=mean(((Eucldist.^2)./D)[tril!(trues(size((Eucldist.^2)./D)),-1)])
L=exp.(-(Eucldist.^2/D)/(2*sigma2hat))
L is the 10000 x 10000 Array that I want to work with, so I wrote it to a file on disk with:
s = open("mmap.bin", "w+")
write(s, size(L,1))
write(s, size(L,2))
write(s, L)
close(s)
What am I supposed to do after that? The next step is to compute K=eigen(L) and apply other commands to K. How should I do that: with K=eigen(L) or with K=eigen(s)? What is the role of the object s, and when does it get involved? Moreover, I don't understand why and when I have to use Mmap.sync!. After each line that follows eigen(L)? At the end of the code? How can I be sure that I'm using disk space instead of RAM? I would appreciate some pointers on memory-mapping. Thank you!
If memory usage is a concern, it is often best to rebind your very large arrays to 0 or, to keep the variables type-stable, to a small matrix of the same element type, so that their memory can be garbage collected once you are done with those intermediate matrices. After that, call Mmap.mmap() on your stored data file, passing the array type and the dimensions as the second and third arguments, and assign the return value to your variable, in this case L, so that L is bound to the file contents:
using Distances
using LinearAlgebra
using Distributions
using Mmap
function testmmap()
    data = Float32.(rand(10000, 15))
    Eucldist = pairwise(Euclidean(), data, dims=1)
    D = maximum(Eucldist.^2)
    sigma2hat = mean(((Eucldist.^2) ./ D)[tril!(trues(size((Eucldist.^2) ./ D)), -1)])
    L = exp.(-(Eucldist.^2 / D) / (2 * sigma2hat))
    mkpath("./tmp") # make sure the output directory exists
    # write a header with the two dimensions, then the raw matrix data
    s = open("./tmp/mmap.bin", "w+")
    write(s, size(L,1))
    write(s, size(L,2))
    write(s, L)
    close(s)
    # rebind the large arrays to a small matrix of the same type, then collect
    Eucldist = data = L = zeros(Float32, 2, 2)
    GC.gc()
    s = open("./tmp/mmap.bin", "r+") # allow read and write
    m = read(s, Int)
    n = read(s, Int)
    L = Mmap.mmap(s, Matrix{Float32}, (m, n)) # now L references the file contents
    K = eigen(L)
    close(s) # K does not depend on the stream, so it can be closed here
    K
end
testmmap()
@time testmmap() # 109.657995 seconds (17.48 k allocations: 4.673 GiB, 0.73% gc time)
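On the question of when Mmap.sync! is needed, here is a minimal sketch on a tiny matrix, using the same file layout as above (two Int values for the size, then the data). The scratch path from tempname() and the variable names A, B and C are only illustrative assumptions. The point it tries to show: sync! only matters after you write into the mapped array and want those changes flushed to the file, so a read-only use such as eigen(L) should not need it.

```julia
using Mmap

path = tempname()                  # scratch file; the name is arbitrary
A = rand(Float32, 4, 3)

# Store the dimensions as a header, then the raw column-major data.
open(path, "w") do io
    write(io, size(A, 1))
    write(io, size(A, 2))
    write(io, A)
end

io = open(path, "r+")              # "r+" so that writes can reach the file
m = read(io, Int)
n = read(io, Int)
B = Mmap.mmap(io, Matrix{Float32}, (m, n))  # mapping starts at position(io)

B[1, 1] = 42f0                     # modifies the mapped file contents
Mmap.sync!(B)                      # flush the modified pages to disk
close(io)

# Re-read the file with ordinary I/O to check that the change was persisted.
C = open(path) do io2
    r = read(io2, Int)
    c = read(io2, Int)
    read!(io2, Matrix{Float32}(undef, r, c))
end
```

Note that the stream s is only used to locate the file and the offset of the data; you always pass the mapped array (here B, or L in the code above) to functions such as eigen, never s. Also keep in mind that, as far as I know, eigen works on a copy of its input and returns dense results held in RAM, so memory-mapping L reduces the footprint of L itself but not of the decomposition.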