Tags: out-of-memory, julia, memory-mapping

Using memory-mapped arrays in Julia


I have Julia code (version 1.2) that performs many operations on a 10000 x 10000 Array. Because it fails with an OutOfMemoryError() when I run it, I'm exploring alternatives such as memory-mapping. I'm a bit confused about how to use the Array that I map to disk with Mmap.mmap, because the documentation at https://docs.julialang.org/en/v1/stdlib/Mmap/index.html explains little. Here is the beginning of my code:

using Distances
using LinearAlgebra
using Distributions
using Mmap
data=Float32.(rand(10000,15))
Eucldist=pairwise(Euclidean(),data,dims=1)
D=maximum(Eucldist.^2)
sigma2hat=mean(((Eucldist.^2)./D)[tril!(trues(size((Eucldist.^2)./D)),-1)])
L=exp.(-(Eucldist.^2/D)/(2*sigma2hat))

L is the 10000 x 10000 Array I want to work with, so I wrote it to disk, preceded by its two dimensions, with:

s = open("mmap.bin", "w+")
write(s, size(L,1))
write(s, size(L,2))
write(s, L)
close(s)

What am I supposed to do after that? The next step is to compute K=eigen(L) and apply other commands to K. Should I call K=eigen(L) or K=eigen(s)? What is the role of the object s, and when does it come into play? I also don't understand why and when I have to use Mmap.sync!: after each line following eigen(L), or once at the end of the code? And how can I be sure that I'm using disk space instead of RAM? I would appreciate some pointers on memory-mapping. Thank you!


Solution

  • If memory usage is a concern, it often helps to rebind your very large arrays to a small placeholder of the same type (for example a 2 x 2 matrix) once you are done with those intermediate matrices, so that the garbage collector can reclaim their memory. After that, open the stored data file, read the two dimensions from its header, and call Mmap.mmap() with the stream as the first argument and the array type and dimensions as the second and third. Assign the return value to your variable, in this case L, so that L is backed by the file contents:

    using Distances
    using LinearAlgebra
    using Distributions
    using Mmap
    
    function testmmap()
        data = Float32.(rand(10000, 15))
        Eucldist = pairwise(Euclidean(), data, dims=1)
        D = maximum(Eucldist.^2)
        sigma2hat = mean(((Eucldist.^2) ./ D)[tril!(trues(size((Eucldist.^2) ./ D)), -1)])
        L = exp.(-(Eucldist.^2 / D) / (2 * sigma2hat))
        mkpath("./tmp")  # make sure the target directory exists before opening the file
        s = open("./tmp/mmap.bin", "w+")
        write(s, size(L,1))
        write(s, size(L,2))
        write(s, L)
        close(s)
    
        # rebind the large arrays to a tiny matrix so the GC can reclaim their memory
        Eucldist = data = L = zeros(Float32, 2, 2)
        GC.gc()
    
        s = open("./tmp/mmap.bin", "r+") # allow read and write
        m = read(s, Int)
        n = read(s, Int)
        L = Mmap.mmap(s, Matrix{Float32}, (m, n))  # now L references the file contents
        K = eigen(L)
        K
    end
    
    testmmap()
    @time testmmap()  # 109.657995 seconds (17.48 k allocations: 4.673 GiB, 0.73% gc time)
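
The role of s and of Mmap.sync! can be seen on a small scale. The following is only a sketch under stated assumptions: the file name mmap_demo.bin, the temporary directory, and the 3 x 4 matrix are made up for illustration and are not part of the original code. The stream is needed only to read the header and to create the mapping; after that you work with the mapped array itself (so eigen(L), not eigen(s)). Mmap.sync! matters only when you write into the mapped array and want those changes flushed to the file.

```julia
using Mmap

# Hypothetical file name and small sizes, chosen only for illustration.
path = joinpath(mktempdir(), "mmap_demo.bin")
A = Float32.(reshape(1:12, 3, 4))

# Write a two-integer header (the dimensions) followed by the raw matrix data.
open(path, "w+") do io
    write(io, size(A, 1))
    write(io, size(A, 2))
    write(io, A)
end

# Reopen, read the header, and map the rest of the file. The mapping starts
# at the current stream position, i.e. right after the two header integers.
io = open(path, "r+")
m = read(io, Int)
n = read(io, Int)
B = Mmap.mmap(io, Matrix{Float32}, (m, n))

B[1, 1] = 42f0   # modifies the mapped pages; not necessarily on disk yet
Mmap.sync!(B)    # ask the OS to flush the modified pages to the file
close(io)        # the stream is no longer needed once the mapping exists

# Read the file back with ordinary I/O to check that the change was persisted.
C = open(path, "r") do f
    seek(f, 2 * sizeof(Int))              # skip the header
    out = Matrix{Float32}(undef, m, n)
    read!(f, out)
end
```

If you only read from the mapped array, as with K = eigen(L), calling Mmap.sync! should not be necessary. Note also that, as far as I know, eigen works on an in-memory copy of its argument, so mapping the input alone may not lower peak RAM for that step; the 4.673 GiB of allocations reported by @time above is consistent with that.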