I've only just started with Nim, hence it possibly is a simple question. We need to do many lookups into data that are stored in a file. Some of these files are too large to load into memory, hence the mmapped approach. I'm able to mmap the file by means of memfiles and either have a pointer or MemSlice at my hand. The file and the memory region are read-only, and hence have a fixed size. I was hoping that I'm able to access the data as immutable fixed size byte and char arrays without copying them, leveraging all the existing functionalities available to seqs, arrays, strings etc.. All the MemSlice / string methods copy the data, which is fair, but not what I want (and in my use case don't need).
I understand array, strings etc. types have a pointer to the data and a len field. But couldn't find a way to create them with a pointer and len. I assume it has something to do with ownership and refs to mem that may outlive my slice.
let mm = memfiles.open(...)
let myImmutableFixesSizeArr = ?? # cast[ptr array[fsize, char]](mm.mem) doesn't compile as fsize needs to be const. Neither could I find something like let x: [char] = array_from(mm.mem, fsize)
let myImmutableFixedSizeString = mm[20, 30].to_fixed_size_immutable_string # Create something that is string like so that I can use all the existing string methods.
UPDATE: I did find https://forum.nim-lang.org/t/4680#29226 which explains how to use OpenArray, but OpenArray is only allowed as function argument, and you - if I'm not mistaken - it is doesn't behave like a normal array.
Thanks for your help
It is not possible to convert a raw char array in memory (ptr UncheckedArray[char]
) to a string
without copying, only to an openArray[char]
(or cstring
)
So it won't be possible to use procs that expect a string
, only those that accept openArray[T]
or openArray[char]
Happily an openArray[T]
behaves exactly like a seq[T]
when sent to a proc.
({.experimental:"views".}
does let you assign an openArray[T] to a local variable, but it's not anywhere near ready for production)
you can use the memSlices
iterator to loop over delimited chunks in a memFile without copying:
import memfiles
template toOpenArray(ms: MemSlice, T: typedesc = byte): openArray[T] =
##template because openArray isn't a valid return type yet
toOpenArray(cast[ptr UncheckedArray[T]](ms.data),0,(ms.size div sizeof(T))-1)
func process(slice:openArray[char]) =
## your code here but e.g.
## count number of A's
var nA: int
for ch in slice.items:
if ch == 'A': inc nA
debugEcho nA
let mm = memfiles.open("file.txt")
for slice in mm.memSlices:
process slice.toOpenArray(char)
Or, to work with some char array represented in the middle of the file, you can use pointer arithmetic.
import memfiles
template extractImpl(typ,pntr,offset) =
cast[typ](cast[ByteAddress](pntr)+offset)
template checkFileLen(memfile,len,offset) =
if offset + len > memfile.size:
raise newException(IndexDefect,"file too short")
func extract*(mm: MemFile,T:typedesc, offset:Natural): ptr T =
checkFileLen(mm,T,offset)
result = extractImpl(ptr T,mm.mem,offset)
func extract*[U](mm: MemFile,T: typedesc[ptr U], offset: Natural): T =
extractImpl(T,mm.mem,offset)
let mm = memfiles.open("file.txt")
#to extract a compile-time known length string:
let mystring_offset = 3
const mystring_len = 10
type MyStringT = array[mystring_len,char]
let myString:ptr MyStringT = mm.extract(MyStringT,mystring_offset)
process myString[]
#to extract a dynamic length string:
let size_offset = 14
let string_offset = 18
let sz:ptr int32 = mm.extract(int32,size_offset)
let str:ptr UncheckedArray[char] = mm.extract(ptr UncheckedArray[char], string_offset)
checkFileLen(mm,sz[],string_offset)
process str.toOpenArray(0,sz[]-1)