sequence deep-copy nim-lang copy-assignment

Nim sequence assignment value semantics

I was under the impression that sequences and strings always get deeply copied on assignment. Today I got burned when interfacing with a C library to which I pass unsafeAddr of a Nim sequence. The C library writes into the memory area starting at the passed pointer.

Since I don't want the original Nim sequence to be changed by the library, I thought I would simply copy the sequence by assigning it to a new variable named copy and pass the address of the copy to the library.

Lo and behold, the modifications showed up in the original Nim sequence nevertheless. Even weirder, this behavior depends on whether the copy is declared via let copy = ... (changes do show up) or via var copy = ... (changes do not show up). The following code demonstrates this in a very simplified Nim example:

proc changeArgDespiteCopyAssignment(x: seq[int], val: int): seq[int] =
  let copy = x
  let copyPtr = unsafeAddr(copy[0])
  copyPtr[] = val
  result = copy

proc dontChangeArgWhenCopyIsDeclaredAsVar(x: seq[int], val: int): seq[int] =
  var copy = x
  let copyPtr = unsafeAddr(copy[0])
  copyPtr[] = val
  result = copy

let originalSeq = @[1, 2, 3]
var ret = changeArgDespiteCopyAssignment(originalSeq, 9999)

echo originalSeq
echo ret

ret = dontChangeArgWhenCopyIsDeclaredAsVar(originalSeq, 7777)

echo originalSeq
echo ret

This prints

@[9999, 2, 3]
@[9999, 2, 3]
@[9999, 2, 3]
@[7777, 2, 3]

So the first call changes originalSeq while the second doesn't. Can someone explain what is going on under the hood? I'm using Nim 1.6.6 and am a total Nim newbie.

Solution

It turns out there are a lot of issues about this behavior in the nim-lang issue tracker. For example:

let semantics gives 3 different results depends on gc, RT vs VM, backend, type, global vs local scope

Seq assignment does not perform a deep copy

Let behaves differently in proc for default gc

assigning var to local let does not properly copy in default gc

clarify spec/implementation for let: move or copy?

RFC give default gc same semantics as --gc:arc as much as possible

Long story short, whether a copy is made depends on a lot of factors. For sequences, it depends especially on the scope (global vs. local) and the GC in use (refc, arc, orc). More generally, the type involved (seq vs. array) and the code generation backend (C vs. JS) can also be relevant.
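Given that plain assignment cannot be relied on here, one way to sidestep the question is to build the copy by hand. The following is only a minimal sketch: explicitCopy and writeToPrivateCopy are hypothetical helper names that do not come from the issues above, and the raw pointer write merely stands in for the C library call.

```nim
# Hypothetical helper: allocate a fresh seq and copy element by element,
# so the result cannot share storage with `x` whatever GC is in use.
proc explicitCopy[T](x: seq[T]): seq[T] =
  result = newSeq[T](x.len)
  for i, v in x:
    result[i] = v

# Stand-in for the C interop scenario: write through a raw pointer
# into the private copy instead of into the caller's sequence.
proc writeToPrivateCopy(x: seq[int], val: int): seq[int] =
  var copy = explicitCopy(x)
  if copy.len > 0:              # guard: an empty seq has no element 0
    let copyPtr = addr copy[0]  # `copy` is a var, so plain addr works
    copyPtr[] = val
  result = copy

let source = @[1, 2, 3]
let modified = writeToPrivateCopy(source, 9999)
echo source    # @[1, 2, 3]
echo modified  # @[9999, 2, 3]
```

Because the copy is declared with var and filled from a newly allocated sequence, the write goes to separate storage, at the cost of one extra allocation and element-wise copy.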

This behavior has tricked a lot of beginners and is not well received by some of the contributors. It doesn't happen with the newer GCs, --gc:arc and --gc:orc, the latter of which is expected to become the default GC in an upcoming Nim version. It has never been fixed in the current default GC because of performance concerns, backward compatibility risks and the expectation that it will disappear anyway once we transition to the newer GCs.
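Since the outcome depends on the GC, it can help to make a program report which one it was built with. The sketch below assumes that the compiler defines the symbols gcArc and gcOrc when the corresponding --gc option is passed. The constant name memoryManager and the file name in the comment are made up for illustration.

```nim
# Compile for example with:  nim c --gc:orc example.nim
# (example.nim is a placeholder file name)
when defined(gcOrc):
  const memoryManager = "orc"
elif defined(gcArc):
  const memoryManager = "arc"
else:
  const memoryManager = "refc or another non-ARC GC"

echo "compiled with: ", memoryManager
```

This only reports the build configuration. It does not by itself show whether a particular assignment copied or aliased the data.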

Personally, I would have expected this to be clearly called out in the Nim language manual at the very least. It isn't.