The following C function attempts to prevent recursion in multicore code in a thread-safe manner using a thread-local storage (TLS) variable. However, for reasons that are somewhat complicated, I NEED to write this function in x64 assembly (Intel/AMD 64-bit) and assemble it with ml64.exe from VC2010. I know how to do this with global variables, but I'm not sure how to do it properly with a TLS variable declared with __declspec(thread).
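void DoWork(void);                   /* the work that must not be re-entered on the same thread */

__declspec(thread) int tls_VAR = 0;  /* per-thread "already inside DoWork" flag */

void norecurse(void)
{
    if (0 == tls_VAR)
    {
        tls_VAR = 1;   /* mark this thread as busy */
        DoWork();      /* a nested call to norecurse() now does nothing */
        tls_VAR = 0;   /* clear the flag on the way out */
    }
}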
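To see the guard in isolation, here is a small portable sketch of the same logic. It assumes a C11 compiler and uses the standard _Thread_local keyword in place of MSVC's __declspec(thread). The DoWork below is a hypothetical stand-in that deliberately calls norecurse() again so the guard has something to block, and work_calls is an illustrative counter that is not part of the original code.

```c
#include <assert.h>

/* C11 sketch: _Thread_local replaces MSVC's __declspec(thread). */
static _Thread_local int tls_VAR = 0;  /* 1 while this thread is inside DoWork */
static int work_calls = 0;             /* hypothetical: counts real DoWork runs */

static void norecurse(void);

/* Hypothetical DoWork that tries to re-enter norecurse(). */
static void DoWork(void)
{
    ++work_calls;
    norecurse();   /* nested attempt: skipped because tls_VAR == 1 */
}

static void norecurse(void)
{
    if (tls_VAR == 0) {
        tls_VAR = 1;   /* mark this thread as busy */
        DoWork();
        tls_VAR = 0;   /* allow later, non-nested calls */
    }
}
```

A single call to norecurse() runs DoWork exactly once. The nested call sees tls_VAR == 1 and returns immediately, and the flag is back to 0 afterwards. Because the flag is thread-local, a different thread calling norecurse() at the same time is not blocked.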
Note: This is what VC2010 kicks out for the function. However, MASM (ml64.exe) doesn't support the gs:88 or OFFSET FLAT: parts of the code.
; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.01
include listing.inc
INCLUDELIB MSVCRTD
INCLUDELIB OLDNAMES
PUBLIC norecurse
EXTRN DoWork:PROC
EXTRN tls_VAR:DWORD
EXTRN _tls_index:DWORD
pdata SEGMENT
$pdata$norecurse DD imagerel $LN4
DD imagerel $LN4+70
DD imagerel $unwind$norecurse
pdata ENDS
xdata SEGMENT
$unwind$norecurse DD 040a01H
DD 06340aH
DD 07006320aH
; Function compile flags: /Ogtpy
xdata ENDS
_TEXT SEGMENT
norecurse PROC
; File p:\hackytests\64bittest2010\64bittest\64bittest.cpp
; Line 19
$LN4:
mov QWORD PTR [rsp+8], rbx
push rdi
sub rsp, 32 ; 00000020H
; Line 20
mov ecx, DWORD PTR _tls_index
mov rax, QWORD PTR gs:88
mov edi, OFFSET FLAT:tls_VAR
mov rbx, QWORD PTR [rax+rcx*8]
cmp DWORD PTR [rbx+rdi], 0
jne SHORT $LN1@norecurse
; Line 22
mov DWORD PTR [rbx+rdi], 1
; Line 23
call DoWork
; Line 24
mov DWORD PTR [rbx+rdi], 0
$LN1@norecurse:
; Line 26
mov rbx, QWORD PTR [rsp+48]
add rsp, 32 ; 00000020H
pop rdi
ret 0
norecurse ENDP
_TEXT ENDS
END
As your question indicates, the problem comes down to finding the MASM equivalents of the following two lines in the assembly listing generated by Microsoft's C++ compiler:
mov rax, QWORD PTR gs:88
mov edi, OFFSET FLAT:tls_VAR
The first line is easy: just replace gs:88 with gs:[88].
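Where does 88 come from? In 64-bit Windows user mode, GS points at the current thread's environment block (TEB), and 88 (58h) is the offset of its ThreadLocalStoragePointer field. That field points to the thread's array of per-module TLS blocks. The sketch below checks the arithmetic against an abbreviated model of the start of the TEB. The struct name TebPrefix and the helper are hypothetical, the fields are condensed from the publicly documented layout, and 8-byte pointers are assumed. It is meant only for computing the offset and is not a usable declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Abbreviated, hypothetical model of the first fields of the x64 TEB.
   Assumes 8-byte pointers; not the real winternl.h declaration. */
typedef struct {
    void  *NtTib[7];                   /* NT_TIB: seven pointer-sized fields, 0x00-0x37 */
    void  *EnvironmentPointer;         /* 0x38 */
    void  *ClientId[2];                /* 0x40: process id, thread id */
    void  *ActiveRpcHandle;            /* 0x50 */
    void **ThreadLocalStoragePointer;  /* 0x58 == 88: what gs:[88] loads */
} TebPrefix;

/* Byte offset read by "mov rax, gs:[88]". */
static size_t tls_pointer_offset(void)
{
    return offsetof(TebPrefix, ThreadLocalStoragePointer);
}
```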
The second line is less obvious. The OFFSET FLAT: operator is a red herring. It means "use the offset relative to the beginning of the FLAT segment." With the 32-bit version of MASM, FLAT is the segment that spans the entire 4 GB address space, and it serves as both the code and the data segment in the 32-bit flat memory model. The 64-bit version of MASM doesn't support memory models. It essentially always assumes a 64-bit version of the flat memory model, so it doesn't support the FLAT keyword. As a result, the plain OFFSET operator ends up meaning the same thing. (In fact, with the 32-bit assembler plain OFFSET also normally means the same thing, because PE/COFF only supports the flat memory model.)
However, using OFFSET here won't work, because it would give the offset of tls_VAR relative to address 0. In other words, it would give the absolute address of tls_VAR in memory. What's needed here is the offset relative to the beginning of the TLS data section.
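A toy model makes the difference concrete. In the hypothetical sketch below, tls_template stands in for the .tls section in the image, and each thread gets its own copy initialized from it. This is roughly what the Windows loader does for static TLS. The absolute address of tls_template.tls_VAR, which is what plain OFFSET would give, always points into the template. The section-relative offset, by contrast, can be added to the base of any thread's copy. All type and function names here are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout of the .tls section: two TLS variables. */
typedef struct {
    int other_var;  /* some other TLS variable placed before tls_VAR */
    int tls_VAR;
} TlsSection;

/* Stand-in for the template data stored in the image's .tls section. */
static TlsSection tls_template = { 7, 0 };

/* Give a thread its own copy, initialized from the template. */
static void init_thread_copy(TlsSection *copy)
{
    memcpy(copy, &tls_template, sizeof *copy);
}

/* The value a SECREL relocation records: tls_VAR's offset from the section start. */
static size_t secrel_tls_VAR(void)
{
    return (size_t)((char *)&tls_template.tls_VAR - (char *)&tls_template);
}

/* Locate tls_VAR inside one thread's copy: base of the copy + section-relative offset. */
static int *tls_VAR_in(TlsSection *thread_copy)
{
    return (int *)((char *)thread_copy + secrel_tls_VAR());
}
```

Writing through tls_VAR_in(&copy) touches only that copy. Writing through &tls_template.tls_VAR would touch the shared template no matter which thread did it.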
So the compiler must be doing something special here. To find out what, I dumped the relocations in the object file generated by compiling your example C code:
> dumpbin /relocations t215a.obj
...
RELOCATIONS #4
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000008  REL32                      00000000        14  _tls_index
 00000016  SECREL                     00000000         8  tls_VAR
 0000002D  REL32                      00000000         C  DoWork
...
As you can see, the compiler generates a relocation of type SECREL for the reference to tls_VAR. This relocation resolves to the symbol's offset from the start of the section that contains it in the linked executable. Here that's the .tls section, so the relocation produces an offset relative to the beginning of the section that holds the static TLS data.
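Taken together, the compiler's instructions form a three-step chain. First, load the thread's TLS pointer array from gs:[88]. Second, pick this module's block using _tls_index. Third, add the SECREL offset of tls_VAR. The C simulation below mirrors that chain. FakeThread, TlsBlock, sim_tls_index, and lookup_tls_VAR are hypothetical names, and the block layout is invented for illustration. offsetof plays the role of the SECREL value the linker would fill in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout of this module's TLS block (the .tls section). */
typedef struct { int other_var; int tls_VAR; } TlsBlock;

/* Hypothetical stand-in for the TEB: only the field gs:[88] would load. */
typedef struct {
    void **ThreadLocalStoragePointer;  /* one TLS block pointer per module */
} FakeThread;

/* Stand-in for _tls_index: this module's slot in the pointer array. */
static unsigned int sim_tls_index = 1;

static int *lookup_tls_VAR(const FakeThread *t)
{
    /* mov rax, gs:[88]  /  mov rbx, [rax + rcx * 8] */
    char *block = t->ThreadLocalStoragePointer[sim_tls_index];
    /* [rbx + SECTIONREL tls_VAR] */
    return (int *)(block + offsetof(TlsBlock, tls_VAR));
}
```

Two simulated threads with different pointer arrays resolve tls_VAR to two different ints. That is what makes the flag in norecurse per-thread.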
So now the question becomes how to get MASM to generate the same SECREL relocation the compiler emits. This turns out to have an easy solution as well: just replace OFFSET FLAT: with SECTIONREL.
So with these changes (and a bit of optimization) your function becomes:
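EXTERN tls_VAR:DWORD
EXTERN _tls_index:DWORD
EXTERN DoWork:PROC
PUBLIC norecurse
_TEXT SEGMENT
norecurse PROC FRAME             ; FRAME + the directives below emit x64 unwind info
    push rbx                     ; save callee-saved rbx (also 16-byte aligns rsp)
    .pushreg rbx
    sub rsp, 32                  ; shadow space for the call to DoWork
    .allocstack 32
    .endprolog
    mov rax, gs:[88]             ; TEB.ThreadLocalStoragePointer
    mov ecx, _tls_index          ; this module's slot in that array
    mov rbx, [rax + rcx * 8]     ; this thread's TLS block for the module
    cmp DWORD PTR [rbx + SECTIONREL tls_VAR], 0
    jne return                   ; already inside DoWork on this thread
    mov DWORD PTR [rbx + SECTIONREL tls_VAR], 1
    call DoWork                  ; rbx survives the call (callee-saved)
    mov DWORD PTR [rbx + SECTIONREL tls_VAR], 0
return:
    add rsp, 32
    pop rbx
    ret
norecurse ENDP
_TEXT ENDS
END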