assembly x86-64 cpu-architecture micro-optimization

Arranging an assembly function's data in the data section

I know that a function's data can be kept close to the function (for example, right after its end) or far from it, in the data section. I also know it is generally better to keep data such as jump tables in the data section, so let's assume that is what we do. My question is how to arrange that data, based on element size, within the data section. For example, a function has a jump table (a list of 8-byte addresses), many DWORD (4-byte) values, some WORD (2-byte) values, and many 1-byte values:

section code
func:
   ...
section data align 64
 func.jmp_table:
    DQ ...
    DQ ...
    DQ ...
    DQ ...
    DQ ...
    DQ ...
 func.data4:
    DD 0,1,2,3,4,...
 func.data2:
    DW 0,1,2,3,4,...
 func.data1:
    DB 0,1,2,3,4,...

So we put one function's data in the data section. Now suppose we have 10 functions, each with its own data of mixed sizes (QWORD, DWORD, WORD, BYTE, ...). How should all of this be laid out in the data section? Is it better to keep each function's data together (its QWORDs, DWORDs, WORDs, and BYTEs side by side), or to split the data section into QWORD, DWORD, WORD, and BYTE regions and place every item by size?

Way 1 (each function's data placed back to back, with each function's block aligned to 8 bytes):

section code
 func:
   ...
 func2:
   ...
 func3:
   ...
section data align 64
 func.jmp_table:
    DQ ...,...,...,...,...,...
 func.data4:
    DD ...,...,...,...,...,...
 func.data2:
    DW ...,...,...,...,...,...
 func.data1:
    DB ...,...,...,...,...,...

 align 8
 func2.jmp_table:
    DQ ...,...,...,...,...,...
 func2.data4:
    DD ...,...,...,...,...,...
 func2.data2:
    DW ...,...,...,...,...,...
 func2.data1:
    DB ...,...,...,...,...,...

 align 8
 func3.jmp_table:
    DQ ...,...,...,...,...,...
 func3.data4:
    DD ...,...,...,...,...,...
 func3.data2:
    DW ...,...,...,...,...,...
 func3.data1:
    DB ...,...,...,...,...,...

Way 2 (split each function's data by element size and arrange the data section by size):

section code
 func:
   ...
 func2:
   ...
 func3:
   ...
section data align 64
 func.jmp_table:
    DQ ...,...,...,...,...,...
 func2.jmp_table:
    DQ ...,...,...,...,...,...
 func3.jmp_table:
    DQ ...,...,...,...,...,...

 func.data4:
    DD ...,...,...,...,...,...
 func2.data4:
    DD ...,...,...,...,...,...
 func3.data4:
    DD ...,...,...,...,...,...

 func.data2:
    DW ...,...,...,...,...,...
 func2.data2:
    DW ...,...,...,...,...,...
 func3.data2:
    DW ...,...,...,...,...,...

 func.data1:
    DB ...,...,...,...,...,...
 func2.data1:
    DB ...,...,...,...,...,...
 func3.data1:
    DB ...,...,...,...,...,...

Solution

Your decision should be based upon cache utilization.

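In other words, data that is often accessed together should be placed as close together as possible, to maximize the chance that it falls in the same "cache line". If each function mostly touches only its own data, that points to grouping by function (your first layout) rather than by size.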

A cache line is the fixed-size block that the cache fills and evicts as a unit, 64 bytes on current x86-64 CPUs. (You can think of it as a "cache page", but the word "page" is already used for something else.)
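
As a rough illustration of the idea, here is a minimal NASM sketch in which one function's jump table and small data items are packed into a single 64-byte line. The entry counts, the values, the `.case0`..`.case3` targets, and `func2.data4` are all made up for the example, and it assumes `func` really does touch these items together; measure before relying on it.

```nasm
; Sketch only. Assemble with: nasm -f elf64 layout.asm
default rel

section .text
global func
func:                               ; edi = selector, assumed to be in 0..3
    mov     eax, edi                ; writing eax zero-extends into rax
    lea     rdx, [func.jmp_table]   ; RIP-relative address of the table
    jmp     qword [rdx + rax*8]     ; dispatch through the table
.case0:
    movzx   eax, byte [func.data1]  ; same cache line as the table
    ret
.case1:
    movzx   eax, word [func.data2]
    ret
.case2:
.case3:
    mov     eax, [func.data4]
    ret

section .data align=64
func.jmp_table:                     ; 4 * 8 = 32 bytes, offsets 0..31
    dq      func.case0, func.case1, func.case2, func.case3
func.data4:                         ; 4 * 4 = 16 bytes, offsets 32..47
    dd      0, 1, 2, 3
func.data2:                         ; 4 * 2 =  8 bytes, offsets 48..55
    dw      0, 1, 2, 3
func.data1:                         ; 8 * 1 =  8 bytes, offsets 56..63
    db      0, 1, 2, 3, 4, 5, 6, 7
                                    ; 64 bytes in total: exactly one line

    align   64, db 0                ; next function's data starts a new line
func2.data4:                        ; hypothetical second function's data
    dd      0, 1, 2, 3
```

Ordering the items from largest to smallest inside each function's block keeps every element naturally aligned without padding, so you get the per-function locality of the first layout without giving up alignment. Padding every block to 64 bytes costs space, so it is probably only worth it for data that is actually hot.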