
How much of an object (containing std::vectors) is loaded into the L1/L2/L3 cache?


Please see the following link, page 22 onwards:

http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf

The above link suggests that if I have an object containing arrays like this:

class MyClass{
    public:
    double a[1000];
    double b[1000];
};

and the code below iterates through a vector of MyClass and performs some math on the member array b:

std::vector<MyClass> y;
y.populateVector(); // placeholder: fill y with objects somehow

for(const auto& x : y){ // by reference; a plain `auto x` would copy the whole 16 000-byte object
    // Iterate through x.b and do some math
    for(int i=0; i<1000; i++){
        std::cout << x.b[i] << std::endl;
    }
}

when we retrieve each MyClass object, all the data from both arrays will be loaded into the cache. Is this true? I didn't think the data in a would be loaded into the cache, because the address for accessing b would be calculated and only b would be fetched.

I am trying to get an idea of how much of a MyClass object is loaded into the cache compared to the amount of useful data actually required for processing.

I can understand it if the very first b element shared a cache line with the very last a element, but I didn't think the whole object would get loaded into the L2/L3 cache just to process one part of it.


Solution

  • It depends on how the memory is organized on your system. If it so happens that the backing arrays for a and b are located close together in memory, it is possible that they will be loaded together, since the CPU usually issues larger reads to fill the cache in the hope that you will use the extra data. If not, I see no reason why reading b would imply anything to do with a, aside from reading the control data (pointers/sizes, in the std::vector case) stored in the object itself.

    What it does show is that using classes in a haphazard manner can and will cause cache misses purely because of the way they reside in memory.

    The general rule for what gets loaded into the cache is: if the CPU issues a read and misses the cache, it loads a cache-aligned chunk from main memory (128 bytes in the presentation's examples; 64 bytes is typical on current desktop hardware).

    For your edited example: yes, these are colocated pieces of memory, and parts of a MAY be loaded when reads of b are issued, simply because of their location in memory.

    For your example, each MyClass object consists of a contiguous region of 2000 * sizeof(double) bytes (most likely aligned). These objects are packed into a continuous memory region pointed to by the vector. Accessing the b member of each object will incur a cache miss (if it's not cached). The contents of a cache aligned piece of memory will be loaded from the each read that misses the cache. Depending on the memory alignment constraints and cache size it's possible that some of the entries from the a member will be loaded into memory. It's even possible to assume that due to padding and alignment that it will not be the case that any of your MyClass a members will be loaded into the cache (and there's no reason they should be as they were not accessed).