Tags: c++, multithreading, caching

How to avoid stale cached data in a multithreaded C++ application


Can someone clarify the following? I have seen implementations of std::queue for multithreading purposes where all push/pop/erase operations were protected with a mutex, but when I see that I imagine this scenario: we have two threads (thread1 and thread2) running on different cores of the processor, so they have separate L1 caches. And we have the following queue:

#include <cstddef>  // size_t
#include <queue>

struct Message {
    char* buf;
    size_t size;
};
std::queue<Message> messageQueue;

Thread1 adds an element to the queue, then thread2 tries to access it with the front() method. But what if that piece of memory was previously cached for thread2's core, so that the size member does not reflect the current size of buf, or the buf pointer holds a stale (not yet updated) address?

I ran into this question while designing the server side of a client/server application. In my app the server runs in one thread and works directly with sockets; when it receives a new message, it allocates memory for that message and adds the message to a message queue. Another thread then reads from this queue, processes the message, and deletes it. I am always afraid of caching problems, and because of that I wrote my own queue implementation using volatile pointers. What is the proper way to handle this? Do I have to avoid std::list and std::queue with locks? If this problem cannot occur, could you please explain why?


Solution

  • It's not your problem. You're writing C++ code, and it's the compiler's job to ensure your code makes the CPU and its caches do the right thing, not yours. You just have to comply with the rules of whatever threading standard you are using.

    But if I lock some mutex, how can I be sure that this memory isn't cached? Does mutex locking somehow guarantee reading directly from memory?

    You can't, and that's a good thing. Caching massively improves performance, and main memory is terribly slow. Fortunately, no modern CPU that you're likely to write multithreaded code on requires you to sacrifice performance like that. The hardware keeps the cores' caches coherent with one another, so synchronization never has to bypass the cache entirely. What you want is for your code to work, not for it to work awfully.
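    The guarantee you actually get is a happens-before relationship: plain writes made before releasing a mutex are visible to any thread that later acquires the same mutex. A sketch of that guarantee (producer, consumeWhenReady, and payload are illustrative names):

    ```cpp
    #include <mutex>
    #include <thread>
    #include <utility>
    #include <vector>

    std::mutex m;
    bool ready = false;
    std::vector<int> payload;

    void producer() {
        std::vector<int> data{1, 2, 3};       // plain, non-atomic writes
        std::lock_guard<std::mutex> lock(m);
        payload = std::move(data);
        ready = true;                         // published under the lock
    }

    int consumeWhenReady() {
        for (;;) {
            std::lock_guard<std::mutex> lock(m);
            // If this thread observes ready == true, the mutex's
            // release/acquire ordering guarantees it also sees the
            // fully written payload -- never a stale cached view.
            if (ready) return payload.back();
        }
    }
    ```

    Whether that visibility is achieved by cache-coherency traffic, memory barriers, or something else is the hardware's and compiler's concern; the program only has to pair the release (unlock) with the acquire (lock).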